-
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
Authors:
Mihai Nadas,
Laura Diosan,
Andrei Piscoran,
Andreea Tomescu
Abstract:
Moral stories are a time-tested vehicle for transmitting values, yet modern NLP lacks a large, structured corpus that couples coherent narratives with explicit ethical lessons. We close this gap with TF1-EN-3M, the first open dataset of three million English-language fables generated exclusively by instruction-tuned models no larger than 8B parameters. Each story follows a six-slot scaffold (character -> trait -> setting -> conflict -> resolution -> moral), produced through a combinatorial prompt engine that guarantees genre fidelity while covering a broad thematic space.
A hybrid evaluation pipeline blends (i) a GPT-based critic that scores grammar, creativity, moral clarity, and template adherence with (ii) reference-free diversity and readability metrics. Among ten open-weight candidates, an 8B-parameter Llama-3 variant delivers the best quality-speed trade-off, producing high-scoring fables on a single consumer GPU (<24 GB VRAM) at approximately 13.5 cents per 1,000 fables.
We release the dataset, generation code, evaluation scripts, and full metadata under a permissive license, enabling exact reproducibility and cost benchmarking. TF1-EN-3M opens avenues for research in instruction following, narrative intelligence, value alignment, and child-friendly educational AI, demonstrating that large-scale moral storytelling no longer requires proprietary giant models.
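The six-slot scaffold (character, trait, setting, conflict, resolution, moral) and the combinatorial prompt engine can be pictured as a Cartesian product over slot vocabularies. The sketch below is a minimal illustration of that idea. The slot values, the prompt wording, and the name `generate_prompts` are hypothetical: the abstract does not give the actual TF1-EN-3M vocabularies or templates.

```python
from itertools import product

# Hypothetical slot vocabularies; the real TF1-EN-3M lists are not given in the abstract.
SLOTS = {
    "character": ["a fox", "an owl"],
    "trait": ["greedy", "patient"],
    "setting": ["a quiet meadow", "a winter forest"],
    "conflict": ["a shared meal", "a broken bridge"],
    "resolution": ["an honest apology", "a clever trade"],
    "moral": ["honesty pays", "haste makes waste"],
}

# Hypothetical template that fills the six slots in scaffold order.
TEMPLATE = (
    "Write a short fable about {character} who is {trait}, set in {setting}. "
    "The story is about {conflict} and is resolved through {resolution}. "
    "End by stating the moral explicitly: {moral}."
)


def generate_prompts(slots, template=TEMPLATE):
    """Yield one prompt per combination of slot values (Cartesian product)."""
    keys = list(slots)
    for values in product(*(slots[key] for key in keys)):
        yield template.format(**dict(zip(keys, values)))


prompts = list(generate_prompts(SLOTS))
print(len(prompts))  # 2 options per slot, 6 slots: 2**6 = 64 prompts
print(prompts[0])
```

Because every prompt is a distinct slot combination, coverage of the thematic space grows multiplicatively with the size of each vocabulary.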
Submitted 29 April, 2025;
originally announced April 2025.
-
Synthetic Data Generation Using Large Language Models: Advances in Text and Code
Authors:
Mihai Nadas,
Laura Diosan,
Andreea Tomescu
Abstract:
Large language models (LLMs) have unlocked new possibilities for generating synthetic training data in both natural language and code. By producing artificial but task-relevant examples, these models can significantly augment or even replace real-world datasets, especially when labeled data is scarce or sensitive. This paper surveys recent advances in using LLMs to create synthetic text and code, emphasizing prompt-based generation, retrieval-augmented pipelines, and iterative self-refinement. We show how these methods enrich low-resource tasks such as classification and question answering, as well as code-centric applications such as instruction tuning, code translation, and bug repair, where the functional correctness of generated code can be verified automatically. Alongside potential benefits like cost-effectiveness, broad coverage, and controllable diversity, we address challenges such as factual inaccuracies in generated text, lack of stylistic realism, and the risk of bias amplification. Proposed mitigations include filtering and weighting outputs and reinforcement learning with execution feedback for code. We conclude with open research directions like automated prompt engineering, cross-modal data synthesis, and robust evaluation frameworks, highlighting the importance of LLM-generated synthetic data in advancing AI while emphasizing ethical and quality safeguards.
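Automated verification and output filtering for synthetic code can be illustrated with a minimal execution-based filter: keep a generated function only if it runs and passes a few known input/output cases. This is a sketch under stated assumptions, not the survey's method. The name `passes_tests`, the candidate strings, and the test cases are hypothetical, and a real pipeline would run candidates in a sandbox rather than with a bare `exec`.

```python
def passes_tests(source, func_name, cases):
    """Return True if `source` defines `func_name` and it satisfies every (args, expected) case."""
    namespace = {}
    try:
        # WARNING: exec runs arbitrary code; real pipelines should execute candidates in a sandbox.
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Syntax errors, missing definitions, and runtime errors all count as failures.
        return False


# Hypothetical LLM-generated candidates for the task "add two numbers".
candidates = [
    "def add(a, b):\n    return a - b\n",  # wrong logic
    "def add(a, b):\n    return a + b\n",  # correct
    "def add(a, b) return a + b\n",        # syntax error
]
cases = [((1, 2), 3), ((0, 0), 0), ((-4, 4), 0)]

kept = [src for src in candidates if passes_tests(src, "add", cases)]
print(len(kept))  # 1
```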
Submitted 18 March, 2025;
originally announced March 2025.
-
Maximum Coverage $k$-Antichains and Chains: A Greedy Approach
Authors:
Manuel Cáceres,
Andreas Grigorjew,
Wanchote Po Jiamjitrak,
Alexandru I. Tomescu
Abstract:
Given an input acyclic digraph $G = (V,E)$ and a positive integer $k$, the problem of Maximum Coverage $k$-Antichains (resp., Chains) denoted as MA-$k$ (resp., MC-$k$) asks to find $k$ sets of pairwise unreachable vertices, known as antichains (resp., $k$ subsequences of paths, known as chains), maximizing the number of vertices covered by these antichains (resp. chains). While MC-$k$ has been recently solved in (almost) optimal $O(|E|^{1+o(1)})$ time [Kogan and Parter, ICALP 2022], the fastest known algorithm for MA-$k$ is a recent $(k|E|)^{1+o(1)}$-time solution [Kogan and Parter, ESA 2024] as well as a $1/2$ approximation running in $|E|^{1+o(1)}$ time in the same paper. In this paper, we leverage a paths-based proof of the Greene-Kleitman (GK) theorem with the help of the greedy algorithm for set cover and recent advances on fast algorithms for flows and shortest paths to obtain the following results for MA-$k$:
- The first (exact) algorithm running in $|E|^{1+o(1)}$ time, hence independent of $k$.
- A randomized algorithm running in $\tilde{O}(\alpha_k|E|)$ time, where $\alpha_k$ is the size of the optimal solution. That is, a near-linear parameterized running time, generalizing the result of [Mäkinen et al., ACM TALG] obtained for $k=1$.
- An approximation algorithm running in time $O(\alpha_1^2|V| + (\alpha_1+k)|E|)$ with approximation ratio of $(1-1/e) > 0.63 > 1/2$.
Our last two solutions rely on the use of greedy set cover, first exploited in [Felsner et al., Order 2003] for chains, which we now apply to antichains. We complement these results with two examples (one for chains and one for antichains) showing that, for every $k \ge 2$, greedy misses a $1/4$ portion of the optimal coverage. We also show that greedy is an $\Omega(\log{|V|})$ factor away from minimality when required to cover all vertices, which was previously unknown for sets of chains or antichains.
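The greedy principle behind the last two results can be illustrated on a toy scale: enumerate all antichains explicitly and repeatedly pick the one that covers the most not-yet-covered vertices. Over an explicit set family, this textbook greedy for maximum coverage attains the $(1-1/e)$ ratio. The sketch is only an illustration: the enumeration is exponential, so it is usable on tiny DAGs only, whereas the paper relies on flow and shortest-path machinery that is not attempted here. The function names are hypothetical.

```python
from itertools import combinations


def reachability(n, edges):
    """reach[u] = set of vertices reachable from u by a non-empty path."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    reach = [set() for _ in range(n)]
    for s in range(n):
        stack = [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in reach[s]:
                    reach[s].add(v)
                    stack.append(v)
    return reach


def all_antichains(n, reach):
    """Every non-empty set of pairwise unreachable vertices (exponential: toy inputs only)."""
    antichains = []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if all(v not in reach[u] and u not in reach[v] for u, v in combinations(subset, 2)):
                antichains.append(frozenset(subset))
    return antichains


def greedy_k_antichains(n, edges, k):
    """Pick k antichains, each time maximizing the number of newly covered vertices."""
    candidates = all_antichains(n, reachability(n, edges))
    covered, chosen = set(), []
    for _ in range(k):
        best = max(candidates, key=lambda a: len(a - covered))
        chosen.append(best)
        covered |= best
    return chosen, covered


chosen, covered = greedy_k_antichains(5, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)], k=2)
print(chosen, len(covered))
```

In the example DAG, the only antichain with more than one vertex is $\{1, 2\}$, so greedy picks it first and then adds one further vertex.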
Submitted 10 February, 2025;
originally announced February 2025.
-
Safe Paths and Sequences for Scalable ILPs in RNA Transcript Assembly Problems
Authors:
Francisco Sena,
Alexandru I. Tomescu
Abstract:
A common step at the core of many RNA transcript assembly tools is to find a set of weighted paths that best explain the weights of a DAG. While such problems easily become NP-hard, scalable solvers exist only for a basic error-free version of this problem, namely minimally decomposing a network flow into weighted paths.
The main result of this paper is to show that we can achieve speedups of two orders of magnitude also for path-finding problems in the realistic setting (i.e., the weights do not induce a flow). We obtain these by employing the safety information that is encoded in the graph structure inside Integer Linear Programming (ILP) solvers for these problems. We first characterize the paths that appear in all path covers of the DAG, generalizing a graph reduction commonly used in the error-free setting (e.g. by Kloster et al. [ALENEX 2018]). Secondly, following the work of Ma, Zheng and Kingsford [RECOMB 2021], we characterize the \emph{sequences} of arcs that appear in all path covers of the DAG.
We experiment with a path-finding ILP model (least squares) and with a more recent and accurate one. We use a variety of datasets originally created by Shao and Kingsford [TCBB, 2017], as well as graphs built from sequencing reads by the state-of-the-art tool for long-read transcript discovery, IsoQuant [Prjibelski et al., Nat. Biotechnology 2023]. The ILPs armed with safe paths or sequences exhibit significant speed-ups over the original ones. On graphs with a large width, average speed-ups are in the range $50-160\times$ in the latter ILP model and in the range $100-1000\times$ in the least squares model.
Our scaling techniques apply to any ILP whose solution paths are a path cover of the arcs of the DAG. As such, they can become a scalable building block of practical RNA transcript assembly tools, avoiding heuristic trade-offs currently needed on complex graphs.
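As a much simpler illustration of safety information encoded in graph structure (not the characterization proposed in the paper), the sketch below extends a single arc forward through vertices with exactly one out-arc and backward through vertices with exactly one in-arc. Assuming every solution path runs from a source (in-degree 0) to a sink (out-degree 0), any path that uses the arc is forced to traverse the whole extension, so the result is contained in a path of every such path cover. The name `extend_arc` and the example graph are hypothetical.

```python
from collections import defaultdict


def extend_arc(edges, arc):
    """Extend `arc` to the longest path forced around it in any source-to-sink path.

    Assumes the graph is a DAG and every solution path starts at a vertex of
    in-degree 0 and ends at a vertex of out-degree 0.
    """
    out_nbrs, in_nbrs = defaultdict(list), defaultdict(list)
    for u, v in edges:
        out_nbrs[u].append(v)
        in_nbrs[v].append(u)
    path = list(arc)
    # Forward: a vertex with exactly one out-arc is not a sink, so the path must take that arc.
    while len(out_nbrs[path[-1]]) == 1:
        path.append(out_nbrs[path[-1]][0])
    # Backward: a vertex with exactly one in-arc is not a source, so the path must arrive by it.
    while len(in_nbrs[path[0]]) == 1:
        path.insert(0, in_nbrs[path[0]][0])
    return path


edges = [("s", "a"), ("a", "b"), ("b", "c"), ("b", "d"), ("c", "t"), ("d", "t")]
print(extend_arc(edges, ("a", "b")))  # ['s', 'a', 'b']
print(extend_arc(edges, ("c", "t")))  # ['s', 'a', 'b', 'c', 't']
```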
Submitted 21 December, 2024; v1 submitted 6 November, 2024;
originally announced November 2024.
-
Parameterised Approximation and Complexity of Minimum Flow Decompositions
Authors:
Andreas Grigorjew,
Wanchote Jiamjitrak,
Brendan Mumey,
Alexandru I. Tomescu
Abstract:
Minimum flow decomposition (MFD) is the strongly NP-hard problem of finding a smallest set of integer weighted paths in a graph $G$ whose weighted sum is equal to a given flow $f$ on $G$. Despite its many practical applications, we lack an understanding of graph structures that make MFD easy or hard. In particular, it is not known whether a good approximation algorithm exists when the weights are positive.
On the positive side, the main result of this paper is that MFD can be approximated within a factor $O(\log\Vert f\Vert)$ (where $\Vert f\Vert$ is the largest flow weight of all edges) times the ratio between the parallel-width of $G$ (introduced by Deligkas and Meir, MFCS 2018) and the width of $G$ (minimum number of paths to cover all edges). In particular, when the MFD size is at least the parallel-width of $G$, this becomes the first parameterised $O(\log\Vert f\Vert)$-factor approximation algorithm for MFD over positive integers. We also show that there exist instances where the ratio between the parallel-width of $G$ and the MFD size is arbitrarily large, thus narrowing down the class of graphs whose approximation is still open. We achieve these results by introducing a new notion of flow-width of $(G,f)$, which unifies both the width and the parallel-width and may be of independent interest.
On the negative side, we show that small-width graphs do not make MFD easy. This question was previously open, because width-1 graphs (i.e. paths) are trivially solvable, and the existing NP-hardness proofs use graphs of unbounded width. We close this problem by showing the tight results that MFD remains strongly NP-hard on graphs of width 3, and NP-hard on graphs of width 2 (and thus also parallel-width 2). Moreover, on width-2 graphs (and more generally, on constant parallel-width graphs), MFD is solvable in quasi-polynomial time on unary-coded flows.
Submitted 30 September, 2024;
originally announced September 2024.
-
Minimum Path Cover: The Power of Parameterization
Authors:
Manuel Cáceres,
Brendan Mumey,
Santeri Toivonen,
Alexandru I. Tomescu
Abstract:
Computing a minimum path cover (MPC) of a directed acyclic graph (DAG) is a fundamental problem with a myriad of applications, including reachability. Although it is known how to solve the problem by a simple reduction to minimum flow, recent theoretical advances exploit this idea to obtain algorithms parameterized by the number of paths of an MPC, known as the width. These results obtain fast [Mäkinen et al., TALG] and even linear time [Cáceres et al., SODA 2022] algorithms in the small-width regime.
In this paper, we present the first publicly available high-performance implementation of state-of-the-art MPC algorithms, including the parameterized approaches. Our experiments on random DAGs show that parameterized algorithms are orders of magnitude faster on dense graphs. Additionally, we present new pre-processing heuristics based on transitive edge sparsification. We show that our heuristics improve MPC solvers by orders of magnitude.
Submitted 17 August, 2023;
originally announced August 2023.
-
A Safety Framework for Flow Decomposition Problems via Integer Linear Programming
Authors:
Fernando H. C. Dias,
Manuel Caceres,
Lucia Williams,
Brendan Mumey,
Alexandru I. Tomescu
Abstract:
Many important problems in Bioinformatics (e.g., assembly or multi-assembly) admit multiple solutions, while the final objective is to report only one. A common approach to deal with this uncertainty is finding safe partial solutions (e.g., contigs) which are common to all solutions. Previous research on safety has focused on polynomial-time solvable problems, whereas many successful and natural models are NP-hard to solve, leaving a lack of "safety tools" for such problems. We propose the first method for computing all safe solutions for an NP-hard problem, minimum flow decomposition. We obtain our results by developing a "safety test" for paths based on a general Integer Linear Programming (ILP) formulation. Moreover, we provide implementations with practical optimizations aimed to reduce the total ILP time, the most efficient of these being based on a recursive group-testing procedure.
Results: Experimental results on the transcriptome datasets of Shao and Kingsford (TCBB, 2017) show that all safe paths for minimum flow decompositions correctly recover up to 90% of the full RNA transcripts, which is at least 25% more than previously known safe paths, such as (Caceres et al. TCBB, 2021), (Zheng et al., RECOMB 2021), (Khan et al., RECOMB 2022, ESA 2022). Moreover, despite the NP-hardness of the problem, we can report all safe paths for 99.8% of the over 27,000 non-trivial graphs of this dataset in only 1.5 hours. Our results suggest that, on perfect data, there is less ambiguity than thought in the notoriously hard RNA assembly problem.
Availability: https://github.com/algbio/mfd-safety
Submitted 30 January, 2023;
originally announced January 2023.
-
Minimum Path Cover in Parameterized Linear Time
Authors:
Manuel Caceres,
Massimo Cairo,
Brendan Mumey,
Romeo Rizzi,
Alexandru I. Tomescu
Abstract:
A minimum path cover (MPC) of a directed acyclic graph (DAG) $G = (V,E)$ is a minimum-size set of paths that together cover all the vertices of the DAG. Computing an MPC is a basic polynomial problem, dating back to Dilworth's and Fulkerson's results in the 1950s. Since the size $k$ of an MPC (also known as the width) can be small in practical applications, research has also studied algorithms whose running time is parameterized on $k$.
We obtain a new MPC parameterized algorithm for DAGs running in time $O(k^2|V| + |E|)$. Our algorithm is the first solving the problem in parameterized linear time. Additionally, we obtain an edge sparsification algorithm preserving the width of a DAG but reducing $|E|$ to less than $2|V|$. This algorithm runs in time $O(k^2|V|)$ and requires an MPC of a DAG as input, thus its total running time is the same as the running time of our MPC algorithm.
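For comparison with the parameterized algorithm, the classical polynomial route dating back to Dilworth and Fulkerson computes the width $k$ as $|V|$ minus the size of a maximum matching in a bipartite graph built from the transitive closure, where the paths of the cover may share vertices. The sketch below uses simple augmenting-path matching and is far slower than the $O(k^2|V| + |E|)$ algorithm described above; `dag_width` is a hypothetical helper name.

```python
def dag_width(n, edges):
    """Width of a DAG on vertices 0..n-1: minimum number of (possibly overlapping) paths covering all vertices."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    # Transitive closure: reach[u] = vertices reachable from u by a non-empty path.
    reach = []
    for s in range(n):
        seen, stack = set(), [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        reach.append(seen)

    # Maximum bipartite matching: left copy of u -> right copy of v whenever u reaches v.
    match_of_right = [-1] * n

    def try_augment(u, visited):
        for v in reach[u]:
            if v in visited:
                continue
            visited.add(v)
            if match_of_right[v] == -1 or try_augment(match_of_right[v], visited):
                match_of_right[v] = u
                return True
        return False

    matching = sum(try_augment(u, set()) for u in range(n))
    # Each matched pair places two vertices consecutively on one path, saving one path.
    return n - matching


print(dag_width(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))  # diamond: antichain {1, 2}, width 2
print(dag_width(3, [(0, 1), (1, 2)]))                  # a single path, width 1
```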
Submitted 17 November, 2022;
originally announced November 2022.
-
Cut paths and their remainder structure, with applications
Authors:
Massimo Cairo,
Shahbaz Khan,
Romeo Rizzi,
Sebastian Schmidt,
Alexandru I. Tomescu,
Elia C. Zirondelli
Abstract:
In a strongly connected graph $G = (V,E)$, a cut arc (also called strong bridge) is an arc $e \in E$ whose removal makes the graph no longer strongly connected. Equivalently, there exist $u,v \in V$, such that all $u$-$v$ walks contain $e$. Cut arcs are a fundamental graph-theoretic notion, with countless applications, especially in reachability problems.
In this paper we initiate the study of cut paths, as a generalisation of cut arcs, which we naturally define as those paths $P$ for which there exist $u,v \in V$, such that all $u$-$v$ walks contain $P$ as subwalk. We first prove various properties of cut paths and define their remainder structures, which we use to present a simple $O(m)$-time verification algorithm for a cut path ($|V| = n$, $|E| = m$).
Secondly, we apply cut paths and their remainder structures to improve several reachability problems from bioinformatics. A walk is called safe if it is a subwalk of every node-covering closed walk of a strongly connected graph. Multi-safety is defined analogously, by considering node-covering sets of closed walks instead. We show that cut paths provide simple $O(m)$-time algorithms verifying if a walk is safe or multi-safe. For multi-safety, we present the first linear-time algorithm, while for safety, we present a simple algorithm whereas the state of the art employed complex data structures. Finally, we show that the simultaneous computation of remainder structures of all subwalks of a cut path can be performed in linear time. These properties yield an $O(mn)$ algorithm outputting all maximal multi-safe walks, improving over the state-of-the-art algorithm running in time $O(m^2+n^3)$.
The results of this paper only scratch the surface in the study of cut paths, and we believe a rich structure of a graph can be revealed, considering the perspective of a path, instead of just an arc.
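The definition of a cut arc (strong bridge) can be checked directly: delete each arc in turn and test whether the graph stays strongly connected. The sketch below does exactly that in $O(m(n+m))$ time. It only illustrates the definition and is much slower than known linear-time algorithms for strong bridges; it does not implement the cut-path machinery of the paper. Function names are hypothetical.

```python
def is_strongly_connected(n, arcs):
    """True if every vertex of 0..n-1 reaches vertex 0 and is reached from it."""
    if n == 0:
        return True
    forward = [[] for _ in range(n)]
    backward = [[] for _ in range(n)]
    for u, v in arcs:
        forward[u].append(v)
        backward[v].append(u)

    def reaches_all(adj):
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return len(seen) == n

    return reaches_all(forward) and reaches_all(backward)


def cut_arcs(n, arcs):
    """Arcs whose removal destroys strong connectivity (naive O(m(n + m)) check)."""
    if not is_strongly_connected(n, arcs):
        raise ValueError("input graph must be strongly connected")
    return [
        arc for i, arc in enumerate(arcs)
        if not is_strongly_connected(n, arcs[:i] + arcs[i + 1:])
    ]


arcs = [(0, 1), (1, 2), (2, 0), (0, 2)]
print(cut_arcs(3, arcs))
```

In the example, the chord $(0, 2)$ is the only arc whose removal leaves the cycle $0 \to 1 \to 2 \to 0$ intact, so the three cycle arcs are cut arcs.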
Submitted 14 October, 2022;
originally announced October 2022.
-
Minimum Flow Decomposition in Graphs with Cycles using Integer Linear Programming
Authors:
Fernando H. C. Dias,
Lucia Williams,
Brendan Mumey,
Alexandru I. Tomescu
Abstract:
Minimum flow decomposition (MFD) -- the problem of finding a minimum set of weighted source-to-sink paths that perfectly decomposes a flow -- is a classical problem in Computer Science, and variants of it are powerful models in different fields such as Bioinformatics and Transportation. Even on acyclic graphs, the problem is NP-hard, and most practical solutions have been via heuristics or approximations. While there is an extensive body of research on acyclic graphs, currently, there is no \emph{exact} solution on graphs with cycles. In this paper, we present the first ILP formulation for three natural variants of the MFD problem in graphs with cycles, asking for a decomposition consisting only of weighted source-to-sink paths or cycles, trails, and walks, respectively. On three datasets of increasing levels of complexity from both Bioinformatics and Transportation, our approaches solve any instance in under 10 minutes. Our implementations are freely available at github.com/algbio/MFD-ILP.
Submitted 16 January, 2023; v1 submitted 31 August, 2022;
originally announced September 2022.
-
Simplicity in Eulerian Circuits: Uniqueness and Safety
Authors:
Nidia Obscura Acosta,
Alexandru I. Tomescu
Abstract:
An Eulerian circuit in a directed graph is one of the most fundamental Graph Theory notions. Detecting if a graph $G$ has a unique Eulerian circuit can be done in polynomial time via the BEST theorem by de Bruijn, van Aardenne-Ehrenfest, Smith and Tutte, 1941-1951 (involving counting arborescences), or via a tailored characterization by Pevzner, 1989 (involving computing the intersection graph of simple cycles of $G$), both of which thus rely on overly complex notions for the simpler uniqueness problem.
In this paper, we give a new characterization of directed graphs with a unique Eulerian circuit that can be checked in linear time. It is based on a simple condition, stated in terms of cut nodes of the underlying undirected graph of $G$, for when two edges must appear consecutively in all Eulerian circuits. As a by-product, we can also compute in linear time all maximal $\textit{safe}$ walks appearing in all Eulerian circuits, for which Nagarajan and Pop proposed in 2009 a polynomial-time algorithm based on Pevzner's characterization.
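For contrast with the linear-time characterization, the BEST-theorem route mentioned above counts Eulerian circuits as $ec(G) = t_w(G) \prod_{v} (\deg^+(v) - 1)!$, where $t_w(G)$ is the number of spanning arborescences oriented towards a fixed root $w$, obtained from the matrix-tree theorem as a minor of the out-degree Laplacian. $G$ has a unique Eulerian circuit exactly when this count is 1. A minimal sketch, assuming every vertex has at least one arc; circuits are counted as cyclic arc sequences, and the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def count_eulerian_circuits(n, arcs):
    """BEST theorem: ec(G) = t_w(G) * prod_v (outdeg(v) - 1)!.

    Assumes vertices 0..n-1 each have at least one arc. Returns 0 if G is not balanced.
    """
    out_deg, in_deg = [0] * n, [0] * n
    laplacian = [[0] * n for _ in range(n)]
    for u, v in arcs:
        out_deg[u] += 1
        in_deg[v] += 1
        laplacian[u][v] -= 1
    for v in range(n):
        laplacian[v][v] += out_deg[v]
    if out_deg != in_deg:
        return 0
    # Deleting row and column of root w = 0 counts arborescences oriented towards vertex 0
    # (the determinant is 0 if the graph is not connected).
    minor = [row[1:] for row in laplacian[1:]]
    count = determinant(minor)
    for d in out_deg:
        count *= factorial(d - 1)
    return int(count)


triangle = [(0, 1), (1, 2), (2, 0)]
petals = [(0, 1), (1, 0), (0, 2), (2, 0), (0, 3), (3, 0)]
print(count_eulerian_circuits(3, triangle))
print(count_eulerian_circuits(4, petals))
```

The triangle has exactly one Eulerian circuit, while three 2-cycles sharing vertex 0 admit $(3-1)! = 2$ cyclic orderings, so the second graph is not uniquely Eulerian.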
Submitted 25 May, 2023; v1 submitted 17 August, 2022;
originally announced August 2022.
-
Width Helps and Hinders Splitting Flows
Authors:
Manuel Cáceres,
Massimo Cairo,
Andreas Grigorjew,
Shahbaz Khan,
Brendan Mumey,
Romeo Rizzi,
Alexandru I. Tomescu,
Lucia Williams
Abstract:
Minimum flow decomposition (MFD) is the NP-hard problem of finding a smallest decomposition of a network flow/circulation $X$ on a directed graph $G$ into weighted source-to-sink paths whose superposition equals $X$. We show that, for acyclic graphs, considering the \emph{width} of the graph (the minimum number of paths needed to cover all of its edges) yields advances in our understanding of its approximability. For the version of the problem that uses only non-negative weights, we identify and characterise a new class of \emph{width-stable} graphs, for which a popular heuristic is an $O(\log |X|)$-approximation ($|X|$ being the total flow of $X$), and strengthen the lower bound on its worst-case approximation ratio from $\Omega(\sqrt{m})$ to $\Omega(m / \log m)$ for sparse graphs, where $m$ is the number of edges in the graph. We also study a new problem on graphs with cycles, Minimum Cost Circulation Decomposition (MCCD), and show that it generalises MFD through a simple reduction. For the version allowing also negative weights, we give a $(\lceil \log \Vert X \Vert \rceil +1)$-approximation ($\Vert X \Vert$ being the maximum absolute value of $X$ on any edge) using a power-of-two approach, combined with parity fixing arguments and a decomposition of unitary circulations ($\Vert X \Vert \leq 1$), using a generalised notion of width for this problem. Finally, we disprove a conjecture about the linear independence of minimum (non-negative) flow decompositions posed by Kloster et al. [ALENEX 2018], but show that its useful implication (polynomial-time assignments of weights to a given set of paths to decompose a flow) holds for the negative version.
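The abstract does not name the popular heuristic; we assume it is greedy-width, which the "Safety and Completeness" abstract names as the popularly used heuristic for flow decomposition. Greedy-width repeatedly removes a source-to-sink path of maximum bottleneck (a widest path), which on a DAG can be found by dynamic programming in topological order. A minimal sketch, assuming integer flows with a single source `s` and sink `t`; the function name is hypothetical and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def greedy_width(flow, s, t):
    """Decompose an integer s-t flow on a DAG by repeatedly removing a widest path.

    `flow` maps arcs (u, v) to non-negative integer flow values satisfying conservation.
    Returns a list of (weight, path) pairs.
    """
    remaining = dict(flow)
    preds = {}
    for u, v in remaining:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    order = list(TopologicalSorter(preds).static_order())
    succs = {u: [] for u in order}
    for u, v in remaining:
        succs[u].append(v)

    paths = []
    while any(f > 0 for f in remaining.values()):
        # width[v] = largest bottleneck of any s-v path using only remaining flow.
        width = {u: 0 for u in order}
        parent = {}
        width[s] = float("inf")
        for u in order:
            if width[u] == 0:
                continue
            for v in succs[u]:
                w = min(width[u], remaining[(u, v)])
                if w > width[v]:
                    width[v], parent[v] = w, u
        if width[t] == 0:
            break  # no s-t path carries flow: the input violated conservation
        path = [t]
        while path[-1] != s:
            path.append(parent[path[-1]])
        path.reverse()
        bottleneck = width[t]
        for arc in zip(path, path[1:]):
            remaining[arc] -= bottleneck
        paths.append((bottleneck, path))
    return paths


example_flow = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 1, ("a", "b"): 2, ("b", "t"): 4}
for weight, path in greedy_width(example_flow, "s", "t"):
    print(weight, path)
```

Greedy-width always yields a valid decomposition, but not necessarily a minimum one, which is why its approximation ratio is the object of study above.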
Submitted 9 May, 2023; v1 submitted 5 July, 2022;
originally announced July 2022.
-
Safety and Completeness in Flow Decompositions for RNA Assembly
Authors:
Shahbaz Khan,
Milla Kortelainen,
Manuel Cáceres,
Lucia Williams,
Alexandru I. Tomescu
Abstract:
Decomposing a network flow into weighted paths has numerous applications. Some applications require any decomposition that is optimal w.r.t. some property such as number of paths, robustness, or length. Many bioinformatic applications require a specific decomposition where the paths correspond to some underlying data that generated the flow. For real inputs, no optimization criterion is guaranteed to uniquely identify the correct decomposition. Therefore, we propose to report safe paths, i.e., subpaths of at least one path in every flow decomposition.
Ma, Zheng, and Kingsford [WABI 2020] addressed the existence of multiple optimal solutions in a probabilistic framework, i.e., non-identifiability. Later [RECOMB 2021], they gave a quadratic-time algorithm based on a global criterion for solving a problem called AND-Quant, which generalizes the problem of reporting whether a given path is safe.
We give the first local characterization of safe paths for flow decompositions in directed acyclic graphs (DAGs), leading to a practical algorithm for finding the complete set of safe paths. We evaluated our algorithms against the trivial safe algorithms (unitigs, extended unitigs) and the popularly used heuristic (greedy-width) for flow decomposition on RNA transcript datasets. Despite maintaining perfect precision, our algorithm reports significantly higher coverage ($\approx 50\%$ more) than the trivial safe algorithms. The greedy-width algorithm, though reporting better coverage, has significantly lower precision on complex graphs. Overall, our algorithm outperforms greedy-width (by $\approx 20\%$) on a unified metric (F-score) when the dataset has a significant number of complex graphs. Moreover, it has superior time ($3-5\times$) and space efficiency ($1.2-2.2\times$), resulting in a better and more practical approach for bioinformatics applications of flow decomposition.
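A local quantity of the kind used for safety of a path $P = (e_1, \dots, e_k)$ is its excess flow: the flow on the first arc minus, at each internal vertex $u_i$, the flow that can leave $P$ through arcs other than $e_{i}$. In any flow decomposition, the total weight of paths containing all of $P$ is at least this value, so a positive excess certifies that $P$ is safe. The sketch below computes it as a sufficient test only; we do not claim it reproduces every detail of the paper's characterization or algorithm, and `excess_flow` is a hypothetical name.

```python
def excess_flow(flow, path):
    """Lower bound on the weight of decomposition paths that contain `path` entirely.

    `flow` maps arcs (u, v) to flow values; `path` is a list of at least 2 vertices.
    """
    out_flow = {}
    for (u, _), f in flow.items():
        out_flow[u] = out_flow.get(u, 0) + f
    arcs = list(zip(path, path[1:]))
    excess = flow[arcs[0]]
    for u, v in arcs[1:]:
        # Flow that may leave the path at internal vertex u via arcs other than (u, v).
        excess -= out_flow[u] - flow[(u, v)]
    return excess


example_flow = {
    ("s", "a"): 2, ("s", "b"): 2, ("a", "c"): 2, ("b", "c"): 2,
    ("c", "d"): 2, ("c", "e"): 2, ("d", "t"): 2, ("e", "t"): 2,
}
print(excess_flow(example_flow, ["s", "a", "c"]))  # 2: positive, so safe
print(excess_flow(example_flow, ["a", "c", "d"]))  # 0: not certified safe
```

Here $a \to c \to d$ is genuinely not safe: the decomposition $\{s\,a\,c\,e\,t,\ s\,b\,c\,d\,t\}$ (each with weight 2) avoids it.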
Submitted 25 January, 2022;
originally announced January 2022.
-
The Labeled Direct Product Optimally Solves String Problems on Graphs
Authors:
Nicola Rizzo,
Alexandru I. Tomescu,
Alberto Policriti
Abstract:
Suffix trees are an important data structure at the core of optimal solutions to many fundamental string problems, such as exact pattern matching, longest common substring, matching statistics, and longest repeated substring. Recent lines of research focused on extending some of these problems to vertex-labeled graphs, although using ad-hoc approaches which in some cases do not generalize to all input graphs. In the absence of a ubiquitous tool like the suffix tree for labeled graphs, we introduce the labeled direct product of two graphs as a general tool for obtaining optimal algorithms: we obtain conceptually simpler algorithms for the quadratic problems of string matching (SMLG) and longest common substring (LCSP) in labeled graphs. Our algorithms are also more efficient, since they run in time linear in the size of the labeled product graph, which may be smaller than quadratic for some inputs, and their run-time is predictable, because the size of the labeled direct product graph can be precomputed efficiently. We also solve LCSP on graphs containing cycles, which was left as an open problem by Shimohira et al. in 2011. To show the power of the labeled product graph, we also apply it to solve the matching statistics (MSP) and the longest repeated string (LRSP) problems in labeled graphs. Moreover, we show that our (worst-case quadratic) algorithms are also optimal, conditioned on the Orthogonal Vectors Hypothesis. Finally, we complete the complexity picture around LRSP by studying it on undirected graphs.
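The labeled direct product pairs up equally labeled vertices of the two graphs and connects $(u, v) \to (u', v')$ whenever $(u, u')$ and $(v, v')$ are arcs in the respective graphs. Paths in the product therefore correspond to pairs of paths spelling the same string. A minimal sketch of LCSP as a longest path in the product, under our reading of LCSP as the longest string spelled by a path in both graphs, and assuming at least one input is a DAG so the product is acyclic (the paper also handles cycles). Function names are hypothetical.

```python
from functools import lru_cache


def labeled_direct_product(labels1, edges1, labels2, edges2):
    """Adjacency lists of the labeled direct product of two vertex-labeled graphs."""
    out1, out2 = {}, {}
    for u, v in edges1:
        out1.setdefault(u, []).append(v)
    for u, v in edges2:
        out2.setdefault(u, []).append(v)
    # Product vertices: pairs of vertices carrying the same label.
    nodes = {(u, v) for u in labels1 for v in labels2 if labels1[u] == labels2[v]}
    # Product arcs: move along an arc in both graphs at once, staying on equal labels.
    return {
        (u, v): [(x, y) for x in out1.get(u, []) for y in out2.get(v, []) if (x, y) in nodes]
        for (u, v) in nodes
    }


def longest_common_path_label(labels1, edges1, labels2, edges2):
    """Longest string spelled by a path in both graphs; assumes the product is acyclic."""
    product_graph = labeled_direct_product(labels1, edges1, labels2, edges2)

    @lru_cache(maxsize=None)
    def longest_from(node):
        best = ""
        for nxt in product_graph[node]:
            candidate = longest_from(nxt)
            if len(candidate) > len(best):
                best = candidate
        return labels1[node[0]] + best

    return max((longest_from(n) for n in product_graph), key=len, default="")


labels1 = {1: "A", 2: "C", 3: "G", 4: "T"}
edges1 = [(1, 2), (2, 3), (3, 4)]
labels2 = {"p": "G", "q": "C", "r": "G", "s": "T"}
edges2 = [("p", "q"), ("q", "r"), ("r", "s")]
print(longest_common_path_label(labels1, edges1, labels2, edges2))
```

The running time is linear in the size of the product graph, which is quadratic in the worst case, in line with the bounds stated above.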
Submitted 11 September, 2021;
originally announced September 2021.
-
Sparsifying, Shrinking and Splicing for Minimum Path Cover in Parameterized Linear Time
Authors:
Manuel Cáceres,
Massimo Cairo,
Brendan Mumey,
Romeo Rizzi,
Alexandru I. Tomescu
Abstract:
A minimum path cover (MPC) of a directed acyclic graph (DAG) $G = (V,E)$ is a minimum-size set of paths that together cover all the vertices of the DAG. Computing an MPC is a basic polynomial problem, dating back to Dilworth's and Fulkerson's results in the 1950s. Since the size $k$ of an MPC (also known as the width) can be small in practical applications, research has also studied algorithms whose complexity is parameterized on $k$. We obtain two new MPC parameterized algorithms for DAGs running in time $O(k^2|V|\log{|V|} + |E|)$ and $O(k^3|V| + |E|)$. We also obtain a parallel algorithm running in $O(k^2|V| + |E|)$ parallel steps and using $O(\log{|V|})$ processors (in the PRAM model). Our latter two algorithms are the first solving the problem in parameterized linear time. Finally, we present an algorithm running in time $O(k^2|V|)$ for transforming any MPC to another MPC using less than $2|V|$ distinct edges, which we prove to be asymptotically tight. As such, we also obtain edge sparsification algorithms preserving the width of the DAG with the same running time as our MPC algorithms. At the core of all our algorithms we interleave the usage of three techniques: transitive sparsification, shrinking of a path cover, and the splicing of a set of paths along a given path.
Submitted 12 July, 2021;
originally announced July 2021.
-
Algorithms and Complexity on Indexing Founder Graphs
Authors:
Massimo Equi,
Tuukka Norri,
Jarno Alanko,
Bastien Cazaux,
Alexandru I. Tomescu,
Veli Mäkinen
Abstract:
We study the problem of matching a string in a labeled graph. Previous research has shown that unless the Orthogonal Vectors Hypothesis (OVH) is false, one cannot solve this problem in strongly sub-quadratic time, nor index the graph in polynomial time to answer queries efficiently (Equi et al. ICALP 2019, SOFSEM 2021). These conditional lower bounds cover even deterministic graphs with a binary alphabet, but there also naturally exist graph classes that are easy to index: e.g., Wheeler graphs (Gagie et al. Theor. Comp. Sci. 2017) cover graphs admitting an indexing scheme based on the Burrows-Wheeler transform. However, it is NP-complete to recognize whether a graph is a Wheeler graph (Gibney, Thankachan, ESA 2019).
We propose an approach to alleviate the construction bottleneck of Wheeler graphs. Rather than starting from an arbitrary graph, we study graphs induced from multiple sequence alignments (MSAs). Elastic degenerate strings (Bernardini et al. SPIRE 2017, ICALP 2019) can be seen as such graphs, and we introduce here their generalization: elastic founder graphs. We first prove that even such induced graphs are hard to index under OVH. Then we introduce two subclasses, repeat-free and semi-repeat-free graphs, that are easy to index. We give a linear-time algorithm to construct a repeat-free non-elastic founder graph from a gapless MSA, and (parameterized) near-linear-time algorithms to construct semi-repeat-free (repeat-free, respectively) elastic founder graphs from general MSAs. Finally, we show that repeat-free elastic founder graphs admit a reduction to Wheeler graphs in polynomial time.
Submitted 10 June, 2022; v1 submitted 25 February, 2021;
originally announced February 2021.
-
Reaching Consensus for Asynchronous Distributed Key Generation
Authors:
Ittai Abraham,
Philipp Jovanovic,
Mary Maller,
Sarah Meiklejohn,
Gilad Stern,
Alin Tomescu
Abstract:
We give a protocol for Asynchronous Distributed Key Generation (A-DKG) that is optimally resilient (can withstand $f<\frac{n}{3}$ faulty parties), has a constant expected number of rounds, has $\tilde{O}(n^3)$ expected communication complexity, and assumes only the existence of a PKI. Prior to our work, the best A-DKG protocols required $\Omega(n)$ expected number of rounds, and $\Omega(n^4)$ expected communication.
Our A-DKG protocol relies on several building blocks that are of independent interest. We define and design a Proposal Election (PE) protocol that allows parties to retrospectively agree on a valid proposal after enough proposals have been sent from different parties. With constant probability the elected proposal was proposed by a non-faulty party. In building our PE protocol, we design a Verifiable Gather protocol which allows parties to communicate which proposals they have and have not seen in a verifiable manner. The final building block to our A-DKG is a Validated Asynchronous Byzantine Agreement (VABA) protocol. We use our PE protocol to construct a VABA protocol that does not require leaders or an asynchronous DKG setup. Our VABA protocol can be used more generally when it is not possible to use threshold signatures.
Submitted 4 June, 2021; v1 submitted 17 February, 2021;
originally announced February 2021.
-
Optimizing Safe Flow Decompositions in DAGs
Authors:
Shahbaz Khan,
Alexandru I. Tomescu
Abstract:
Network flow is one of the most studied combinatorial optimization problems, with innumerable applications. Any flow on a directed acyclic graph $G$ having $n$ vertices and $m$ edges can be decomposed into a set of $O(m)$ paths. In some applications, each solution (decomposition) corresponds to some particular data that generated the original flow. Given the possibility of multiple optimal solutions, no optimization criterion ensures the identification of the correct decomposition. Hence, flow decomposition was recently studied [RECOMB22] in the Safe and Complete framework, particularly for RNA assembly.
They presented a characterization of the safe paths, resulting in an $O(mn+out_R)$ time algorithm to compute all safe paths, where $out_R$ is the size of the raw output reporting each safe path explicitly. They also showed that $out_R$ can be $\Omega(mn^2)$ in the worst case but $O(m)$ in the best case. Hence, they further presented an algorithm to report a concise representation of the output $out_C$ in $O(mn+out_C)$ time, where $out_C$ can be $\Omega(mn)$ in the worst case but $O(m)$ in the best case.
In this work, we study how different safe paths interact, resulting in optimal output-sensitive algorithms requiring $O(m+out_R)$ and $O(m+out_C)$ time for computing the existing representations of the safe paths. Further, we propose a new characterization of the safe paths resulting in the \emph{optimal} representation of safe paths $out_O$, which can be $\Omega(mn)$ in the worst case but requires optimal $O(1)$ space for every safe path reported, with a near-optimal computation algorithm.
Overall, we further develop the theory of safe and complete solutions for the flow decomposition problem, giving an optimal algorithm for the explicit representation, and a near-optimal algorithm for the optimal representation of the safe paths.
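The $O(m)$ bound on the number of paths in a flow decomposition follows from the standard greedy argument: repeatedly take an $s$-$t$ path of flow-carrying edges and subtract its bottleneck value, which zeroes at least one edge per round. Below is a minimal Python sketch of that greedy decomposition (not the safe-path algorithms of this paper), assuming a single source `s`, a single sink `t`, conserved integer flow values, and a hypothetical function name `decompose_flow`.

```python
def decompose_flow(flow, s, t):
    """Greedily decompose an s-t flow on a DAG into weighted s-t paths.

    flow: dict mapping an edge (u, v) to a non-negative integer, conserved at
    every vertex other than s and t. Each round removes the bottleneck value
    of one s-t path, which zeroes at least one edge, so at most m paths result.
    """
    residual = {e: f for e, f in flow.items() if f > 0}
    paths = []
    while any(u == s for (u, _) in residual):
        # Follow edges that still carry flow; by conservation and acyclicity
        # this walk cannot get stuck before reaching t.
        path, u = [s], s
        while u != t:
            u = next(v for (x, v) in residual if x == u)
            path.append(u)
        edges = list(zip(path, path[1:]))
        weight = min(residual[e] for e in edges)
        for e in edges:
            residual[e] -= weight
            if residual[e] == 0:
                del residual[e]
        paths.append((path, weight))
    return paths
```

Choosing a different path in some round yields a different, equally valid decomposition of the same flow, which is exactly the ambiguity that motivates looking for safe paths.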
Submitted 4 July, 2022; v1 submitted 12 February, 2021;
originally announced February 2021.
-
The Hydrostructure: a Universal Framework for Safe and Complete Algorithms for Genome Assembly
Authors:
Massimo Cairo,
Shahbaz Khan,
Romeo Rizzi,
Sebastian Schmidt,
Alexandru I. Tomescu,
Elia C. Zirondelli
Abstract:
Genome assembly is a fundamental problem in Bioinformatics, which requires reconstructing a source genome from an assembly graph built from a set of reads (short strings sequenced from the genome). One notion of a genome assembly solution is an arc-covering walk of the graph. Since assembly graphs admit many solutions, the goal is to find what is definitely present in all solutions, or what is safe. Most practical assemblers are based on heuristics having at their core unitigs, namely paths whose internal nodes have unit in-degree and out-degree, and which are clearly safe. The long-standing open problem of finding all the safe parts of the solutions was recently solved [RECOMB 2016], yielding a 60% increase in contig length. This safe and complete genome assembly algorithm was followed by other works improving the time bounds, as well as extending the results for different notions of assembly solution. But it remained open whether one can also be complete for models of genome assembly of practical applicability.
In this paper we present a universal framework for obtaining safe and complete algorithms which unify the previous results, while also allowing for easy generalisations to assembly problems including many practical aspects. This is based on a novel graph structure, called the hydrostructure of a walk, which highlights the reachability properties of the graph from the perspective of the walk. The hydrostructure allows for simple characterisations of the existing safe walks, and of their new practical versions. Almost all of our characterisations are directly adaptable to optimal verification algorithms, and simple enumeration algorithms. Most of these algorithms are also improved to optimality using an incremental computation procedure and a previous optimal algorithm of a specific model.
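Unitigs, the safe baseline mentioned above, are easy to compute directly from their definition. The Python sketch below, with a hypothetical `maximal_unitigs` helper over a list of arcs, returns each maximal unitig as a node sequence; it illustrates the definition rather than reproducing code from the paper, and it deliberately skips isolated cycles whose nodes all have unit in- and out-degree.

```python
from collections import defaultdict


def maximal_unitigs(arcs):
    """Maximal unitigs of a directed graph given as a list of arcs (u, v).

    A unitig is a path whose internal nodes have in-degree 1 and out-degree 1.
    Each maximal unitig starts with an arc whose tail is not such a node and
    is extended while the current endpoint still has in- and out-degree 1.
    Isolated cycles made only of such nodes are not reported by this sketch.
    """
    out, indeg = defaultdict(list), defaultdict(int)
    for u, v in arcs:
        out[u].append(v)
        indeg[v] += 1

    def internal(x):
        return indeg[x] == 1 and len(out[x]) == 1

    unitigs = []
    for u, v in arcs:
        if internal(u):
            continue  # this arc is covered by the unitig through u's predecessor
        walk = [u, v]
        while internal(walk[-1]):
            walk.append(out[walk[-1]][0])
        unitigs.append(walk)
    return unitigs
```

For example, the arcs a→b, b→c, c→d, c→e give the maximal unitigs a-b-c, c-d and c-e, since c branches.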
Submitted 2 November, 2021; v1 submitted 25 November, 2020;
originally announced November 2020.
-
A linear-time parameterized algorithm for computing the width of a DAG
Authors:
Manuel Cáceres,
Massimo Cairo,
Brendan Mumey,
Romeo Rizzi,
Alexandru I. Tomescu
Abstract:
The width $k$ of a directed acyclic graph (DAG) $G = (V, E)$ equals the largest number of pairwise non-reachable vertices. Computing the width dates back to Dilworth's and Fulkerson's results in the 1950s, and is doable in quadratic time in the worst case. Since $k$ can be small in practical applications, research has also studied algorithms whose complexity is parameterized on $k$. Despite these efforts, it is still open whether there exists a linear-time $O(f(k)(|V| + |E|))$ parameterized algorithm computing the width. We answer this question affirmatively by presenting an $O(k^2 4^k|V| + k 2^k|E|)$ time algorithm, based on a new notion of frontier antichains. As we process the vertices in a topological order, all frontier antichains can be maintained with the help of several combinatorial properties, paying only $f(k)$ along the way. The fact that the width can be computed by a single $f(k)$-sweep of the DAG is a new surprising insight into this classical problem. Our algorithm also allows deciding whether the DAG has width at most $w$ in time $O(f(\min(w,k))(|V|+|E|))$.
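For testing on tiny inputs, the definition of width can be checked directly by exhaustive search over vertex subsets. The following Python sketch (hypothetical name `width_bruteforce`, vertices `0..n-1`, edge list as input) is exponential and only meant as a reference against which a faster implementation could be compared; it does not implement the frontier-antichain sweep.

```python
from itertools import combinations


def width_bruteforce(n, edges):
    """Width of a DAG on vertices 0..n-1: size of the largest set of pairwise
    non-reachable vertices (a maximum antichain). Exponential in n."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)

    # reach[u] = vertices reachable from u by a non-empty path.
    reach = []
    for u in range(n):
        seen, stack = set(), list(adj[u])
        while stack:
            w = stack.pop()
            if w not in seen:
                seen.add(w)
                stack.extend(adj[w])
        reach.append(seen)

    def is_antichain(subset):
        return all(v not in reach[u] and u not in reach[v]
                   for u, v in combinations(subset, 2))

    # Try the largest candidate sizes first.
    for size in range(n, 0, -1):
        if any(is_antichain(S) for S in combinations(range(n), size)):
            return size
    return 0
```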
Submitted 24 June, 2021; v1 submitted 15 July, 2020;
originally announced July 2020.
-
Safety in $s$-$t$ Paths, Trails and Walks
Authors:
Massimo Cairo,
Shahbaz Khan,
Romeo Rizzi,
Sebastian Schmidt,
Alexandru I. Tomescu
Abstract:
Given a directed graph $G$ and a pair of nodes $s$ and $t$, an \emph{$s$-$t$ bridge} of $G$ is an edge whose removal breaks all $s$-$t$ paths of $G$ (and thus appears in all $s$-$t$ paths). Computing all $s$-$t$ bridges of $G$ is a basic graph problem, solvable in linear time.
In this paper, we consider a natural generalisation of this problem, with the notion of "safety" from bioinformatics. We say that a walk $W$ is \emph{safe} with respect to a set $\mathcal{W}$ of $s$-$t$ walks if $W$ is a subwalk of all walks in $\mathcal{W}$. We start by considering the maximal safe walks when $\mathcal{W}$ consists of all $s$-$t$ paths, all $s$-$t$ trails, or all $s$-$t$ walks of $G$. We show that the first two problems are immediate linear-time generalisations of finding all $s$-$t$ bridges, while the third problem is more involved. In particular, we show that there exists a compact representation, computable in linear time, that allows outputting all maximal safe walks in time linear in their length.
We further generalise these problems, by assuming that safety is defined only with respect to a subset of \emph{visible} edges. Here we prove a dichotomy between the $s$-$t$ paths and $s$-$t$ trails cases, and the $s$-$t$ walks case: the former two are NP-hard, while the latter is solvable with the same complexity as when all edges are visible. We also show that the same complexity results hold for the analogous generalisations of \emph{$s$-$t$ articulation points} (nodes appearing in all $s$-$t$ paths).
We thus obtain the best possible results for natural "safety"-generalisations of these two fundamental graph problems. Moreover, our algorithms are simple and do not employ any complex data structures, making them ideal for use in practice.
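The $s$-$t$ articulation points mentioned above can be found naively by deleting each node in turn and re-checking reachability, in $O(|V|(|V|+|E|))$ time, far slower than linear-time approaches. A Python sketch of this definition-level check, assuming the graph is a dict from node to successor list and using the hypothetical name `st_articulation_points`:

```python
def st_articulation_points(adj, s, t):
    """Nodes other than s and t whose removal leaves t unreachable from s.
    adj: dict node -> list of successors. Returns None if t is unreachable."""

    def reaches(banned):
        # Iterative DFS from s that never enters the banned node.
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            if u == t:
                return True
            for w in adj.get(u, []):
                if w != banned and w not in seen:
                    seen.add(w)
                    stack.append(w)
        return False

    if not reaches(None):
        return None
    nodes = set(adj) | {w for ws in adj.values() for w in ws}
    return sorted(v for v in nodes if v not in (s, t) and not reaches(v))
```

For instance, with arcs s→a, s→b, a→c, b→c, c→t, only c lies on every $s$-$t$ path, so it is the only $s$-$t$ articulation point.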
Submitted 17 July, 2020; v1 submitted 9 July, 2020;
originally announced July 2020.
-
Computing all $s$-$t$ bridges and articulation points simplified
Authors:
Massimo Cairo,
Shahbaz Khan,
Romeo Rizzi,
Sebastian Schmidt,
Alexandru I. Tomescu,
Elia Zirondelli
Abstract:
Given a directed graph $G$ and a pair of nodes $s$ and $t$, an $s$-$t$ bridge of $G$ is an edge whose removal breaks all $s$-$t$ paths of $G$. Similarly, an $s$-$t$ articulation point of $G$ is a node whose removal breaks all $s$-$t$ paths of $G$. Computing the sequence of all $s$-$t$ bridges of $G$ (as well as the $s$-$t$ articulation points) is a basic graph problem, solvable in linear time using the classical min-cut algorithm.
When dealing with cuts of unit size ($s$-$t$ bridges), this algorithm can be simplified to a single graph traversal from $s$ to $t$ avoiding an arbitrary $s$-$t$ path, which is interrupted at the $s$-$t$ bridges. Further, the corresponding proof is also simplified, making it independent of the theory of network flows.
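The single-traversal idea can be sketched as follows, under assumptions: a simple digraph stored as a dict from node to successor list, BFS to pick the arbitrary $s$-$t$ path, and hypothetical names `find_path` and `st_bridges`. The traversal never uses a forward edge of the chosen path but may step backwards along it; whenever it gets stuck before reaching $t$, the path edge leaving the furthest reached path node is reported as a bridge and the traversal resumes from that edge's head. This is our reading of the abstract, not the authors' code.

```python
from collections import deque


def find_path(adj, s, t):
    """Any s-t path (as a node list) found by BFS, or None if t is unreachable."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, []):
            if w not in parent:
                parent[w] = u
                queue.append(w)
    if t not in parent:
        return None
    path, v = [], t
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]


def st_bridges(adj, s, t):
    """All s-t bridges of a simple digraph (adj: node -> list of successors),
    in the order they appear on every s-t path; None if t is unreachable."""
    path = find_path(adj, s, t)
    if path is None:
        return None
    pos = {v: i for i, v in enumerate(path)}
    visited, stack = {s}, [s]
    best = 0  # largest index of a path node reached so far
    bridges = []
    while True:
        # Traverse without using forward edges of `path`; stepping back
        # along `path` is allowed (these are the reversed, residual edges).
        while stack:
            u = stack.pop()
            i = pos.get(u)
            if i is not None:
                best = max(best, i)
                if i > 0 and path[i - 1] not in visited:
                    visited.add(path[i - 1])
                    stack.append(path[i - 1])
            for w in adj.get(u, []):
                if i is not None and i + 1 < len(path) and w == path[i + 1]:
                    continue  # the path edge (u, w) is not traversed forward
                if w not in visited:
                    visited.add(w)
                    stack.append(w)
        if best == len(path) - 1:
            return bridges
        # Stuck: the only edge leaving the reached region is the path edge
        # out of path[best], so it lies on every s-t path.
        bridges.append((path[best], path[best + 1]))
        visited.add(path[best + 1])
        stack.append(path[best + 1])
```

Every node is pushed at most once across all resumptions, so the whole procedure stays a single linear-time traversal.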
Submitted 26 June, 2020;
originally announced June 2020.
-
Linear Time Construction of Indexable Founder Block Graphs
Authors:
Veli Mäkinen,
Bastien Cazaux,
Massimo Equi,
Tuukka Norri,
Alexandru I. Tomescu
Abstract:
We introduce a compact pangenome representation based on an optimal segmentation concept that aims to reconstruct founder sequences from a multiple sequence alignment (MSA). Such founder sequences have the feature that each row of the MSA is a recombination of the founders. Several linear-time dynamic programming algorithms have been previously devised to optimize segmentations that induce founder blocks that then can be concatenated into a set of founder sequences. All possible concatenation orders can be expressed as a founder block graph. We observe a key property of such graphs: if the node labels (founder segments) do not repeat in the paths of the graph, such graphs can be indexed for efficient string matching. We call such graphs segment repeat-free founder block graphs.
We give a linear-time algorithm to construct a segment repeat-free founder block graph given an MSA. The algorithm combines techniques from the founder segmentation algorithms (Cazaux et al. SPIRE 2019) and fully-functional bidirectional Burrows-Wheeler index (Belazzougui and Cunial, CPM 2019). We derive a succinct index structure to support queries of arbitrary length in the paths of the graph.
Experiments on an MSA of SARS-CoV-2 strains are reported. An MSA of size $410\times 29811$ is compacted in one minute into a segment repeat-free founder block graph of 3900 nodes and 4440 edges. The maximum length and total length of node labels are 12 and 34968, respectively. The index on the graph takes only $3\%$ of the size of the MSA.
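Given a segmentation, building the founder block graph itself is straightforward. The Python sketch below (hypothetical name `founder_block_graph`, gapless MSA as a list of equal-length strings, segmentation given as exclusive end positions) creates one node per distinct segment string in each block and links consecutive blocks whenever some row uses both segments; it does not compute an optimal segmentation or check the segment repeat-free property.

```python
def founder_block_graph(msa, boundaries):
    """Founder block graph of a gapless MSA under a given segmentation.

    msa: list of equal-length rows; boundaries: increasing exclusive end
    positions of the segments, the last one equal to the row length.
    Nodes are (block index, distinct segment string); an edge joins two
    consecutive blocks whenever some row uses both segments.
    """
    blocks, start = [], 0
    for end in boundaries:
        blocks.append((start, end))
        start = end
    nodes, edges = set(), set()
    for row in msa:
        prev = None
        for b, (i, j) in enumerate(blocks):
            node = (b, row[i:j])
            nodes.add(node)
            if prev is not None:
                edges.add((prev, node))
            prev = node
    return nodes, edges
```

Every MSA row spells a path through this graph, and with three or more blocks other paths spell recombinations of the rows, which is what the founder sequences are meant to capture.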
Submitted 19 May, 2020;
originally announced May 2020.
-
Genome assembly, from practice to theory: safe, complete and linear-time
Authors:
Massimo Cairo,
Romeo Rizzi,
Alexandru I. Tomescu,
Elia C. Zirondelli
Abstract:
Genome assembly asks to reconstruct an unknown string from many shorter substrings of it. Even though it is one of the key problems in Bioinformatics, it has generally lacked major theoretical advances. Its hardness stems both from practical issues (size and errors of real data), and from the fact that problem formulations inherently admit multiple solutions. Given these, at their core, most state-of-the-art assemblers are based on finding non-branching paths (unitigs) in an assembly graph. If one defines a genome assembly solution as a closed arc-covering walk of the graph, then unitigs appear in all solutions, and are thus safe partial solutions. All such safe walks were recently characterized as omnitigs, leading to the first safe and complete genome assembly algorithm. Even though omnitig finding was improved to quadratic time, it remained open whether the crucial linear-time feature of finding unitigs can be attained with omnitigs.
We describe a surprising $O(m)$-time algorithm to identify all maximal omnitigs of a graph with $n$ nodes and $m$ arcs, notwithstanding the existence of families of graphs with $\Theta(mn)$ total maximal omnitig size. This is based on the discovery of a family of walks (macrotigs) with the property that all the non-trivial omnitigs are univocal extensions of subwalks of a macrotig, with two consequences: (1) A linear-time output-sensitive algorithm enumerating all maximal omnitigs. (2) A compact $O(m)$ representation of all maximal omnitigs, which allows, e.g., for $O(m)$-time computation of various statistics on them.
Our results close a long-standing theoretical question inspired by practical genome assemblers, originating with the use of unitigs in 1995. We envision our results to be at the core of a reverse transfer from theory to practical and complete genome assembly programs, as has been the case for other key Bioinformatics problems.
Submitted 8 November, 2020; v1 submitted 24 February, 2020;
originally announced February 2020.
-
Graphs cannot be indexed in polynomial time for sub-quadratic time string matching, unless SETH fails
Authors:
Massimo Equi,
Veli Mäkinen,
Alexandru I. Tomescu
Abstract:
We consider the following string matching problem on a node-labeled graph $G=(V,E)$: given a pattern string $P$, decide whether there exists a path in $G$ whose concatenation of node labels equals $P$. This is a basic primitive in various problems in bioinformatics, graph databases, or networks. The hardness results of Backurs and Indyk (FOCS 2016) imply that this problem cannot be solved in better than $O(|E||P|)$ time, under the Orthogonal Vectors Hypothesis (OVH), and this holds even under various restrictions on the graph (Equi et al., ICALP 2019).
In this paper we consider its offline version, namely the one in which we are allowed to index the graph in order to support time-efficient string matching queries. Indeed, it was tantalizing in the string matching community to believe that sub-quadratic time queries can be achieved, e.g. at the cost of a high-degree polynomial-time indexing.
We disprove this belief, showing that, under OVH, no polynomial-time index can support querying $P$ in time $O(|E|^\delta|P|^\beta)$, with either $\delta < 1$ or $\beta < 1$. We prove this tight bound employing a known self-reducibility technique, e.g. from the field of dynamic algorithms, which translates conditional lower bounds for an online problem to its offline version.
As a side-contribution, we formalize this technique with the notion of linear independent-components reduction, allowing for a simple proof of our result. As another illustration of our technique, we also translate the quadratic conditional lower bound of Backurs and Indyk (STOC 2015) for the problem of matching a query string inside a text, under edit distance. We obtain an analogous tight quadratic lower bound for its offline version, improving the recent result of Cohen-Addad, Feuilloley and Starikovskaya (SODA 2019), but with a slightly different boundary condition.
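The online $O(|E||P|)$ bound is matched by a simple dynamic program that tracks, for each prefix of $P$, the set of nodes where a match of that prefix can end. A Python sketch, assuming single-character node labels and the hypothetical name `occurs_in_graph`; it treats a match as a walk, so on cyclic graphs nodes may repeat, while on DAGs walks and paths coincide.

```python
from collections import defaultdict


def occurs_in_graph(labels, edges, pattern):
    """Decide whether some walk of a node-labeled graph spells `pattern`.

    labels: dict node -> single character; edges: iterable of (u, v).
    Each step scans only out-edges of currently active nodes, so the total
    time is O(|V| + |E| * |P|).
    """
    if not pattern:
        return True
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    # Nodes where a match of pattern[:1] can end.
    active = {v for v, c in labels.items() if c == pattern[0]}
    for ch in pattern[1:]:
        # Extend every partial match by one edge whose head is labeled ch.
        active = {v for u in active for v in succ[u] if labels[v] == ch}
        if not active:
            return False
    return bool(active)
```

The lower bound discussed above says that, under OVH, no polynomial-time preprocessing of the graph can make each query substantially cheaper than this scan.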
Submitted 4 March, 2020; v1 submitted 3 February, 2020;
originally announced February 2020.
-
On the Complexity of Exact Pattern Matching in Graphs: Determinism and Zig-Zag Matching
Authors:
Massimo Equi,
Roberto Grossi,
Alexandru I. Tomescu,
Veli Mäkinen
Abstract:
Exact pattern matching in labeled graphs is the problem of searching paths of a graph $G=(V,E)$ that spell the same string as the given pattern $P[1..m]$. This basic problem can be found at the heart of more complex operations on variation graphs in computational biology, query operations in graph databases, and analysis of heterogeneous networks, where the nodes of some paths must match a sequence of labels or types. In our recent work we described a conditional lower bound stating that the exact pattern matching problem in labeled graphs cannot be solved in less than quadratic time, namely, $O(|E|^{1 - \varepsilon} \, m)$ time or $O(|E| \, m^{1 - \varepsilon})$ time for any constant $\varepsilon>0$, unless the Strong Exponential Time Hypothesis (SETH) is false. The result holds even if node labels and pattern $P$ are drawn from a binary alphabet, and $G$ is restricted to undirected graphs of maximum degree three or directed acyclic graphs of maximum sum of indegree and outdegree three. It was left open what happens on undirected graphs of maximum degree two, i.e., when the pattern can have a zig-zag match in a (cyclic) bidirectional string. Also, the reduction created a non-deterministic directed acyclic graph, and it was left open whether determinism would make the problem easier. In this work, we show through the Orthogonal Vectors hypothesis (OV) that the same conditional lower bound holds even for these restricted cases.
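To make zig-zag matching concrete: in an undirected graph of maximum degree two, the text positions form a path (or a cycle), and consecutive pattern characters must occupy adjacent positions, with the match free to reverse direction at any step. A Python sketch of the resulting $O(n \cdot m)$ check for a text of length $n$, with the hypothetical name `zigzag_match` and our own formalization of the problem (a match may revisit positions):

```python
def zigzag_match(text, pattern, cyclic=False):
    """Zig-zag matching on a bidirectional string.

    Positions of `text` are the nodes of an undirected path (a cycle if
    cyclic=True); consecutive pattern characters must sit on adjacent
    positions, and the match may reverse direction at any step.
    Runs in O(len(text) * len(pattern)) time.
    """
    if not pattern:
        return True
    n = len(text)
    active = {i for i in range(n) if text[i] == pattern[0]}
    for ch in pattern[1:]:
        nxt = set()
        for i in active:
            for j in (i - 1, i + 1):  # step left or right
                if cyclic:
                    j %= n  # wrap around on a cyclic string
                if 0 <= j < n and text[j] == ch:
                    nxt.add(j)
        active = nxt
        if not active:
            return False
    return True
```

For example, "abab" zig-zag matches "abc" by bouncing between the first two positions, although it does not occur as an ordinary substring.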
△ Less
Submitted 10 February, 2019;
originally announced February 2019.
-
sAVSS: Scalable Asynchronous Verifiable Secret Sharing in BFT Protocols
Authors:
Soumya Basu,
Alin Tomescu,
Ittai Abraham,
Dahlia Malkhi,
Michael K. Reiter,
Emin Gün Sirer
Abstract:
This paper introduces a new way to incorporate verifiable secret sharing (VSS) schemes into Byzantine Fault Tolerance (BFT) protocols. This technique extends the threshold guarantee of classical Byzantine Fault Tolerant algorithms to include privacy as well. This provides applications with a powerful primitive: a threshold trusted third party, which simplifies many difficult problems such as fair exchange. In order to incorporate VSS into BFT, we introduce sAVSS, a framework that transforms any VSS scheme into an asynchronous VSS scheme with constant overhead. By incorporating Kate et al.'s scheme into our framework, we obtain an asynchronous VSS that has constant overhead on each replica, the first of its kind. We show that a key-value store built using BFT replication and sAVSS supports writing secret-shared values with a throughput overhead of about 30-50% and request latencies under 35 milliseconds.
Submitted 21 December, 2018; v1 submitted 10 July, 2018;
originally announced July 2018.
-
SBFT: a Scalable and Decentralized Trust Infrastructure
Authors:
Guy Golan Gueta,
Ittai Abraham,
Shelly Grossman,
Dahlia Malkhi,
Benny Pinkas,
Michael K. Reiter,
Dragos-Adrian Seredinschi,
Orr Tamir,
Alin Tomescu
Abstract:
SBFT is a state-of-the-art Byzantine fault tolerant permissioned blockchain system that addresses the challenges of scalability, decentralization and world-scale geo-replication. SBFT is optimized for decentralization and can easily handle more than 200 active replicas in a real world-scale deployment. We evaluate SBFT in a world-scale geo-replicated deployment with 209 replicas withstanding f=64 Byzantine failures. We provide experiments that show how the different algorithmic ingredients of SBFT increase its performance and scalability. The results show that SBFT simultaneously provides almost 2x better throughput and about 1.5x better latency relative to a highly optimized system that implements the PBFT protocol. To achieve this performance improvement, SBFT uses a combination of four ingredients: using collectors and threshold signatures to reduce communication to linear, using an optimistic fast path, reducing client communication and utilizing redundant servers for the fast path.
Submitted 2 January, 2019; v1 submitted 4 April, 2018;
originally announced April 2018.
-
Using Minimum Path Cover to Boost Dynamic Programming on DAGs: Co-Linear Chaining Extended
Authors:
Anna Kuosmanen,
Topi Paavilainen,
Travis Gagie,
Rayan Chikhi,
Alexandru I. Tomescu,
Veli Mäkinen
Abstract:
Aligning sequencing reads on graph representations of genomes is an important ingredient of pan-genomics. Such approaches typically find a set of local anchors that indicate plausible matches between substrings of a read and subpaths of the graph. These anchor matches are then combined to form a (semi-local) alignment of the complete read on a subpath. Co-linear chaining is an algorithmically rigorous approach to combine the anchors. It is a well-known approach for the case of two sequences as inputs. Here we extend the approach so that one of the inputs can be a directed acyclic graph (DAG), e.g. a splicing graph in transcriptomics or a variant graph in pan-genomics.
This extension to DAGs turns out to have a tight connection to the minimum path cover problem, asking for a minimum-cardinality set of paths that cover all the nodes of a DAG. We study the case when the size $k$ of a minimum path cover is small, which is often the case in practice. First, we propose an algorithm for finding a minimum path cover of a DAG $(V,E)$ in $O(k|E|\log|V|)$ time, improving all known time-bounds when $k$ is small and the DAG is not too dense. Second, we introduce a general technique for extending dynamic programming (DP) algorithms from sequences to DAGs. This is enabled by our minimum path cover algorithm, and works by mimicking the DP algorithm for sequences on each path of the minimum path cover. This technique generally produces algorithms that are slower than their counterparts on sequences only by a factor $k$. Our technique can be applied, for example, to the classical longest increasing subsequence and longest common subsequence problems, extended to labeled DAGs. Finally, we apply this technique to the co-linear chaining problem. We also implemented the new co-linear chaining approach. Experiments on splicing graphs show that the new method is efficient also in practice.
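For the longest increasing subsequence extended to labeled DAGs, a simple quadratic baseline helps pin down the problem statement: find the largest number of nodes, with strictly increasing labels, that appear in order along a single path. The Python sketch below (hypothetical name `lis_on_dag`, labels as a dict, edges as pairs) uses ancestor sets in topological order; it is not the path-cover technique of the paper, which instead mimics the sequence DP on each of the $k$ cover paths.

```python
from collections import defaultdict, deque


def lis_on_dag(labels, edges):
    """Longest strictly increasing subsequence over all paths of a
    node-labeled DAG. Quadratic baseline using ancestor sets."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in labels}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # Kahn's algorithm for a topological order.
    order = []
    queue = deque(v for v in labels if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    # ancestors[v] = nodes that can reach v; u and v then lie on a common path.
    ancestors = {v: set() for v in labels}
    for u in order:
        for v in succ[u]:
            ancestors[v] |= ancestors[u] | {u}

    # best[v] = longest increasing subsequence of a path, ending at v.
    best = {}
    for v in order:
        best[v] = 1 + max((best[u] for u in ancestors[v]
                           if labels[u] < labels[v]), default=0)
    return max(best.values(), default=0)
```

On the DAG a→c, b→c, c→d with labels 3, 1, 2, 4, the path b, c, d spells 1, 2, 4, so the answer is 3.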
Submitted 29 January, 2018; v1 submitted 24 May, 2017;
originally announced May 2017.
-
Perfect phylogenies via branchings in acyclic digraphs and a generalization of Dilworth's theorem
Authors:
Ademir Hujdurović,
Edin Husić,
Martin Milanič,
Romeo Rizzi,
Alexandru I. Tomescu
Abstract:
Motivated by applications in cancer genomics and following the work of Hajirasouliha and Raphael (WABI 2014), Hujdurović et al. (IEEE TCBB, to appear) introduced the minimum conflict-free row split (MCRS) problem: split each row of a given binary matrix into a bitwise OR of a set of rows so that the resulting matrix corresponds to a perfect phylogeny and has the minimum possible number of rows among all matrices with this property. Hajirasouliha and Raphael also proposed the study of a similar problem, in which the task is to minimize the number of distinct rows of the resulting matrix. Hujdurović et al. proved that both problems are NP-hard, gave a related characterization of transitively orientable graphs, and proposed a polynomial-time heuristic algorithm for the MCRS problem based on coloring cocomparability graphs.
We give new, more transparent formulations of the two problems, showing that the problems are equivalent to two optimization problems on branchings in a derived directed acyclic graph. Building on these formulations, we obtain new results on the two problems, including: (i) a strengthening of the heuristic by Hujdurović et al. via a new min-max result in digraphs generalizing Dilworth's theorem, which may be of independent interest, (ii) APX-hardness results for both problems, (iii) approximation algorithms, and (iv) exponential-time algorithms solving the two problems to optimality faster than the naïve brute-force approach. Our work relates to several well studied notions in combinatorial optimization: chain partitions in partially ordered sets, laminar hypergraphs, and (classical and weighted) colorings of graphs.
Submitted 27 January, 2018; v1 submitted 19 January, 2017;
originally announced January 2017.
-
Hardness of Covering Alignment: Phase Transition in Post-Sequence Genomics
Authors:
Romeo Rizzi,
Massimo Cairo,
Veli Mäkinen,
Alexandru I. Tomescu,
Daniel Valenzuela
Abstract:
Covering alignment problems arise from recent developments in genomics: so-called pan-genome graphs are replacing reference genomes, and advances in haplotyping enable the full content of diploid genomes to be used as the basis of sequence analysis. In this paper, we show that the computational complexity changes for natural extensions of alignments to pan-genome representations and to diploid genomes. More broadly, our approach can also be seen as a minimal extension of sequence alignment to labeled directed acyclic graphs (labeled DAGs). Namely, we show that finding a \emph{covering alignment} of two labeled DAGs is NP-hard even on binary alphabets. A covering alignment asks for two paths $R_1$ (red) and $G_1$ (green) in DAG $D_1$ and two paths $R_2$ (red) and $G_2$ (green) in DAG $D_2$ that cover the nodes of the graphs and maximize the sum of the global alignment scores $\mathsf{as}(\mathsf{sp}(R_1),\mathsf{sp}(R_2))+\mathsf{as}(\mathsf{sp}(G_1),\mathsf{sp}(G_2))$, where $\mathsf{sp}(P)$ is the concatenation of labels on the path $P$. Pairwise alignment of haplotype sequences forming a diploid chromosome can be converted to a two-path coverable labeled DAG, and the covering alignment then models the similarity of two diploids over arbitrary recombinations. We also give a reduction in the other direction, showing that such a recombination-oblivious diploid alignment is NP-hard on alphabets of size $3$.
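The objective above can be evaluated directly for one *fixed* choice of red and green paths. Choosing the best paths is the NP-hard part, and this sketch does not attempt it. It is a minimal Python sketch under stated assumptions:

- A DAG is given as a pair `(labels, arcs)`.
- `sp` is computed by concatenating node labels.
- `as` is a Needleman-Wunsch global alignment score with a hypothetical scoring scheme: match +1, mismatch -1, gap -1. The abstract does not fix a scoring scheme.

All function names are illustrative.

```python
def spell(path, labels):
    """sp(P): concatenation of node labels along path P."""
    return "".join(labels[v] for v in path)

def global_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score with a linear gap penalty (assumed scoring)."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap] + [0] * len(t)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]

def is_path(path, arcs):
    """Consecutive vertices of `path` must be joined by arcs of the DAG."""
    return all((u, v) in arcs for u, v in zip(path, path[1:]))

def covering_alignment_score(dag1, dag2, red1, green1, red2, green2):
    """Score of one candidate covering alignment; each dag is (labels, arcs)."""
    for (labels, arcs), red, green in ((dag1, red1, green1), (dag2, red2, green2)):
        if not (is_path(red, arcs) and is_path(green, arcs)):
            raise ValueError("red and green must be paths of the DAG")
        if set(red) | set(green) != set(labels):
            raise ValueError("red and green paths must cover every node")
    l1, l2 = dag1[0], dag2[0]
    return (global_alignment_score(spell(red1, l1), spell(red2, l2))
            + global_alignment_score(spell(green1, l1), spell(green2, l2)))
```

Pairing red with red and green with green matters. Swapping the colours in one DAG changes which strings are aligned and therefore the total score.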
Submitted 22 May, 2018; v1 submitted 15 November, 2016;
originally announced November 2016.
-
Safe and complete contig assembly via omnitigs
Authors:
Alexandru I. Tomescu,
Paul Medvedev
Abstract:
Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs -- a set of strings that are promised to appear in any genome that could have generated the reads. Since contigs were introduced 20 years ago, assemblers have tried to obtain ever longer contigs, but the following question was never solved: given a genome graph $G$ (e.g., a de Bruijn graph or a string graph), what are all the strings that can be safely reported from $G$ as contigs? In this paper we finally answer this question, and also give a polynomial-time algorithm to find them. Our experiments show that these strings, which we call omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of dbSNP locations have more neighbors in omnitigs than in unitigs.
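Omnitigs are compared against unitigs, so it may help to see how the baseline is computed. The sketch below uses one common formulation of unitigs: maximal non-branching paths in a directed graph. Internal vertices have exactly one in-arc and one out-arc, and isolated cycles of such vertices are reported separately.

This is a minimal Python sketch, not the omnitig algorithm of the paper. The adjacency-dict input and the name `maximal_non_branching_paths` are assumptions.

```python
def maximal_non_branching_paths(graph):
    """Unitig-style paths; graph maps each vertex to a list of successors."""
    nodes = set(graph) | {w for succ in graph.values() for w in succ}
    out = {v: list(graph.get(v, [])) for v in nodes}
    indeg = {v: 0 for v in nodes}
    for v in nodes:
        for w in out[v]:
            indeg[w] += 1

    def one_in_one_out(v):
        return indeg[v] == 1 and len(out[v]) == 1

    paths, covered = [], set()
    # Start a path at every arc leaving a branching (or source/sink) vertex.
    for v in sorted(nodes):
        if not one_in_one_out(v):
            for w in out[v]:
                path = [v, w]
                while one_in_one_out(w):
                    w = out[w][0]
                    path.append(w)
                paths.append(path)
                covered.update(path)
    # Remaining 1-in-1-out vertices lie on isolated cycles.
    for v in sorted(nodes):
        if one_in_one_out(v) and v not in covered:
            cycle, w = [v], out[v][0]
            while w != v:
                cycle.append(w)
                w = out[w][0]
            cycle.append(v)
            paths.append(cycle)
            covered.update(cycle)
    return paths
```

For instance, in `a -> b -> c` with `c` branching to `d` and `e`, the unitigs stop at the branching vertex `c`. Safe strings crossing such branch points are what omnitigs can additionally capture.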
Submitted 16 August, 2016; v1 submitted 12 January, 2016;
originally announced January 2016.
-
Interval scheduling maximizing minimum coverage
Authors:
Veli Mäkinen,
Valeria Staneva,
Alexandru Tomescu,
Daniel Valenzuela
Abstract:
In classical interval scheduling problems, a set of $n$ jobs, each characterized by its start and end time, must be executed by a set of machines under various constraints. In this paper we study a new variant in which the jobs need to be assigned to at most $k$ identical machines such that the minimum number of machines that are busy at the same time is maximized. This is relevant in the context of genome sequencing and haplotyping, specifically when a set of DNA reads aligned to a genome needs to be pruned so that no more than $k$ reads overlap, while maintaining as much read coverage as possible across the entire genome. We show that the problem can be solved in time $\min\left(O(n^2\log k / \log n),O(nk\log k)\right)$ by using max-flows. We also give an $O(n\log n)$-time approximation algorithm with approximation ratio $\rho=\frac{k}{\lfloor k/2 \rfloor}$.
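The objective can be evaluated for any candidate selection of jobs with a sweep over the interval endpoints. Because interval graphs are perfect, a selection whose maximum overlap is at most $k$ can always be scheduled on $k$ machines. So it suffices to check the maximum coverage and to report the minimum coverage.

This is a minimal Python sketch. It assumes half-open intervals $[s, e)$ and measures coverage over the span from the earliest start to the latest end; both are our assumptions. It is not the max-flow algorithm of the paper. The helper `approximation_ratio` simply evaluates $\rho$ from the abstract, which is defined only for $k \ge 2$.

```python
from collections import defaultdict

def coverage_extremes(intervals):
    """(min, max) number of half-open intervals [s, e) covering a point,
    taken over the span from the earliest start to the latest end."""
    delta = defaultdict(int)
    for s, e in intervals:
        delta[s] += 1
        delta[e] -= 1
    points = sorted(delta)
    cur, lo, hi = 0, None, 0
    for a, b in zip(points, points[1:]):
        cur += delta[a]  # coverage on the elementary segment [a, b)
        lo = cur if lo is None else min(lo, cur)
        hi = max(hi, cur)
    return lo, hi

def approximation_ratio(k):
    """rho = k / floor(k/2); undefined for k < 2."""
    if k < 2:
        raise ValueError("k must be at least 2")
    return k / (k // 2)
```

A selection is feasible when the second value returned by `coverage_extremes` is at most $k$. Among feasible selections, the problem asks to maximize the first value.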
Submitted 30 October, 2015; v1 submitted 31 August, 2015;
originally announced August 2015.
-
Complexity and algorithms for finding a perfect phylogeny from mixed tumor samples
Authors:
Ademir Hujdurović,
Urša Kačar,
Martin Milanič,
Bernard Ries,
Alexandru I. Tomescu
Abstract:
Recently, Hajirasouliha and Raphael (WABI 2014) proposed a model for deconvoluting mixed tumor samples measured from a collection of high-throughput sequencing reads. This is related to understanding tumor evolution and critical cancer mutations. In short, their formulation asks to split each row of a binary matrix so that the resulting matrix corresponds to a perfect phylogeny and has the minimum number of rows among all matrices with this property. In this paper we disprove several claims about this problem, including a claimed proof of its NP-hardness. However, we show that the problem is indeed NP-hard by providing a different proof. We also prove NP-completeness of a variant of this problem proposed in the same paper. On the positive side, we propose an efficient (though not necessarily optimal) heuristic algorithm based on coloring co-comparability graphs, and a polynomial-time algorithm for solving the problem optimally on matrix instances in which no column is contained in both columns of a pair of conflicting columns. Implementations of these algorithms are freely available at https://github.com/alexandrutomescu/MixedPerfectPhylogeny
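The notion of conflicting columns used above can be made concrete as follows. We take the standard definition: two columns conflict when some rows show the patterns (1,1), (1,0) and (0,1) on them. A matrix with no conflicting pair admits a perfect phylogeny.

This minimal Python sketch lists all conflicting pairs, which is the input a conflict-based heuristic would start from. It is not the authors' implementation (see the repository linked above), and the function name is hypothetical.

```python
from itertools import combinations

def conflicting_column_pairs(matrix):
    """Pairs (i, j) of columns exhibiting all of (1,1), (1,0) and (0,1)."""
    n_cols = len(matrix[0]) if matrix else 0
    pairs = []
    for i, j in combinations(range(n_cols), 2):
        seen = {(row[i], row[j]) for row in matrix}
        if {(1, 1), (1, 0), (0, 1)} <= seen:
            pairs.append((i, j))
    return pairs
```

An empty result means the matrix already corresponds to a perfect phylogeny, so no row needs to be split.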
Submitted 7 July, 2016; v1 submitted 25 June, 2015;
originally announced June 2015.
-
Enumeration of the adjunctive hierarchy of hereditarily finite sets
Authors:
Giorgio Audrito,
Alexandru I. Tomescu,
Stephan Wagner
Abstract:
Hereditarily finite sets (sets which are finite and have only hereditarily finite sets as members) are basic mathematical and computational objects, and also underlie some programming languages. This raises the need for efficient representations of such sets, for example by numbers. In 2008, Kirby proposed an adjunctive hierarchy of hereditarily finite sets, based on the fact that they can also be seen as built up from the empty set by repeated adjunction, that is, by adding to an already existing set a new single element drawn from the already existing sets. Determining the cardinality $a_n$ of each level of this hierarchy, a problem crucial to establishing whether the natural adjunctive hierarchy leads to an efficient encoding by numbers, was left open.
In this paper we solve this problem. Our results can be generalized to hereditarily finite sets with atoms, or further refined by imposing restrictions on rank, on cardinality, or on the maximum level from which the new adjoined element can be drawn. We also show that $a_n$ satisfies the asymptotic formula $a_n = C^{2^n} + O(C^{2^{n-1}})$, for a constant $C \approx 1.3399$, a growth rate too fast for practical purposes. We thus propose a very natural variant of the adjunctive hierarchy, whose asymptotic behavior we prove to be $Θ(2^n)$. To our knowledge, this is the first result of this kind.
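Adjunction can be simulated directly with Python `frozenset`s for small levels. The sketch assumes that level $0$ is $\{\emptyset\}$ and that level $n+1$ adds every $x \cup \{y\}$ with $x, y$ taken from level $n$. This is our reading of "adding to an existing set a new single element drawn from the existing sets"; the paper's exact indexing of $a_n$ may differ.

This is a minimal brute-force sketch, feasible only for tiny $n$ given the doubly exponential growth stated above. The function name is hypothetical.

```python
def adjunctive_level_sizes(n):
    """Sizes of levels 0..n under the assumed adjunction rule x | {y}."""
    level = {frozenset()}  # level 0: only the empty set
    sizes = [len(level)]
    for _ in range(n):
        # Adjoin any existing set y as a new element of any existing set x.
        level = level | {x | {y} for x in level for y in level}
        sizes.append(len(level))
    return sizes
```

Under this assumption the first sizes are 1, 2, 4 and 12. Going one or two levels further quickly illustrates why the growth rate is impractical for encodings.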
Submitted 9 April, 2014; v1 submitted 10 September, 2013;
originally announced September 2013.
-
A Novel Combinatorial Method for Estimating Transcript Expression with RNA-Seq: Bounding the Number of Paths
Authors:
Alexandru I. Tomescu,
Anna Kuosmanen,
Romeo Rizzi,
Veli Mäkinen
Abstract:
RNA-Seq technology offers new high-throughput ways for transcript identification and quantification based on short reads, and has recently attracted great interest. The problem is usually modeled by a weighted splicing graph whose nodes stand for exons and whose edges stand for split alignments to the exons. The task consists of finding a number of paths, together with their expression levels, which optimally explain the coverages of the graph under various fitness functions, such as the least sum of squares. In (Tomescu et al. RECOMB-seq 2013) we showed that under general fitness functions, if we allow a polynomially bounded number of paths in an optimal solution, this problem can be solved in polynomial time by a reduction to a min-cost flow program. In this paper we further refine this problem by asking for a bounded number k of paths that optimally explain the splicing graph. This problem becomes NP-hard in the strong sense, but we give a fast combinatorial algorithm based on dynamic programming for it. In order to obtain a practical tool, we implement three optimizations and heuristics, which achieve better performance on real data, and similar or better performance on simulated data, than the state-of-the-art tools Cufflinks, IsoLasso and SLIDE. Our tool, called Traph, is available at http://www.cs.helsinki.fi/gsa/traph/
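The least-sum-of-squares fitness mentioned above can be evaluated for any candidate set of paths with expression levels. The explained coverage of a node is the sum of the levels of the paths through it.

This minimal Python sketch scores node coverages only; whether edge coverages are also scored is an assumption left open here. It does not implement the dynamic program for choosing $k$ paths, and the function name is hypothetical.

```python
def least_squares_fitness(coverage, solution):
    """Sum over nodes of (observed - explained)^2.

    coverage: dict node -> observed coverage
    solution: list of (path, expression_level) pairs; a path is a list of nodes
    """
    explained = {v: 0.0 for v in coverage}
    for path, level in solution:
        for v in path:
            explained[v] += level  # each path contributes its level to its nodes
    return sum((coverage[v] - explained[v]) ** 2 for v in coverage)
```

With coverages a=10, b=4, c=6, the two paths a-b (level 4) and a-c (level 6) explain the graph exactly. Giving both paths level 5 instead costs 2.0.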
Submitted 30 July, 2013;
originally announced July 2013.
-
Combinatorial decomposition approaches for efficient counting and random generation FPTASes
Authors:
Romeo Rizzi,
Alexandru I. Tomescu
Abstract:
Given a combinatorial decomposition for a counting problem, we resort to the simple scheme of approximating large numbers by floating-point representations in order to obtain efficient Fully Polynomial Time Approximation Schemes (FPTASes) for it. The number of bits employed for the exponent and the mantissa will depend on the error parameter $0 < \varepsilon \leq 1$ and on the characteristics of the problem. Accordingly, we propose the first FPTASes with $1 \pm \varepsilon$ relative error for counting and generating uniformly at random a labeled DAG with a given number of vertices. This is accomplished starting from a classical recurrence for counting DAGs, whose values we approximate by floating-point numbers.
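For reference, one classical recurrence for counting labeled DAGs is Robinson's inclusion-exclusion formula:

$a_m = \sum_{k=1}^{m} (-1)^{k+1}\binom{m}{k} 2^{k(m-k)} a_{m-k}$, with $a_0 = 1$.

We assume, without confirmation from the abstract, that this is the recurrence meant. The minimal Python sketch below evaluates it exactly with arbitrary-precision integers. This is the kind of computation whose large intermediate values the floating-point scheme is designed to replace; the sketch itself does no floating-point approximation.

```python
from math import comb

def count_labeled_dags(n):
    """Exact number of labeled DAGs on n vertices (Robinson's recurrence)."""
    a = [1]  # a_0 = 1: the empty graph
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]
```

The first values are 1, 1, 3, 25, 543. Note that $a_n \ge 2^{n(n-1)/2}$, since every arc set consistent with one fixed vertex order is acyclic. Exact values therefore need quadratically many bits, which motivates approximating them with short floating-point numbers.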
After extending these results to other families of DAGs, we show how the same approach works also with problems where we are given a compact representation of a combinatorial ensemble and we are asked to count and sample elements from it. We employ here the floating-point approximation method to transform the classic pseudo-polynomial algorithm for counting 0/1 Knapsack solutions into a very simple FPTAS with $1 - \varepsilon$ relative error. Its complexity improves upon the recent result (Štefankovič et al., SIAM J. Comput., 2012), and, when $\varepsilon^{-1} = Ω(n)$, also upon the best-known randomized algorithm (Dyer, STOC, 2003). To show the versatility of this technique, we also apply it to a recent generalization of the problem of counting 0/1 Knapsack solutions in an arc-weighted DAG, obtaining a faster and simpler FPTAS than the existing one.
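The classic pseudo-polynomial algorithm mentioned above counts 0/1 Knapsack solutions with a table indexed by total weight. This minimal Python sketch is that exact baseline, not the FPTAS. It assumes non-negative integer weights, and the function name is hypothetical.

```python
def count_knapsack_solutions(weights, capacity):
    """Number of subsets of `weights` with total weight <= capacity."""
    ways = [0] * (capacity + 1)  # ways[c] = subsets with total weight exactly c
    ways[0] = 1
    for w in weights:
        # Iterate downwards so that each item is used at most once.
        for c in range(capacity, w - 1, -1):
            ways[c] += ways[c - w]
    return sum(ways)
```

For weights 1, 2, 3 and capacity 3, the five solutions are {}, {1}, {2}, {3} and {1, 2}. The running time grows with `capacity` rather than with its bit length, which is what makes the algorithm only pseudo-polynomial.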
Submitted 15 November, 2013; v1 submitted 9 July, 2013;
originally announced July 2013.
-
Motif matching using gapped patterns
Authors:
Emanuele Giaquinta,
Kimmo Fredriksson,
Szymon Grabowski,
Alexandru I. Tomescu,
Esko Ukkonen
Abstract:
We present new algorithms for the problem of multiple string matching of gapped patterns, where a gapped pattern is a sequence of strings such that there is a gap of fixed length between every two consecutive strings. The problem has applications in the discovery of transcription factor binding sites in DNA sequences when using generalized versions of the Position Weight Matrix model to describe transcription factor specificities. In these models a motif can be matched as a set of gapped patterns with unit-length keywords. In this particular case, the existing algorithms for matching a set of gapped patterns are either worst-case efficient but not practical, or vice versa. The novel algorithms that we present are based on dynamic programming and bit-parallelism, and occupy a middle ground among the existing algorithms: their time complexity is close to the best existing bound, yet they are also practical. We also provide experimental results which show that the presented algorithms are fast in practice, and preferable if all the strings in the patterns have unit length.
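With unit-length keywords, a gapped pattern is just a pattern with fixed "don't care" positions, so a standard bit-parallel Shift-And matcher handles it. This is a minimal Python sketch of that textbook technique for a single pattern, not the algorithms of the paper; multiple patterns could simply be matched one after another.

The encoding is an assumption: keywords are single characters and `gaps[i]` is the gap length between keyword `i` and keyword `i + 1`. Function names are hypothetical.

```python
def gapped_to_wildcard(keywords, gaps):
    """Expand [c1, c2, ...] with gaps [g1, ...] into a pattern with None wildcards."""
    pattern = []
    for i, c in enumerate(keywords):
        pattern.append(c)
        if i < len(gaps):
            pattern.extend([None] * gaps[i])
    return pattern

def shift_and_wildcard(text, pattern):
    """Start positions of occurrences of `pattern` (None matches any character)."""
    m = len(pattern)
    wildcard_mask = sum(1 << i for i, c in enumerate(pattern) if c is None)
    masks = {a: wildcard_mask for a in set(text)}
    for i, c in enumerate(pattern):
        if c is not None:
            masks[c] = masks.get(c, wildcard_mask) | (1 << i)
    state, accept, hits = 0, 1 << (m - 1), []
    for j, ch in enumerate(text):
        # Bit i of state: pattern[0..i] matches the text ending at position j.
        state = ((state << 1) | 1) & masks.get(ch, wildcard_mask)
        if state & accept:
            hits.append(j - m + 1)
    return hits
```

For example, the keywords `A` and `C` separated by a gap of 1 occur in `AGCTAAC` at positions 0 and 4.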
Submitted 7 July, 2014; v1 submitted 11 June, 2013;
originally announced June 2013.
-
Indexes for Jumbled Pattern Matching in Strings, Trees and Graphs
Authors:
Ferdinando Cicalese,
Travis Gagie,
Emanuele Giaquinta,
Eduardo Sany Laber,
Zsuzsanna Lipták,
Romeo Rizzi,
Alexandru I. Tomescu
Abstract:
We consider how to index strings, trees and graphs for jumbled pattern matching when we are asked to return a match if one exists. For example, we show how, given a tree containing two colours, we can build a quadratic-space index with which we can find a match in time proportional to the size of the match. We also show how we need only linear space if we are content with approximate matches.
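The simplest instance of jumbled indexing is a binary string. Sliding a window of length $m$ by one position changes its number of 1s by at most one. So every count between the minimum and the maximum over all windows of that length is realized, and storing one (min, max) pair per length suffices to decide existence.

This minimal Python sketch builds that index in quadratic time and linear space. It covers strings only, answers only whether a match exists without returning one, and is not the tree or graph index of the paper. Function names are hypothetical.

```python
def build_binary_jumbled_index(s):
    """For each window length m, the (min, max) number of 1s over all
    length-m substrings of the binary string s. O(n^2) time, O(n) space."""
    n = len(s)
    index = {}
    for m in range(1, n + 1):
        ones = s[:m].count("1")
        lo = hi = ones
        for i in range(m, n):
            ones += (s[i] == "1") - (s[i - m] == "1")  # slide the window by one
            lo, hi = min(lo, ones), max(hi, ones)
        index[m] = (lo, hi)
    return index

def has_jumbled_match(index, zeros, ones):
    """Is there a substring with exactly `zeros` 0s and `ones` 1s?"""
    m = zeros + ones
    if m not in index:
        return False
    lo, hi = index[m]
    return lo <= ones <= hi
```

For `s = "0110"`, length-2 windows contain one or two 1s. The query (two 0s, zero 1s) is therefore rejected in constant time without scanning the string.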
Submitted 19 April, 2013;
originally announced April 2013.
-
Graph Operations on Parity Games and Polynomial-Time Algorithms
Authors:
Christoph Dittmann,
Stephan Kreutzer,
Alexandru I. Tomescu
Abstract:
Parity games are games that are played on directed graphs whose vertices are labeled by natural numbers, called priorities. The players push a token along the edges of the digraph. The winner is determined by the parity of the greatest priority occurring infinitely often in this infinite play.
A motivation for studying parity games comes from the area of formal verification of systems by model checking. Deciding the winner in a parity game is polynomial-time equivalent to the model checking problem of the modal mu-calculus. Another strong motivation lies in the fact that the exact complexity of solving parity games is a long-standing open problem, the currently best known algorithm being subexponential. It is known that the problem is in the complexity classes UP and coUP.
In this paper we identify restricted classes of digraphs where the problem is solvable in polynomial time, following an approach from structural graph theory. We consider three standard graph operations: the join of two graphs, repeated pasting along vertices, and the addition of a vertex. Given a class C of digraphs on which we can solve parity games in polynomial time, we show that the same holds for the class obtained from C by applying any one of these three operations once to its elements.
These results provide, in particular, polynomial-time algorithms for parity games whose underlying graph is an orientation of a complete graph, a complete bipartite graph, a block graph, or a block-cactus graph. These are classes where the problem was not known to be efficiently solvable.
Previous results concerning restricted classes of parity games which are solvable in polynomial time include classes of bounded tree-width, bounded DAG-width, and bounded clique-width.
We also prove that recognising the winning regions of a parity game is not easier than computing them from scratch.
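To make "solving a parity game" concrete, the sketch below implements Zielonka's classic recursive algorithm. It is a general-purpose solver, exponential in the worst case, and not one of the polynomial-time algorithms of the paper. It uses the convention stated above: player 0 wins a play if the greatest priority occurring infinitely often is even.

This is a minimal Python sketch. It assumes every vertex has at least one outgoing edge and that games are given as plain dicts; the function names are hypothetical.

```python
def attractor(nodes, edges, owner, player, target):
    """Vertices of the subgame `nodes` from which `player` can force reaching `target`."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v in nodes - attr:
            succ = [w for w in edges[v] if w in nodes]
            if (owner[v] == player and any(w in attr for w in succ)) or \
               (owner[v] != player and all(w in attr for w in succ)):
                attr.add(v)
                changed = True
    return attr

def zielonka(nodes, edges, owner, priority):
    """Return (W0, W1), the winning regions of players 0 and 1 on the set `nodes`."""
    if not nodes:
        return set(), set()
    p = max(priority[v] for v in nodes)
    i = p % 2  # the player who wins if p occurs infinitely often
    top = {v for v in nodes if priority[v] == p}
    a = attractor(nodes, edges, owner, i, top)
    w = list(zielonka(nodes - a, edges, owner, priority))
    if not w[1 - i]:
        w[i], w[1 - i] = set(nodes), set()
        return tuple(w)
    b = attractor(nodes, edges, owner, 1 - i, w[1 - i])
    w2 = list(zielonka(nodes - b, edges, owner, priority))
    w2[1 - i] |= b
    return tuple(w2)
```

In a two-vertex game where player 1 owns a vertex of priority 1 with a self-loop, player 1 wins everywhere by looping forever. Removing the self-loop forces the cycle through priority 2, and player 0 wins everywhere.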
Submitted 8 August, 2012;
originally announced August 2012.
-
Set graphs. II. Complexity of set graph recognition and similar problems
Authors:
Martin Milanič,
Romeo Rizzi,
Alexandru I. Tomescu
Abstract:
A graph $G$ is said to be a `set graph' if it admits an acyclic orientation that is also `extensional', in the sense that the out-neighborhoods of its vertices are pairwise distinct. Equivalently, a set graph is the underlying graph of the digraph representation of a hereditarily finite set. In this paper, we continue the study of set graphs and related topics, focusing on computational complexity aspects. We prove that set graph recognition is NP-complete, even when the input is restricted to bipartite graphs with exactly two leaves. The problem remains NP-complete if, in addition, we require that the extensional acyclic orientation also be `slim', that is, that the digraph obtained by removing any arc from it is not extensional. We also show that the counting variants of the above problems are #P-complete, and prove similar complexity results for problems related to a generalization of extensional acyclic digraphs, the so-called `hyper-extensional digraphs', which were proposed by Aczel to describe hypersets. Our proofs are based on reductions from variants of the Hamiltonian Path problem. We also consider a variant of the well-known notion of a separating code in a digraph, the so-called `open-out-separating code', and show that it is NP-complete to determine whether an input extensional acyclic digraph contains an open-out-separating code of given size.
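Recognizing set graphs is hard because an orientation must be found. Checking a *given* orientation, however, is easy. The definitions above translate directly into the following minimal Python sketch:

- An orientation is extensional and acyclic if it has no directed cycle and all out-neighborhoods are pairwise distinct.
- It is slim if, in addition, deleting any single arc destroys extensionality.

The dict-of-sets input, which must list every vertex as a key, and the function names are assumptions.

```python
def is_acyclic(out):
    """Kahn's algorithm; out maps every vertex to its set of out-neighbors."""
    indeg = {v: 0 for v in out}
    for v in out:
        for w in out[v]:
            indeg[w] += 1
    stack = [v for v in out if indeg[v] == 0]
    seen = 0
    while stack:
        v = stack.pop()
        seen += 1
        for w in out[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                stack.append(w)
    return seen == len(out)

def is_extensional_acyclic(out):
    """Acyclic and all out-neighborhoods pairwise distinct."""
    neighborhoods = [frozenset(s) for s in out.values()]
    return is_acyclic(out) and len(set(neighborhoods)) == len(neighborhoods)

def is_slim(out):
    """Extensional acyclic, and deleting any single arc breaks extensionality."""
    if not is_extensional_acyclic(out):
        return False
    for v in out:
        for w in out[v]:
            reduced = {u: (s - {w} if u == v else s) for u, s in out.items()}
            if is_extensional_acyclic(reduced):
                return False
    return True
```

The digraph of the hereditarily finite sets $\emptyset$, $\{\emptyset\}$, $\{\{\emptyset\}\}$, $\{\emptyset,\{\emptyset\}\}$ is extensional and slim. By contrast, the orientation with arcs b→a, c→a and c→b is extensional but not slim, since deleting c→a leaves all out-neighborhoods distinct.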
Submitted 31 July, 2012;
originally announced July 2012.
-
On cycles through two arcs in strong multipartite tournaments
Authors:
Alexandru I. Tomescu
Abstract:
A multipartite tournament is an orientation of a complete $c$-partite graph. In [L. Volkmann, A remark on cycles through an arc in strongly connected multipartite tournaments, Appl. Math. Lett. 20 (2007) 1148--1150], Volkmann proved that a strongly connected $c$-partite tournament with $c \ge 3$ contains an arc that belongs to a directed cycle of length $m$ for every $m \in \{3, 4, \ldots, c\}$. He also conjectured the existence of three arcs with this property. In this note, we prove the existence of two such arcs.
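The property studied here is whether a given arc lies on a directed cycle of length $m$. It can be checked by brute force on small examples, which is handy for experimenting with the conjecture. This minimal Python sketch searches for a simple path of exactly $m-1$ arcs from the head of the arc back to its tail. It is exponential in general and not part of the note's proof; the adjacency-dict format and function name are assumptions.

```python
def arc_in_cycle_of_length(adj, u, v, m):
    """True if arc (u, v) lies on a directed cycle with exactly m arcs."""
    if v not in adj.get(u, ()):
        return False

    def extend(x, used, visited):
        if x == u:  # closed the cycle; it must use exactly m - 1 further arcs
            return used == m - 1
        if used >= m - 1:
            return False
        for y in adj[x]:
            if y not in visited:
                visited.add(y)
                if extend(y, used + 1, visited):
                    return True
                visited.remove(y)
        return False

    return extend(v, 0, {v})
```

On the directed triangle 0→1→2→0 (a strong 3-partite tournament with one vertex per part), the arc (0, 1) lies on a 3-cycle but, trivially, on no 4-cycle.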
Submitted 4 June, 2010;
originally announced June 2010.