-
Minimum Path Cover in Parameterized Linear Time
Authors:
Manuel Cáceres,
Massimo Cairo,
Brendan Mumey,
Romeo Rizzi,
Alexandru I. Tomescu
Abstract:
A minimum path cover (MPC) of a directed acyclic graph (DAG) $G = (V,E)$ is a minimum-size set of paths that together cover all the vertices of the DAG. Computing an MPC is a basic polynomial problem, dating back to Dilworth's and Fulkerson's results in the 1950s. Since the size $k$ of an MPC (also known as the width) can be small in practical applications, research has also studied algorithms whose running time is parameterized on $k$.
We obtain a new MPC parameterized algorithm for DAGs running in time $O(k^2|V| + |E|)$. Our algorithm is the first solving the problem in parameterized linear time. Additionally, we obtain an edge sparsification algorithm preserving the width of a DAG but reducing $|E|$ to less than $2|V|$. This algorithm runs in time $O(k^2|V|)$ and requires an MPC of a DAG as input, thus its total running time is the same as the running time of our MPC algorithm.
Submitted 17 November, 2022;
originally announced November 2022.
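The abstract above defines an MPC as a minimum-size set of paths covering all vertices of a DAG. As a point of comparison only, the sketch below computes the size $k$ with the classical reduction to bipartite matching on the transitive closure; it is not the parameterized algorithm announced in the abstract, and it runs in polynomial rather than parameterized linear time. The function names and the vertex numbering `0..n-1` are assumptions made for illustration.

```python
def transitive_closure(n, edges):
    """reach[u] = set of vertices reachable from u by a non-empty path."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    reach = []
    for s in range(n):
        seen = set()
        stack = list(adj[s])
        while stack:
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(adj[x])
        reach.append(seen)
    return reach


def mpc_size(n, edges):
    """Size of a minimum path cover of a DAG (paths may share vertices).

    Classical reduction: k = n - (maximum matching in the bipartite graph
    that has an edge (u, v) whenever v is reachable from u).
    """
    reach = transitive_closure(n, edges)
    match_right = [-1] * n  # match_right[v] = left vertex matched to v

    def try_augment(u, visited):
        # Kuhn's augmenting-path step (recursive, fine for small inputs).
        for v in reach[u]:
            if v not in visited:
                visited.add(v)
                if match_right[v] == -1 or try_augment(match_right[v], visited):
                    match_right[v] = u
                    return True
        return False

    matching = sum(try_augment(u, set()) for u in range(n))
    return n - matching
```

For the DAG with edges `(0,2), (1,2), (2,3), (2,4)`, two paths through vertex 2 suffice, which is why the matching is taken on the transitive closure rather than on the edges themselves.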
-
Cut paths and their remainder structure, with applications
Authors:
Massimo Cairo,
Shahbaz Khan,
Romeo Rizzi,
Sebastian Schmidt,
Alexandru I. Tomescu,
Elia C. Zirondelli
Abstract:
In a strongly connected graph $G = (V,E)$, a cut arc (also called strong bridge) is an arc $e \in E$ whose removal makes the graph no longer strongly connected. Equivalently, there exist $u,v \in V$, such that all $u$-$v$ walks contain $e$. Cut arcs are a fundamental graph-theoretic notion, with countless applications, especially in reachability problems.
In this paper we initiate the study of cut paths, as a generalisation of cut arcs, which we naturally define as those paths $P$ for which there exist $u,v \in V$, such that all $u$-$v$ walks contain $P$ as subwalk. We first prove various properties of cut paths and define their remainder structures, which we use to present a simple $O(m)$-time verification algorithm for a cut path ($|V| = n$, $|E| = m$).
Secondly, we apply cut paths and their remainder structures to improve several reachability problems from bioinformatics. A walk is called safe if it is a subwalk of every node-covering closed walk of a strongly connected graph. Multi-safety is defined analogously, by considering node-covering sets of closed walks instead. We show that cut paths provide simple $O(m)$-time algorithms verifying if a walk is safe or multi-safe. For multi-safety, we present the first linear time algorithm, while for safety, we present a simple algorithm where the state-of-the-art employed complex data structures. Finally we show that the simultaneous computation of remainder structures of all subwalks of a cut path can be performed in linear time. These properties yield an $O(mn)$ algorithm outputting all maximal multi-safe walks, improving over the state-of-the-art algorithm running in time $O(m^2+n^3)$.
The results of this paper only scratch the surface in the study of cut paths, and we believe a rich structure of a graph can be revealed, considering the perspective of a path, instead of just an arc.
Submitted 14 October, 2022;
originally announced October 2022.
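The definition of a cut arc given in the abstract above can be checked directly. The following is a naive sketch under stated assumptions (vertices numbered `0..n-1`, `n >= 1`, arcs given as a list of pairs); it removes each arc in turn and retests strong connectivity, so it takes $O(m(n+m))$ time and is not one of the linear-time algorithms the paper describes.

```python
def is_strongly_connected(n, arcs):
    """True if every vertex reaches, and is reached from, vertex 0."""
    def reaches_all(adj):
        seen = {0}
        stack = [0]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return len(seen) == n

    fwd = [[] for _ in range(n)]
    bwd = [[] for _ in range(n)]
    for u, v in arcs:
        fwd[u].append(v)
        bwd[v].append(u)
    return reaches_all(fwd) and reaches_all(bwd)


def cut_arcs(n, arcs):
    """Arcs whose removal destroys strong connectivity (naive test)."""
    return [
        arc
        for i, arc in enumerate(arcs)
        if not is_strongly_connected(n, arcs[:i] + arcs[i + 1:])
    ]
```

On the 3-cycle `0 -> 1 -> 2 -> 0` with the extra chord `0 -> 2`, every cycle arc is a cut arc, while the chord is not.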
-
Width Helps and Hinders Splitting Flows
Authors:
Manuel Cáceres,
Massimo Cairo,
Andreas Grigorjew,
Shahbaz Khan,
Brendan Mumey,
Romeo Rizzi,
Alexandru I. Tomescu,
Lucia Williams
Abstract:
Minimum flow decomposition (MFD) is the NP-hard problem of finding a smallest decomposition of a network flow/circulation $X$ on a directed graph $G$ into weighted source-to-sink paths whose superposition equals $X$. We show that, for acyclic graphs, considering the \emph{width} of the graph (the minimum number of paths needed to cover all of its edges) yields advances in our understanding of its approximability. For the version of the problem that uses only non-negative weights, we identify and characterise a new class of \emph{width-stable} graphs, for which a popular heuristic is a $O(\log |X|)$-approximation ($|X|$ being the total flow of $X$), and strengthen its worst-case approximation ratio from $Ω(\sqrt{m})$ to $Ω(m / \log m)$ for sparse graphs, where $m$ is the number of edges in the graph. We also study a new problem on graphs with cycles, Minimum Cost Circulation Decomposition (MCCD), and show that it generalises MFD through a simple reduction. For the version allowing also negative weights, we give a $(\lceil \log \Vert X \Vert \rceil +1)$-approximation ($\Vert X \Vert$ being the maximum absolute value of $X$ on any edge) using a power-of-two approach, combined with parity fixing arguments and a decomposition of unitary circulations ($\Vert X \Vert \leq 1$), using a generalised notion of width for this problem. Finally, we disprove a conjecture about the linear independence of minimum (non-negative) flow decompositions posed by Kloster et al. [ALENEX 2018], but show that its useful implication (polynomial-time assignments of weights to a given set of paths to decompose a flow) holds for the negative version.
Submitted 9 May, 2023; v1 submitted 5 July, 2022;
originally announced July 2022.
-
Sparsifying, Shrinking and Splicing for Minimum Path Cover in Parameterized Linear Time
Authors:
Manuel Cáceres,
Massimo Cairo,
Brendan Mumey,
Romeo Rizzi,
Alexandru I. Tomescu
Abstract:
A minimum path cover (MPC) of a directed acyclic graph (DAG) $G = (V,E)$ is a minimum-size set of paths that together cover all the vertices of the DAG. Computing an MPC is a basic polynomial problem, dating back to Dilworth's and Fulkerson's results in the 1950s. Since the size $k$ of an MPC (also known as the width) can be small in practical applications, research has also studied algorithms whose complexity is parameterized on $k$. We obtain two new MPC parameterized algorithms for DAGs running in time $O(k^2|V|\log{|V|} + |E|)$ and $O(k^3|V| + |E|)$. We also obtain a parallel algorithm running in $O(k^2|V| + |E|)$ parallel steps and using $O(\log{|V|})$ processors (in the PRAM model). Our latter two algorithms are the first solving the problem in parameterized linear time. Finally, we present an algorithm running in time $O(k^2|V|)$ for transforming any MPC to another MPC using less than $2|V|$ distinct edges, which we prove to be asymptotically tight. As such, we also obtain edge sparsification algorithms preserving the width of the DAG with the same running time as our MPC algorithms. At the core of all our algorithms we interleave the usage of three techniques: transitive sparsification, shrinking of a path cover, and the splicing of a set of paths along a given path.
Submitted 12 July, 2021;
originally announced July 2021.
-
Development of a dynamic type 2 diabetes risk prediction tool: a UK Biobank study
Authors:
Nikola Dolezalova,
Massimo Cairo,
Alex Despotovic,
Adam T. C. Booth,
Angus B. Reed,
Davide Morelli,
David Plans
Abstract:
Diabetes affects over 400 million people and is among the leading causes of morbidity worldwide. Identification of high-risk individuals can support early diagnosis and prevention of disease development through lifestyle changes. However, the majority of existing risk scores require information about blood-based factors which are not obtainable outside of the clinic. Here, we aimed to develop an accessible solution that could be deployed digitally and at scale. We developed a predictive 10-year type 2 diabetes risk score using 301 features derived from 472,830 participants in the UK Biobank dataset while excluding any features which are not easily obtainable by a smartphone. Using a data-driven feature selection process, 19 features were included in the final reduced model. A Cox proportional hazards model slightly outperformed a DeepSurv model trained using the same features, achieving a concordance index of 0.818 (95% CI: 0.812-0.823), compared to 0.811 (95% CI: 0.806-0.815). The final model showed good calibration. This tool can be used for clinical screening of individuals at risk of developing type 2 diabetes and to foster patient empowerment by broadening their knowledge of the factors affecting their personal risk.
Submitted 20 April, 2021;
originally announced April 2021.
-
The Hydrostructure: a Universal Framework for Safe and Complete Algorithms for Genome Assembly
Authors:
Massimo Cairo,
Shahbaz Khan,
Romeo Rizzi,
Sebastian Schmidt,
Alexandru I. Tomescu,
Elia C. Zirondelli
Abstract:
Genome assembly is a fundamental problem in Bioinformatics, requiring the reconstruction of a source genome from an assembly graph built from a set of reads (short strings sequenced from the genome). A notion of genome assembly solution is that of an arc-covering walk of the graph. Since assembly graphs admit many solutions, the goal is to find what is definitely present in all solutions, or what is safe. Most practical assemblers are based on heuristics having at their core unitigs, namely paths whose internal nodes have unit in-degree and out-degree, and which are clearly safe. The long-standing open problem of finding all the safe parts of the solutions was recently solved [RECOMB 2016] yielding a 60% increase in contig length. This safe and complete genome assembly algorithm was followed by other works improving the time bounds, as well as extending the results for different notions of assembly solution. But it remained open whether one can be complete also for models of genome assembly of practical applicability.
In this paper we present a universal framework for obtaining safe and complete algorithms which unify the previous results, while also allowing for easy generalisations to assembly problems including many practical aspects. This is based on a novel graph structure, called the hydrostructure of a walk, which highlights the reachability properties of the graph from the perspective of the walk. The hydrostructure allows for simple characterisations of the existing safe walks, and of their new practical versions. Almost all of our characterisations are directly adaptable to optimal verification algorithms, and simple enumeration algorithms. Most of these algorithms are also improved to optimality using an incremental computation procedure and a previous optimal algorithm of a specific model.
Submitted 2 November, 2021; v1 submitted 25 November, 2020;
originally announced November 2020.
-
A linear-time parameterized algorithm for computing the width of a DAG
Authors:
Manuel Cáceres,
Massimo Cairo,
Brendan Mumey,
Romeo Rizzi,
Alexandru I. Tomescu
Abstract:
The width $k$ of a directed acyclic graph (DAG) $G = (V, E)$ equals the largest number of pairwise non-reachable vertices. Computing the width dates back to Dilworth's and Fulkerson's results in the 1950s, and is doable in quadratic time in the worst case. Since $k$ can be small in practical applications, research has also studied algorithms whose complexity is parameterized on $k$. Despite these efforts, it is still open whether there exists a linear-time $O(f(k)(|V| + |E|))$ parameterized algorithm computing the width. We answer this question affirmatively by presenting an $O(k^24^k|V| + k2^k|E|)$ time algorithm, based on a new notion of frontier antichains. As we process the vertices in a topological order, all frontier antichains can be maintained with the help of several combinatorial properties, paying only $f(k)$ along the way. The fact that the width can be computed by a single $f(k)$-sweep of the DAG is a new surprising insight into this classical problem. Our algorithm also allows deciding whether the DAG has width at most $w$ in time $O(f(\min(w,k))(|V|+|E|))$.
Submitted 24 June, 2021; v1 submitted 15 July, 2020;
originally announced July 2020.
-
Safety in $s$-$t$ Paths, Trails and Walks
Authors:
Massimo Cairo,
Shahbaz Khan,
Romeo Rizzi,
Sebastian Schmidt,
Alexandru I. Tomescu
Abstract:
Given a directed graph $G$ and a pair of nodes $s$ and $t$, an \emph{$s$-$t$ bridge} of $G$ is an edge whose removal breaks all $s$-$t$ paths of $G$ (and thus appears in all $s$-$t$ paths). Computing all $s$-$t$ bridges of $G$ is a basic graph problem, solvable in linear time.
In this paper, we consider a natural generalisation of this problem, with the notion of "safety" from bioinformatics. We say that a walk $W$ is \emph{safe} with respect to a set $\mathcal{W}$ of $s$-$t$ walks, if $W$ is a subwalk of all walks in $\mathcal{W}$. We start by considering the maximal safe walks when $\mathcal{W}$ consists of: all $s$-$t$ paths, all $s$-$t$ trails, or all $s$-$t$ walks of $G$. We show that the first two problems are immediate linear-time generalisations of finding all $s$-$t$ bridges, while the third problem is more involved. In particular, we show that there exists a compact representation computable in linear time, that allows outputting all maximal safe walks in time linear in their length.
We further generalise these problems, by assuming that safety is defined only with respect to a subset of \emph{visible} edges. Here we prove a dichotomy between the $s$-$t$ paths and $s$-$t$ trails cases, and the $s$-$t$ walks case: the former two are NP-hard, while the latter is solvable with the same complexity as when all edges are visible. We also show that the same complexity results hold for the analogous generalisations of \emph{$s$-$t$ articulation points} (nodes appearing in all $s$-$t$ paths).
We thus obtain the best possible results for natural "safety"-generalisations of these two fundamental graph problems. Moreover, our algorithms are simple and do not employ any complex data structures, making them ideal for use in practice.
Submitted 17 July, 2020; v1 submitted 9 July, 2020;
originally announced July 2020.
-
Computing all $s$-$t$ bridges and articulation points simplified
Authors:
Massimo Cairo,
Shahbaz Khan,
Romeo Rizzi,
Sebastian Schmidt,
Alexandru I. Tomescu,
Elia Zirondelli
Abstract:
Given a directed graph $G$ and a pair of nodes $s$ and $t$, an $s$-$t$ bridge of $G$ is an edge whose removal breaks all $s$-$t$ paths of $G$. Similarly, an $s$-$t$ articulation point of $G$ is a node whose removal breaks all $s$-$t$ paths of $G$. Computing the sequence of all $s$-$t$ bridges of $G$ (as well as the $s$-$t$ articulation points) is a basic graph problem, solvable in linear time using the classical min-cut algorithm.
When dealing with cuts of unit size ($s$-$t$ bridges) this algorithm can be simplified to a single graph traversal from $s$ to $t$ avoiding an arbitrary $s$-$t$ path, which is interrupted at the $s$-$t$ bridges. Further, the corresponding proof is also simplified making it independent of the theory of network flows.
Submitted 26 June, 2020;
originally announced June 2020.
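Since every $s$-$t$ bridge lies on every $s$-$t$ path, it suffices to inspect the edges of one arbitrary $s$-$t$ path. The sketch below uses that observation in a deliberately naive way: it removes each path edge in turn and retests reachability, so it is quadratic in the worst case and is not the single-traversal linear-time algorithm described in the abstract above. It assumes a simple directed graph (no parallel edges) stored as a dictionary of adjacency lists; all names are hypothetical.

```python
from collections import deque


def find_path(adj, s, t, banned=None):
    """BFS for an s-t path that avoids the edge `banned`; None if none exists."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            break
        for v in adj.get(u, []):
            if (u, v) != banned and v not in parent:
                parent[v] = u
                queue.append(v)
    if t not in parent:
        return None
    path = []
    node = t
    while parent[node] is not None:
        path.append((parent[node], node))
        node = parent[node]
    return path[::-1]


def st_bridges(adj, s, t):
    """All s-t bridges, in the order they appear on any s-t path."""
    path = find_path(adj, s, t)
    if path is None:
        return []
    # An edge of the path is a bridge iff t is unreachable without it.
    return [e for e in path if find_path(adj, s, t, banned=e) is None]
```

For example, in the graph `{0: [1], 1: [2, 3], 2: [4], 3: [4], 4: [5]}` with `s = 0` and `t = 5`, the two parallel routes between vertices 1 and 4 contain no bridge, whereas the first and last edges are bridges.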
-
Genome assembly, from practice to theory: safe, complete and linear-time
Authors:
Massimo Cairo,
Romeo Rizzi,
Alexandru I. Tomescu,
Elia C. Zirondelli
Abstract:
Genome assembly asks to reconstruct an unknown string from many shorter substrings of it. Even though it is one of the key problems in Bioinformatics, it is generally lacking major theoretical advances. Its hardness stems both from practical issues (size and errors of real data), and from the fact that problem formulations inherently admit multiple solutions. Given these, at their core, most state-of-the-art assemblers are based on finding non-branching paths (unitigs) in an assembly graph. If one defines a genome assembly solution as a closed arc-covering walk of the graph, then unitigs appear in all solutions, being thus safe partial solutions. All such safe walks were recently characterized as omnitigs, leading to the first safe and complete genome assembly algorithm. Even if omnitig finding was improved to quadratic time, it remained open whether the crucial linear-time feature of finding unitigs can be attained with omnitigs.
We describe a surprising $O(m)$-time algorithm to identify all maximal omnitigs of a graph with $n$ nodes and $m$ arcs, notwithstanding the existence of families of graphs with $Θ(mn)$ total maximal omnitig size. This is based on the discovery of a family of walks (macrotigs) with the property that all the non-trivial omnitigs are univocal extensions of subwalks of a macrotig, with two consequences: (1) A linear-time output-sensitive algorithm enumerating all maximal omnitigs. (2) A compact $O(m)$ representation of all maximal omnitigs, which allows, e.g., for $O(m)$-time computation of various statistics on them.
Our results close a long-standing theoretical question inspired by practical genome assemblers, originating with the use of unitigs in 1995. We envision our results to be at the core of a reverse transfer from theory to practical and complete genome assembly programs, as has been the case for other key Bioinformatics problems.
Submitted 8 November, 2020; v1 submitted 24 February, 2020;
originally announced February 2020.
-
Hardness of Covering Alignment: Phase Transition in Post-Sequence Genomics
Authors:
Romeo Rizzi,
Massimo Cairo,
Veli Mäkinen,
Alexandru I. Tomescu,
Daniel Valenzuela
Abstract:
Covering alignment problems arise from recent developments in genomics; so-called pan-genome graphs are replacing reference genomes, and advances in haplotyping enable the full content of diploid genomes to be used as the basis of sequence analysis. In this paper, we show that the computational complexity will change for natural extensions of alignments to pan-genome representations and to diploid genomes. More broadly, our approach can also be seen as a minimal extension of sequence alignment to labeled directed acyclic graphs (labeled DAGs). Namely, we show that finding a \emph{covering alignment} of two labeled DAGs is NP-hard even on binary alphabets. A covering alignment asks for two paths $R_1$ (red) and $G_1$ (green) in DAG $D_1$ and two paths $R_2$ (red) and $G_2$ (green) in DAG $D_2$ that cover the nodes of the graphs and maximize the sum of the global alignment scores: $\mathsf{as}(\mathsf{sp}(R_1),\mathsf{sp}(R_2))+\mathsf{as}(\mathsf{sp}(G_1),\mathsf{sp}(G_2))$, where $\mathsf{sp}(P)$ is the concatenation of labels on the path $P$. Pair-wise alignment of haplotype sequences forming a diploid chromosome can be converted to a two-path coverable labeled DAG, and then the covering alignment models the similarity of two diploids over arbitrary recombinations. We also give a reduction in the other direction, to show that such a recombination-oblivious diploid alignment is NP-hard on alphabets of size $3$.
Submitted 22 May, 2018; v1 submitted 15 November, 2016;
originally announced November 2016.
-
Dynamic Controllability of Conditional Simple Temporal Networks is PSPACE-complete
Authors:
Massimo Cairo,
Romeo Rizzi
Abstract:
Even after the proposal of various solution algorithms, the precise computational complexity of checking whether a Conditional Temporal Network is Dynamically Controllable has remained wide open. This paper settles the issue by providing constructions, algorithms, and bridging lemmas and arguments that formally prove: (1) the problem is PSPACE-hard, and (2) the problem lies in PSPACE.
Submitted 30 August, 2016;
originally announced August 2016.
-
Instantaneous Reaction-Time in Dynamic-Consistency Checking of Conditional Simple Temporal Networks -- Extended version with an Improved Upper Bound --
Authors:
Massimo Cairo,
Carlo Comin,
Romeo Rizzi
Abstract:
Conditional Simple Temporal Networks (CSTNs) are a constraint-based graph formalism for conditional temporal planning. In order to address the DC-Checking problem, in [Comin and Rizzi, TIME 2015] we introduced epsilon-DC (a refined, more realistic, notion of DC), and provided an algorithmic solution to it. The epsilon-DC notion is interesting per se, and the epsilon-DC-Checking algorithm in [Comin and Rizzi, TIME 2015] rests on the assumption that the reaction-time satisfies epsilon > 0, leaving unsolved the question of what happens when epsilon = 0. In this work, we introduce and study pi-DC, a sound notion of DC with an instantaneous reaction-time (i.e. one in which the planner can react to any observation at the same instant of time in which the observation is made). Firstly, we demonstrate by a counter-example that pi-DC is not equivalent to 0-DC, and that 0-DC is actually inadequate for modeling DC with an instantaneous reaction-time. This shows that the main results obtained in our previous work do not apply directly, as they were formulated, to the case of epsilon=0. Motivated by this observation, as a second contribution, our previous tools are extended in order to handle pi-DC, and the notion of ps-tree is introduced, also pointing out a relationship between pi-DC and HyTN-Consistency. Thirdly, a simple reduction from pi-DC-Checking to DC-Checking is identified. This allows us to design and to analyze the first sound-and-complete pi-DC-Checking procedure. Remarkably, the time complexity of the proposed algorithm remains (pseudo) singly-exponential in the number of propositional letters. Finally, it is observed that the technique can be leveraged to actually reduce from pi-DC to 1-DC; this allows us to further improve the exponents in the time complexity of pi-DC-Checking.
Submitted 9 December, 2018; v1 submitted 14 August, 2016;
originally announced August 2016.
-
The Complexity of Simulation and Matrix Multiplication
Authors:
Massimo Cairo,
Romeo Rizzi
Abstract:
Computing the simulation preorder of a given Kripke structure (i.e., a directed graph with $n$ labeled vertices) has crucial applications in model checking of temporal logic. It amounts to solving a specific two-players reachability game, called simulation game. We offer the first conditional lower bounds for this problem, and we relate its complexity (for computation, verification, and certification) to some variants of $n\times n$ matrix multiplication.
We show that any $O(n^α)$-time algorithm for simulation games, even restricting to acyclic games/structures, can be used to compute $n\times n$ boolean matrix multiplication (BMM) in $O(n^α)$ time. This is the first evidence that improving the existing $O(n^{3})$-time solutions may be difficult, without resorting to fast matrix multiplication. In the acyclic case, we match this lower bound presenting the first subcubic algorithm, based on fast BMM, and running in $n^{ω+o(1)}$ time (where $ω<2.376$ is the exponent of matrix multiplication).
For both acyclic and cyclic structures, we point out the existence of natural and canonical $O(n^{2})$-size certificates, that can be verified in truly subcubic time. In the acyclic case, $O(n^{2})$ time is sufficient, employing standard matrix product verification. In the cyclic case, a $\max$-semi-boolean matrix multiplication (MSBMM) is used, i.e., a matrix multiplication on the semi-ring $(\max,\times)$ where one matrix contains only $0$'s and $1$'s. This MSBMM is computable (hence verifiable) in truly subcubic $n^{(3+ω)/2+o(1)}$ time by reduction to $(\max,\min)$-multiplication.
Finally, we show a reduction from MSBMM to cyclic simulation games which implies a separation between the cyclic and the acyclic cases, unless MSBMM can be verified in $n^{ω+o(1)}$ time.
Submitted 30 August, 2016; v1 submitted 7 May, 2016;
originally announced May 2016.
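To make the object of the abstract above concrete, the following sketch computes the simulation preorder of a small Kripke structure by the textbook greatest-fixpoint refinement. It assumes the standard definition (state `v` simulates state `u` when both carry the same label and every successor of `u` is simulated by some successor of `v`), which the abstract does not spell out. It is an unoptimised polynomial-time procedure, not the cubic or subcubic algorithms the abstract refers to, and the names are hypothetical.

```python
def simulation_preorder(labels, succ):
    """Return the set of pairs (u, v) such that state v simulates state u.

    labels[u] is the label of state u; succ[u] lists the successors of u.
    Start from all equally-labelled pairs and delete pairs that violate the
    successor condition until nothing changes (greatest fixpoint).
    """
    n = len(labels)
    sim = {
        (u, v)
        for u in range(n)
        for v in range(n)
        if labels[u] == labels[v]
    }
    changed = True
    while changed:
        changed = False
        for (u, v) in list(sim):
            # (u, v) fails if some move u -> u2 has no matching move v -> v2.
            unmatched = any(
                all((u2, v2) not in sim for v2 in succ[v])
                for u2 in succ[u]
            )
            if unmatched:
                sim.discard((u, v))
                changed = True
    return sim
```

With labels `['a', 'b', 'a', 'b', 'c']` and successors `[[1], [], [3, 4], [], []]`, state 2 simulates state 0, but not the other way round, because state 0 cannot match the move from 2 to the `c`-labelled state 4.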
-
Decoding Hidden Markov Models Faster Than Viterbi Via Online Matrix-Vector (max, +)-Multiplication
Authors:
Massimo Cairo,
Gabriele Farina,
Romeo Rizzi
Abstract:
In this paper, we present a novel algorithm for the maximum a posteriori decoding (MAPD) of time-homogeneous Hidden Markov Models (HMM), improving the worst-case running time of the classical Viterbi algorithm by a logarithmic factor. In our approach, we interpret the Viterbi algorithm as a repeated computation of matrix-vector $(\max, +)$-multiplications. On time-homogeneous HMMs, this computation is online: a matrix, known in advance, has to be multiplied with several vectors revealed one at a time. Our main contribution is an algorithm solving this version of matrix-vector $(\max,+)$-multiplication in subquadratic time, by performing a polynomial preprocessing of the matrix. Employing this fast multiplication algorithm, we solve the MAPD problem in $O(mn^2/ \log n)$ time for any time-homogeneous HMM of size $n$ and observation sequence of length $m$, with an extra polynomial preprocessing cost negligible for $m > n$. To the best of our knowledge, this is the first algorithm for the MAPD problem requiring subquadratic time per observation, under the only assumption -- usually verified in practice -- that the transition probability matrix does not change with time.
Submitted 11 December, 2015; v1 submitted 30 November, 2015;
originally announced December 2015.
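The interpretation of Viterbi as repeated matrix-vector $(\max, +)$-multiplications, mentioned in the abstract above, can be sketched as follows. This is the classical quadratic-per-observation formulation in log-space, not the faster algorithm the paper announces; the function names and the list-of-lists layout of the model are assumptions made for illustration.

```python
def maxplus_matvec(A, x):
    """(max, +) product: y[j] = max_i (A[j][i] + x[i])."""
    return [max(a + xi for a, xi in zip(row, x)) for row in A]


def viterbi_log_score(log_init, log_trans, log_emit, observations):
    """Log-probability of the most likely state sequence of an HMM.

    log_init[i]     : log P(first state = i)
    log_trans[i][j] : log P(next state = j | current state = i)
    log_emit[i][o]  : log P(observation o | state i)
    The transition matrix is fixed (time-homogeneous), so the same matrix A
    is multiplied with a new score vector at every step.
    """
    n = len(log_init)
    # Transpose so that row j of A collects the transitions entering j.
    A = [[log_trans[i][j] for i in range(n)] for j in range(n)]
    score = [log_init[i] + log_emit[i][observations[0]] for i in range(n)]
    for obs in observations[1:]:
        step = maxplus_matvec(A, score)  # O(n^2) work per observation
        score = [step[j] + log_emit[j][obs] for j in range(n)]
    return max(score)
```

Each observation costs one $(\max, +)$ product with the same matrix, which is the online setting the abstract describes: the matrix is known in advance and the vectors arrive one at a time.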