-
On the Space Usage of Approximate Distance Oracles with Sub-2 Stretch
Authors:
Tsvi Kopelowitz,
Ariel Korin,
Liam Roditty
Abstract:
For an undirected unweighted graph G = (V, E) with n vertices and m edges, let d(u, v) denote the distance from u in V to v in V in G. An (alpha, beta)-stretch approximate distance oracle (ADO) for G is a data structure that, given u, v in V, returns in constant time a value d-hat(u, v) such that d(u, v) <= d-hat(u, v) <= alpha * d(u, v) + beta, for some reals alpha > 1 and beta. If beta = 0, we say that the ADO has stretch alpha.
Thorup and Zwick (2005) showed that one cannot beat stretch 3 with subquadratic space (in terms of n) for general graphs. Patrascu and Roditty (2010) showed that one can obtain stretch 2 using O(m^(1/3)n^(4/3)) space, and so if m is subquadratic in n, then the space usage is also subquadratic. Moreover, Patrascu and Roditty (2010) showed that one cannot beat stretch 2 with subquadratic space even for graphs where m = O-tilde(n), based on the set-intersection hypothesis.
In this paper, we investigate the minimum possible stretch achievable by an ADO as a function of the graph's maximum degree, motivated by the question of when an ADO can be stored in subquadratic space while still guaranteeing a sub-2 stretch. In particular, we show that if the maximum degree in G is Delta_G <= O(n^(1/k - epsilon)) for some 0 < epsilon <= 1/k, then there exists a (2, 1 - k)-stretch ADO for G that uses O-tilde(n^(2 - (k * epsilon) / 3)) space. For k = 2, this result implies a subquadratic sub-2 stretch ADO for graphs with Delta_G <= O(n^(1/2 - epsilon)). We complement this upper bound with tight lower bounds under the same set-intersection hypothesis, showing that if Delta_G = Theta(n^(1/k)), then a (2, 1 - k)-stretch ADO requires Omega-tilde(n^2) space. Moreover, we show that for constants epsilon, c > 0, a (2 - epsilon, c)-stretch ADO requires Omega-tilde(n^2) space even for graphs with Delta_G = Theta-tilde(1).
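The (alpha, beta)-stretch condition can be checked directly on small graphs. The sketch below is only a checker for the definition, not an ADO construction; the helper names `bfs_distances` and `has_stretch`, the adjacency-list input, and the choice to skip pairs with u = v and disconnected pairs are assumptions made for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Exact hop distances from `source` in an unweighted graph (adjacency lists)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def has_stretch(adj, d_hat, alpha, beta):
    """Check d(u, v) <= d_hat(u, v) <= alpha * d(u, v) + beta for all distinct connected pairs."""
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v == u:
                continue  # assumption: the pair u == v is excluded
            estimate = d_hat(u, v)
            if not (d <= estimate <= alpha * d + beta):
                return False
    return True


# Usage: a path 0 - 1 - 2 - 3, where |u - v| is the exact distance.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(has_stretch(path, lambda u, v: abs(u - v), 2, -1))
```

On a path, the exact distance satisfies the (2, -1) condition (the k = 2 case of (2, 1 - k)) for all distinct pairs, whereas adding 1 to every distance violates it for adjacent vertices.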
Submitted 1 October, 2024; v1 submitted 18 October, 2023;
originally announced October 2023.
-
An Improved Algorithm for The $k$-Dyck Edit Distance Problem
Authors:
Dvir Fried,
Shay Golan,
Tomasz Kociumaka,
Tsvi Kopelowitz,
Ely Porat,
Tatiana Starikovskaya
Abstract:
A Dyck sequence is a sequence of opening and closing parentheses (of various types) that is balanced. The Dyck edit distance of a given sequence of parentheses $S$ is the smallest number of edit operations (insertions, deletions, and substitutions) needed to transform $S$ into a Dyck sequence. We consider the threshold Dyck edit distance problem, where the input is a sequence of parentheses $S$ and a positive integer $k$, and the goal is to compute the Dyck edit distance of $S$ only if the distance is at most $k$, and otherwise report that the distance is larger than $k$. Backurs and Onak [PODS'16] showed that the threshold Dyck edit distance problem can be solved in $O(n+k^{16})$ time.
In this work, we design new algorithms for the threshold Dyck edit distance problem that run in $O(n+k^{4.544184})$ time with high probability or $O(n+k^{4.853059})$ time deterministically. Our algorithms combine several new structural properties of the Dyck edit distance problem, a refined algorithm for fast $(\min,+)$ matrix product, and a careful modification of ideas used in Valiant's parsing algorithm.
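For intuition, the threshold problem can be solved naively by a cubic-time interval dynamic program, far slower than the bounds above. The sketch below assumes an alphabet of three bracket types and uses hypothetical helper names. It considers only deletions and substitutions: an inserted character must be matched with some existing character, and deleting that character instead costs no more.

```python
OPENERS = {"(": ")", "[": "]", "{": "}"}  # assumed alphabet: three bracket types
CLOSERS = set(OPENERS.values())


def pair_cost(a, b):
    """Substitutions needed so that a (left) and b (right) form a matched pair."""
    if a in OPENERS and b in CLOSERS:
        return 0 if OPENERS[a] == b else 1  # fix the closer's type
    if a in CLOSERS and b in OPENERS:
        return 2  # both characters must change
    return 1  # two openers or two closers: change one of them


def dyck_edit_distance(s):
    """Naive O(n^3) interval DP over deletions and substitutions."""
    n = len(s)
    # D[i][j] = Dyck edit distance of the substring s[i:j]
    D = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best = D[i + 1][j] + 1  # delete s[i]
            for k in range(i + 1, j):  # or match s[i] with s[k]
                best = min(best, pair_cost(s[i], s[k]) + D[i + 1][k] + D[k + 1][j])
            D[i][j] = best
    return D[0][n]


def threshold_dyck(s, k):
    """Return the Dyck edit distance if it is at most k; otherwise None."""
    d = dyck_edit_distance(s)
    return d if d <= k else None


print(threshold_dyck("([)]", 1), threshold_dyck("([)]", 2))
```

The threshold wrapper simply computes the full distance, so it gains nothing from a small $k$; exploiting $k$ is precisely what the algorithms above do.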
Submitted 22 August, 2022; v1 submitted 3 November, 2021;
originally announced November 2021.
-
Incremental Edge Orientation in Forests
Authors:
Michael A. Bender,
Tsvi Kopelowitz,
William Kuszmaul,
Ely Porat,
Clifford Stein
Abstract:
For any forest $G = (V, E)$ it is possible to orient the edges $E$ so that no vertex in $V$ has out-degree greater than $1$. This paper considers the incremental edge-orientation problem, in which the edges $E$ arrive over time and the algorithm must maintain a low-out-degree edge orientation at all times. We give an algorithm that maintains a maximum out-degree of $3$ while flipping at most $O(\log \log n)$ edge orientations per edge insertion, with high probability in $n$. The algorithm requires worst-case time $O(\log n \log \log n)$ per insertion, and takes amortized time $O(1)$. The previous state of the art required up to $O(\log n / \log \log n)$ edge flips per insertion.
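To see why the number of edge flips per insertion matters, consider the naive way to keep out-degree at most 1 while a forest is built incrementally: when an edge {u, v} joins two trees, reverse every edge on the path from u to its tree's root, so that u becomes a root, and then orient u -> v. The sketch below uses hypothetical names and stores each vertex's single out-neighbor in a dict. It can flip Θ(n) edges on a long path, whereas the algorithm above keeps out-degree at most 3 with $O(\log \log n)$ flips per insertion with high probability.

```python
def insert_edge(out, u, v):
    """Insert forest edge {u, v}, where u and v lie in different trees.

    `out` maps each vertex to its unique out-neighbor (or None for a root).
    Returns the number of existing edges whose orientation was flipped.
    """
    flips = 0
    prev, cur = None, u
    while cur is not None:  # walk u -> ... -> root, reversing pointers
        nxt = out.get(cur)
        out[cur] = prev
        if prev is not None:
            flips += 1  # edge {prev, cur} now points cur -> prev
        prev, cur = cur, nxt
    out[u] = v  # u is now a root, so it may point at v
    return flips


out = {}
for a, b in [(1, 2), (3, 1), (2, 4)]:
    insert_edge(out, a, b)
print(insert_edge(out, 3, 5))  # reverses the path 3 -> 1 -> 2 -> 4
```

Each vertex keeps at most one out-pointer by construction, so the out-degree bound of 1 holds at all times; the cost is paid entirely in flips.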
We then apply our edge-orientation results to the problem of dynamic Cuckoo hashing. The problem of designing simple families $\mathcal{H}$ of hash functions that are compatible with Cuckoo hashing has received extensive attention. These families $\mathcal{H}$ are known to satisfy \emph{static guarantees}, but typically do not come with \emph{dynamic guarantees} for the running time of inserts and deletes. We show how to transform static guarantees (for $1$-associativity) into near-state-of-the-art dynamic guarantees (for $O(1)$-associativity) in a black-box fashion. Rather than relying on the family $\mathcal{H}$ to supply randomness, as in past work, we instead rely on randomness within our table-maintenance algorithm.
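The link between the two halves can be seen in a minimal two-choice cuckoo table with one key per bucket (1-associativity). View each key as an edge between its two candidate buckets, directed toward the bucket that stores it. Then every bucket has in-degree at most 1, and each eviction flips one edge. In the sketch below, salting Python's built-in `hash` stands in for a hash family $\mathcal{H}$; that choice, the class name, and the `max_kicks` cutoff are assumptions. The table gives up instead of rehashing.

```python
import random


class CuckooTable:
    """Two-choice cuckoo hashing with one key per bucket (1-associativity)."""

    def __init__(self, size, seed=0, max_kicks=500):
        rng = random.Random(seed)
        # Hypothetical stand-in for a hash family: two random salts for hash().
        self.salts = (rng.getrandbits(64), rng.getrandbits(64))
        self.size = size
        self.slots = [None] * size
        self.max_kicks = max_kicks

    def _bucket(self, key, i):
        return hash((self.salts[i], key)) % self.size

    def contains(self, key):
        return any(self.slots[self._bucket(key, i)] == key for i in (0, 1))

    def insert(self, key):
        if self.contains(key):
            return
        i = 0
        for _ in range(self.max_kicks):
            b = self._bucket(key, i)
            key, self.slots[b] = self.slots[b], key  # place key, evict the occupant
            if key is None:
                return
            # The evicted key moves to its other candidate bucket (one edge flip).
            i = 1 if self._bucket(key, 0) == b else 0
        # A real table would rebuild with new salts; here the displaced key is dropped.
        raise RuntimeError("insertion failed after max_kicks evictions")


table = CuckooTable(256, seed=1)
for k in range(20):
    table.insert(k * 7919)
print(table.contains(7919), table.contains(-1))
```

Each eviction chain is a path of edge flips in this view, which is why bounding flips per insertion translates into bounding work per insertion.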
Submitted 5 July, 2021;
originally announced July 2021.
-
Support Optimality and Adaptive Cuckoo Filters
Authors:
Tsvi Kopelowitz,
Samuel McCauley,
Ely Porat
Abstract:
Filters (such as Bloom Filters) are data structures that speed up network routing and measurement operations by storing a compressed representation of a set. Filters are space efficient, but can make bounded one-sided errors: with tunable probability epsilon, they may report that a query element is stored in the filter when it is not. This is called a false positive. Recent research has focused on designing methods for dynamically adapting filters to false positives, reducing the number of false positives when some elements are queried repeatedly.
Ideally, an adaptive filter would incur a false positive with bounded probability epsilon for each new query element, and would incur o(epsilon) total false positives over all repeated queries to that element. We call such a filter support optimal.
In this paper we design a new Adaptive Cuckoo Filter and show that it is support optimal (up to additive logarithmic terms) over any n queries when storing a set of size n. Our filter is simple: fixing previous false positives requires a simple cuckoo operation, and the filter does not need to store any additional metadata. It is the first practical data structure that is support optimal, and the first filter that does not require additional space to fix false positives.
We complement these bounds with experiments showing that our data structure is effective at fixing false positives on network traces, outperforming previous Adaptive Cuckoo Filters.
Finally, we investigate adversarial adaptivity, a stronger notion of adaptivity in which an adaptive adversary repeatedly queries the filter, using the result of previous queries to drive the false positive rate as high as possible. We prove a lower bound showing that a broad family of filters, including all known Adaptive Cuckoo Filters, can be forced by such an adversary to incur a large number of false positives.
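A toy fingerprint filter shows what one-sided error and "fixing" a false positive mean in practice. It is not the Adaptive Cuckoo Filter described above: it scans every slot on each query, and it stores a per-slot selector, which is exactly the kind of extra metadata that the filter above avoids. The BLAKE2b-based fingerprint, the class and method names, and the `remote` list standing in for the full set kept elsewhere are all hypothetical.

```python
import hashlib


def fingerprint(item, selector, bits):
    """Hypothetical fingerprint: low `bits` bits of a BLAKE2b digest salted by `selector`."""
    digest = hashlib.blake2b(f"{selector}|{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


class ToyAdaptiveFilter:
    def __init__(self, items, bits=6):
        self.bits = bits
        self.remote = list(items)  # full copy of the set, e.g. kept on disk
        self.selectors = [0] * len(self.remote)  # per-slot metadata (the toy's extra cost)
        self.slots = [fingerprint(x, 0, bits) for x in self.remote]

    def _matching_slots(self, y):
        return [i for i, fp in enumerate(self.slots)
                if fingerprint(y, self.selectors[i], self.bits) == fp]

    def query(self, y):
        # One-sided error: every stored item matches its own slot, others may collide.
        return bool(self._matching_slots(y))

    def adapt(self, y):
        # Called after the remote copy shows that a positive answer for y was false.
        for i in self._matching_slots(y):
            if self.remote[i] != y:
                self.selectors[i] += 1
                self.slots[i] = fingerprint(self.remote[i], self.selectors[i], self.bits)
```

After `adapt(y)`, every slot that collided with `y` is re-fingerprinted under a new selector, so a repeated query for `y` is a false positive again only if a fresh collision occurs; stored items are never reported as absent.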
Submitted 21 May, 2021;
originally announced May 2021.
-
An $O(\log^{3/2}n)$ Parallel Time Population Protocol for Majority with $O(\log n)$ States
Authors:
Stav Ben-Nun,
Tsvi Kopelowitz,
Matan Kraus,
Ely Porat
Abstract:
In population protocols, the underlying distributed network consists of $n$ nodes (or agents), denoted by $V$, and a scheduler that continuously selects uniformly random pairs of nodes to interact. When two nodes interact, their states are updated by applying a state transition function that depends only on the states of the two nodes prior to the interaction. The efficiency of a population protocol is measured in terms of both time (which is the number of interactions until the nodes collectively have a valid output) and the number of possible states of nodes used by the protocol. By convention, we consider the parallel time cost, which is the time divided by $n$.
In this paper we consider the majority problem, where each node receives as input a color that is either black or white, and the goal is to have all of the nodes output the color that is the majority of the input colors. We design a population protocol that solves the majority problem in $O(\log^{3/2}n)$ parallel time, both with high probability and in expectation, while using $O(\log n)$ states. Our protocol improves on a recent protocol of Berenbrink et al. that runs in $O(\log^{5/3}n)$ parallel time, both with high probability and in expectation, using $O(\log n)$ states.
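The model described above can be simulated directly. The sketch below uses the classic three-state approximate-majority protocol (states X, Y, and blank B), not the protocol from this paper: it solves majority only approximately and carries no $O(\log^{3/2}n)$ guarantee. Each interaction picks a uniformly random ordered pair (initiator, responder), only the responder changes state, and parallel time is reported as interactions divided by $n$. The function name and state labels are hypothetical.

```python
import random


def approximate_majority(num_x, num_y, seed=0, max_interactions=10**7):
    """Simulate the 3-state approximate-majority protocol; return (winner, parallel time)."""
    rng = random.Random(seed)
    states = ["X"] * num_x + ["Y"] * num_y
    n = len(states)
    counts = {"X": num_x, "Y": num_y, "B": 0}
    for t in range(1, max_interactions + 1):
        i, j = rng.sample(range(n), 2)  # initiator i, responder j
        a, b = states[i], states[j]
        new_b = b
        if a != "B" and b != "B" and a != b:
            new_b = "B"  # initiator blanks a disagreeing responder
        elif a != "B" and b == "B":
            new_b = a  # initiator recruits a blank responder
        if new_b != b:
            counts[b] -= 1
            counts[new_b] += 1
            states[j] = new_b
        for color in ("X", "Y"):
            if counts[color] == n:
                return color, t / n
    raise RuntimeError("did not converge within max_interactions")


print(approximate_majority(85, 15, seed=1))
```

The initiator never changes state when two opinions meet, so at least one non-blank node always remains and the only stable configurations are all-X and all-Y.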
Submitted 25 November, 2020;
originally announced November 2020.
-
Improved Circular $k$-Mismatch Sketches
Authors:
Shay Golan,
Tomasz Kociumaka,
Tsvi Kopelowitz,
Ely Porat,
Przemysław Uznański
Abstract:
The shift distance $\mathsf{sh}(S_1,S_2)$ between two strings $S_1$ and $S_2$ of the same length is defined as the minimum Hamming distance between $S_1$ and any rotation (cyclic shift) of $S_2$. We study the problem of sketching the shift distance, which is the following communication complexity problem: Strings $S_1$ and $S_2$ of length $n$ are given to two identical players (encoders), who independently compute sketches (summaries) $\mathtt{sk}(S_1)$ and $\mathtt{sk}(S_2)$, respectively, so that upon receiving the two sketches, a third player (decoder) is able to compute (or approximate) $\mathsf{sh}(S_1,S_2)$ with high probability.
This paper primarily focuses on the more general $k$-mismatch version of the problem, where the decoder is allowed to declare a failure if $\mathsf{sh}(S_1,S_2)>k$, where $k$ is a parameter known to all parties. Andoni et al. (STOC'13) introduced exact circular $k$-mismatch sketches of size $\widetilde{O}(k+D(n))$, where $D(n)$ is the number of divisors of $n$. Andoni et al. also showed that their sketch size is optimal in the class of linear homomorphic sketches.
We circumvent this lower bound by designing a (non-linear) exact circular $k$-mismatch sketch of size $\widetilde{O}(k)$; this size matches communication-complexity lower bounds. We also design a $(1\pm \varepsilon)$-approximate circular $k$-mismatch sketch of size $\widetilde{O}(\min(\varepsilon^{-2}\sqrt{k}, \varepsilon^{-1.5}\sqrt{n}))$, which improves upon an $\widetilde{O}(\varepsilon^{-2}\sqrt{n})$-size sketch of Crouch and McGregor (APPROX'11).
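For reference, the quantity that the decoder must recover can be computed directly when both strings are available in one place. The sketch below (hypothetical function names) evaluates $\mathsf{sh}(S_1,S_2)$ by trying all $n$ rotations in $O(n^2)$ time and mimics the $k$-mismatch convention by reporting failure as `None`. It involves no sketching: in the problem above, each string must be compressed independently.

```python
def hamming(a, b):
    """Number of positions where equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def shift_distance(s1, s2):
    """sh(S1, S2): minimum Hamming distance between S1 and any rotation of S2."""
    assert len(s1) == len(s2)
    n = len(s2)
    if n == 0:
        return 0
    return min(hamming(s1, s2[r:] + s2[:r]) for r in range(n))


def circular_k_mismatch(s1, s2, k):
    """Return sh(S1, S2) if it is at most k; otherwise report failure as None."""
    d = shift_distance(s1, s2)
    return d if d <= k else None


print(shift_distance("abcd", "cdab"), circular_k_mismatch("abcd", "dcba", 1))
```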
Submitted 24 June, 2020;
originally announced June 2020.
-
The Streaming k-Mismatch Problem: Tradeoffs between Space and Total Time
Authors:
Shay Golan,
Tomasz Kociumaka,
Tsvi Kopelowitz,
Ely Porat
Abstract:
We revisit the $k$-mismatch problem in the streaming model on a pattern of length $m$ and a streaming text of length $n$, both over a size-$σ$ alphabet. The current state-of-the-art algorithm for the streaming $k$-mismatch problem, by Clifford et al. [SODA 2019], uses $\tilde O(k)$ space and $\tilde O\big(\sqrt k\big)$ worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is $\tilde O(n\sqrt k)$, and the fastest known offline algorithm, which costs $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},σn\big)\big)$ time. Moreover, it is not known whether improvements over the $\tilde O(n\sqrt k)$ total time are possible when using more than $O(k)$ space.
We address these gaps by designing a randomized streaming algorithm for the $k$-mismatch problem that, given an integer parameter $k\le s \le m$, uses $\tilde O(s)$ space and costs $\tilde O\big(n+\min\big(\frac {nk^2}m,\frac{nk}{\sqrt s},\frac{σnm}s\big)\big)$ total time. For $s=m$, the total runtime becomes $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},σn\big)\big)$, which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still $\tilde O\big(\sqrt k\big)$.
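The output of the $k$-mismatch problem is easiest to see in an offline baseline. The sketch below (hypothetical name) checks every alignment and stops counting once more than $k$ mismatches are found. It reports starting positions together with their distances, uses $O(nm)$ time in the worst case, and reads arbitrary text positions, unlike a streaming algorithm, which sees each character once and stores only $\tilde O(s)$ words.

```python
def k_mismatch(text, pattern, k):
    """Report (start, distance) for every alignment with at most k mismatches."""
    m = len(pattern)
    out = []
    for i in range(len(text) - m + 1):
        mismatches = 0
        for j in range(m):
            if text[i + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break  # early exit: this alignment cannot qualify
        if mismatches <= k:
            out.append((i, mismatches))
    return out


print(k_mismatch("abcabd", "abc", 1))
```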
Submitted 27 April, 2020;
originally announced April 2020.
-
Contention Resolution Without Collision Detection
Authors:
Michael A. Bender,
Tsvi Kopelowitz,
William Kuszmaul,
Seth Pettie
Abstract:
This paper focuses on the contention resolution problem on a shared communication channel that does not support collision detection. A shared communication channel is a multiple access channel, which consists of a sequence of synchronized time slots. Players on the channel may attempt to broadcast a packet (message) in any time slot. A player's broadcast succeeds if no other player broadcasts during that slot. If two or more players broadcast in the same time slot, then the broadcasts collide and all of them fail. The lack of collision detection means that a player monitoring the channel cannot distinguish a collision (two or more players broadcasting in the same slot) from silence (zero players broadcasting). In the contention-resolution problem, players arrive on the channel over time, and each player has one packet to transmit. The goal is to coordinate the players so that each player is able to successfully transmit its packet within reasonable time. However, the players can only communicate via the shared channel by choosing to either broadcast or not. A contention-resolution protocol is measured in terms of its throughput (channel utilization). Previous work on contention resolution that achieved constant throughput assumed that either players could detect collisions, or the players' arrival pattern is generated by a memoryless (non-adversarial) process. The foundational question answered by this paper is whether collision detection is a luxury or a necessity when the objective is to achieve constant throughput. We show that even without collision detection, one can solve contention resolution, achieving constant throughput, with high probability.
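The channel model can be made concrete with a toy simulation. In `channel_feedback`, a listener learns a packet only when exactly one player broadcasts; silence and collision both return `None`, which is the missing collision detection. The `toy_aloha` loop is a fixed-probability scheme, not the protocol from this paper. It assumes that all players arrive at once, that they know their number, and that a sender learns of its own success; all names are hypothetical.

```python
import random


def channel_feedback(broadcasters):
    """No collision detection: the winner's id, or None for both silence and collision."""
    return broadcasters[0] if len(broadcasters) == 1 else None


def toy_aloha(num_players, seed=0, max_slots=10**6):
    """Each pending player broadcasts with probability 1/num_players; return slots used."""
    rng = random.Random(seed)
    pending = set(range(num_players))
    p = 1.0 / num_players  # assumption: players know the batch size
    for slot in range(1, max_slots + 1):
        senders = [x for x in sorted(pending) if rng.random() < p]
        winner = channel_feedback(senders)
        if winner is not None:
            pending.discard(winner)  # assumption: a sender learns its own success
        if not pending:
            return slot
    raise RuntimeError("not all packets were delivered")


slots = toy_aloha(10, seed=2)
print(slots, 10 / slots)  # slots used and the resulting throughput
```

Because the broadcast probability stays fixed while players leave, the last few packets take many slots each. This is one reason why constant throughput requires more careful protocols.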
Submitted 4 May, 2020; v1 submitted 16 April, 2020;
originally announced April 2020.
-
Approximating Text-to-Pattern Hamming Distances
Authors:
Timothy M. Chan,
Shay Golan,
Tomasz Kociumaka,
Tsvi Kopelowitz,
Ely Porat
Abstract:
We revisit a fundamental problem in string matching: given a pattern of length m and a text of length n, both over an alphabet of size $σ$, compute the Hamming distance between the pattern and the text at every location. Several $(1+ε)$-approximation algorithms have been proposed in the literature, with running time of the form $O(ε^{-O(1)}n\log n\log m)$, all using fast Fourier transform (FFT). We describe a simple $(1+ε)$-approximation algorithm that is faster and does not need FFT. Combining our approach with additional ideas leads to numerous new results:
- We obtain the first linear-time approximation algorithm; the running time is $O(ε^{-2}n)$.
- We obtain a faster exact algorithm computing all Hamming distances up to a given threshold k; its running time improves previous results by logarithmic factors and is linear if $k\le\sqrt m$.
- We obtain approximation algorithms with better $ε$-dependence using rectangular matrix multiplication. The time bound is $Õ(n)$ when the pattern is sufficiently long: $m\ge ε^{-28}$. Previous algorithms require $Õ(ε^{-1}n)$ time.
- When k is not too small, we obtain a truly sublinear-time algorithm to find all locations with Hamming distance approximately (up to a constant factor) less than k, in $O((n/k^{Ω(1)}+occ)n^{o(1)})$ time, where occ is the output size. The algorithm leads to a property tester, returning true if an exact match exists and false if the Hamming distance is more than $δm$ at every location, running in $Õ(δ^{-1/3}n^{2/3}+δ^{-1}n/m)$ time.
- We obtain a streaming algorithm to report all locations with Hamming distance approximately less than k, using $Õ(ε^{-2}\sqrt k)$ space. Previously, streaming algorithms were known for the exact problem with $Õ(k)$ space or for the approximate problem with $Õ(ε^{-O(1)}\sqrt m)$ space.
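For contrast with the FFT-based approaches mentioned above, the simplest approximation samples pattern positions uniformly at random and scales the number of sampled mismatches by $m$. Each estimate is unbiased, but its error guarantee is additive rather than $(1+ε)$ multiplicative, so it is not the algorithm described above. The function name and the decision to share one sample across all alignments are assumptions.

```python
import random


def sampled_hamming_estimates(text, pattern, samples, seed=0):
    """Estimate the Hamming distance at every alignment from `samples` random pattern positions."""
    rng = random.Random(seed)
    m = len(pattern)
    positions = [rng.randrange(m) for _ in range(samples)]  # shared across alignments
    estimates = []
    for i in range(len(text) - m + 1):
        mismatches = sum(text[i + j] != pattern[j] for j in positions)
        estimates.append(m * mismatches / samples)  # unbiased estimate of the distance
    return estimates


print(sampled_hamming_estimates("abracadabra", "abra", samples=3, seed=1))
```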
Submitted 1 January, 2020;
originally announced January 2020.
-
$\{-1,0,1\}$-APSP and (min,max)-Product Problems
Authors:
Hodaya Barr,
Tsvi Kopelowitz,
Ely Porat,
Liam Roditty
Abstract:
In the $\{-1,0,1\}$-APSP problem the goal is to compute all-pairs shortest paths (APSP) on a directed graph whose edge weights are all from $\{-1,0,1\}$. In the (min,max)-product problem the input is two $n\times n$ matrices $A$ and $B$, and the goal is to output the (min,max)-product of $A$ and $B$.
This paper provides a new algorithm for the $\{-1,0,1\}$-APSP problem via a simple reduction to the target-(min,max)-product problem where the input is three $n\times n$ matrices $A,B$, and $T$, and the goal is to output a Boolean $n\times n$ matrix $C$ such that the $(i,j)$ entry of $C$ is 1 if and only if the $(i,j)$ entry of the (min,max)-product of $A$ and $B$ is exactly the $(i,j)$ entry of the target matrix $T$. If (min,max)-product can be solved in $T_{MM}(n) = Ω(n^2)$ time then it is straightforward to solve target-(min,max)-product in $O(T_{MM}(n))$ time. Thus, given the recent result of Bringmann, Künnemann, and Wegrzycki [STOC 2019], the $\{-1,0,1\}$-APSP problem can be solved in the same time needed for solving approximate APSP on graphs with positive weights.
Moreover, we design a simple algorithm for target-(min,max)-product when the inputs are restricted to the family of inputs generated by our reduction. Using fast rectangular matrix multiplication, the new algorithm is faster than the current best known algorithm for (min,max)-product.
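The observation that target-(min,max)-product is no harder than (min,max)-product can be seen directly: compute the product, then compare it entrywise with $T$. The sketch below (hypothetical names) does this with the naive $O(n^3)$ product. It is not the faster algorithm above for inputs produced by the reduction.

```python
def min_max_product(A, B):
    """C[i][j] = min over k of max(A[i][k], B[k][j]); naive O(n^3) for n x n inputs."""
    n = len(A)
    return [[min(max(A[i][k], B[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def target_min_max_product(A, B, T):
    """Boolean matrix whose (i, j) entry says whether the product equals T[i][j] there."""
    P = min_max_product(A, B)
    n = len(A)
    return [[P[i][j] == T[i][j] for j in range(n)] for i in range(n)]


A = [[1, 5], [3, 2]]
B = [[4, 0], [6, 7]]
print(min_max_product(A, B))
print(target_min_max_product(A, B, [[4, 0], [4, 3]]))
```

The extra comparison costs $O(n^2)$, which is why any $T_{MM}(n) = Ω(n^2)$ algorithm for (min,max)-product carries over.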
Submitted 14 November, 2019;
originally announced November 2019.
-
The Strong 3SUM-INDEXING Conjecture is False
Authors:
Tsvi Kopelowitz,
Ely Porat
Abstract:
In the 3SUM-Indexing problem the goal is to preprocess two lists of elements from $U$, $A=(a_1,a_2,\ldots,a_n)$ and $B=(b_1,b_2,\ldots,b_n)$, such that given an element $c\in U$ one can quickly determine whether there exists a pair $(a,b)\in A \times B$ where $a+b=c$. Goldstein et al. [WADS'2017] conjectured that there is no algorithm for 3SUM-Indexing which uses $n^{2-Ω(1)}$ space and $n^{1-Ω(1)}$ query time.
We show that the conjecture is false by reducing the 3SUM-Indexing problem to the problem of inverting functions, and then applying an algorithm of Fiat and Naor [SICOMP'1999] for inverting functions.
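The conjecture concerns the gap between two trivial solutions, sketched below with hypothetical class names. Storing all $n^2$ pairwise sums gives constant expected query time with quadratic space. Storing only $B$ in a hash set gives linear space with $O(n)$ query time. The Fiat and Naor function-inversion structure used to refute the conjecture is not shown.

```python
class PairSumIndex:
    """Store every sum a + b: quadratic space, constant expected query time."""

    def __init__(self, A, B):
        self.sums = {a + b for a in A for b in B}

    def query(self, c):
        return c in self.sums


class LinearScanIndex:
    """Store B in a hash set: linear space, O(n) query time."""

    def __init__(self, A, B):
        self.A = list(A)
        self.B = set(B)

    def query(self, c):
        return any(c - a in self.B for a in self.A)


A, B = [1, 2, 3], [10, 20]
print([PairSumIndex(A, B).query(c) for c in (11, 15, 22)])
```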
Submitted 25 July, 2019;
originally announced July 2019.
-
Improved Worst-Case Deterministic Parallel Dynamic Minimum Spanning Forest
Authors:
Tsvi Kopelowitz,
Ely Porat,
Yair Rosenmutter
Abstract:
This paper gives a new deterministic algorithm for the dynamic Minimum Spanning Forest (MSF) problem in the EREW PRAM model, where the goal is to maintain an MSF of a weighted graph with $n$ vertices and $m$ edges while supporting edge insertions and deletions. We show that one can solve the dynamic MSF problem using $O(\sqrt n)$ processors and $O(\log n)$ worst-case update time, for a total of $O(\sqrt n \log n)$ work. This improves on the work of Ferragina [IPPS 1995] which costs $O(\log n)$ worst-case update time and $O(n^{2/3} \log{\frac{m}{n}})$ work.
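As a point of reference, the trivial sequential solution recomputes the MSF from scratch after every update, for example with Kruskal's algorithm and a union-find structure, in $O(m \log m)$ time per update. The sketch below implements that baseline (hypothetical names). It is neither parallel nor the EREW PRAM algorithm of the paper.

```python
def kruskal_msf(n, edges):
    """Minimum spanning forest on vertices 0..n-1; edges are (weight, u, v) tuples."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so it belongs to the MSF
            parent[ru] = rv
            forest.append((w, u, v))
    return forest


class RecomputeMSF:
    """Dynamic MSF baseline: keep the edge set, rerun Kruskal after each update."""

    def __init__(self, n):
        self.n = n
        self.edges = set()

    def insert(self, w, u, v):
        self.edges.add((w, u, v))
        return kruskal_msf(self.n, self.edges)

    def delete(self, w, u, v):
        self.edges.discard((w, u, v))
        return kruskal_msf(self.n, self.edges)
```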
Submitted 16 May, 2018;
originally announced May 2018.
-
Conditional Lower Bounds for Space/Time Tradeoffs
Authors:
Isaac Goldstein,
Tsvi Kopelowitz,
Moshe Lewenstein,
Ely Porat
Abstract:
In recent years much effort has been concentrated towards achieving polynomial time lower bounds on algorithms for solving various well-known problems. A useful technique for showing such lower bounds is to prove them conditionally based on well-studied hardness assumptions such as 3SUM, APSP, SETH, etc. This line of research helps to obtain a better understanding of the complexity inside P.
A related question asks to prove conditional space lower bounds on data structures that are constructed to solve certain algorithmic tasks after an initial preprocessing stage. This question has received little attention in previous research, even though it has potentially strong impact.
In this paper we address this question and show that surprisingly many of the well-studied hard problems that are known to have conditional polynomial time lower bounds are also hard when concerning space. This hardness is shown as a tradeoff between the space consumed by the data structure and the time needed to answer queries. The tradeoff may be either smooth or admit one or more singularity points.
We reveal interesting connections between different space hardness conjectures and present matching upper bounds. We also apply these hardness conjectures to both static and dynamic problems and prove their conditional space hardness.
We believe that this novel framework of polynomial space conjectures can play an important role in expressing polynomial space lower bounds of many important algorithmic problems. Moreover, it seems that it can also help in achieving a better understanding of the hardness of their corresponding problems in terms of time.
Submitted 25 July, 2017; v1 submitted 19 June, 2017;
originally announced June 2017.
-
How Hard is it to Find (Honest) Witnesses?
Authors:
Isaac Goldstein,
Tsvi Kopelowitz,
Moshe Lewenstein,
Ely Porat
Abstract:
In recent years much effort was put into developing polynomial-time conditional lower bounds for algorithms and data structures in both static and dynamic settings. Along these lines we suggest a framework for proving conditional lower bounds based on the well-known 3SUM conjecture. Our framework creates a \emph{compact representation} of an instance of the 3SUM problem using hashing and domain specific encoding. This compact representation admits false solutions to the original 3SUM problem instance which we reveal and eliminate until we find a true solution. In other words, from all \emph{witnesses} (candidate solutions) we figure out if an \emph{honest} one (a true solution) exists. This enumeration of witnesses is used to prove conditional lower bounds on \emph{reporting} problems that generate all witnesses. In turn, these reporting problems are reduced to various decision problems. These help to enumerate the witnesses by constructing appropriate search data structures. Hence, 3SUM-hardness of the decision problems is deduced.
We utilize this framework to show conditional lower bounds for several variants of convolutions, matrix multiplication and string problems. Our framework uses a strong connection between all of these problems and the ability to find \emph{witnesses}.
While these specific applications are used to demonstrate the techniques of our framework, we believe that this novel framework is useful for many other problems as well.
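A toy illustration of witnesses and honest witnesses follows; the hash, the modulus $R$, and the function name are hypothetical. Hashing every value with the linear map $h(x) = x \bmod R$ gives $h(a) + h(b) \equiv h(a+b) \pmod R$, so every true 3SUM solution $a + b = c$ reappears in the hashed instance. The hashed instance also yields false candidates, which are removed by checking $a + b = c$ on the original values. The framework above relies on its own hashing and domain-specific encoding, which this sketch does not reproduce.

```python
from collections import defaultdict


def witnesses_and_honest(A, B, C, R):
    """List all hashed witnesses (a, b, c) with a + b = c (mod R), then keep the honest ones."""
    buckets = defaultdict(list)
    for c in C:
        buckets[c % R].append(c)
    witnesses = [(a, b, c) for a in A for b in B for c in buckets[(a + b) % R]]
    honest = [(a, b, c) for (a, b, c) in witnesses if a + b == c]
    return witnesses, honest


w, h = witnesses_and_honest([1, 4], [2, 7], [3, 11, 9], R=3)
print(len(w), h)
```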
Submitted 19 June, 2017;
originally announced June 2017.
-
Streaming Pattern Matching with d Wildcards
Authors:
Shay Golan,
Tsvi Kopelowitz,
Ely Porat
Abstract:
In the pattern matching with $d$ wildcards problem one is given a text $T$ of length $n$ and a pattern $P$ of length $m$ that contains $d$ wildcard characters, each denoted by a special symbol $'?'$. A wildcard character matches any other character. The goal is to establish for each $m$-length substring of $T$ whether it matches $P$. In the streaming model variant of the pattern matching with $d$ wildcards problem the text $T$ arrives one character at a time and the goal is to report, before the next character arrives, if the last $m$ characters match $P$ while using only $o(m)$ words of space.
In this paper we introduce two new algorithms for the $d$ wildcard pattern matching problem in the streaming model. The first is a randomized Monte Carlo algorithm that is parameterized by a constant $0\leq δ\leq 1$. This algorithm uses $\tilde{O}(d^{1-δ})$ amortized time per character and $\tilde{O}(d^{1+δ})$ words of space. The second algorithm, which is used as a black box in the first algorithm, is a randomized Monte Carlo algorithm which uses $O(d+\log m)$ worst-case time per character and $O(d\log m)$ words of space.
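Offline, wildcard matching admits a compact algebraic test, shown here as a standard trick rather than as one of the streaming algorithms above. Encode the wildcard as 0 and every other character as a positive integer. Then $\sum_j p_j t_{i+j}(p_j - t_{i+j})^2 = 0$ exactly when alignment $i$ matches, because every term is non-negative and vanishes only when $p_j = 0$ or $p_j = t_{i+j}$. The sketch evaluates the sum directly in $O(nm)$ time (a convolution-based evaluation would be faster). It assumes the text contains no wildcards or NUL characters; the function name is hypothetical.

```python
def wildcard_matches(text, pattern, wildcard="?"):
    """Start positions i where pattern matches text[i:i+m]; wildcards match any character."""

    def encode(ch):
        return 0 if ch == wildcard else ord(ch)  # real characters map to ord(ch) > 0

    p = [encode(ch) for ch in pattern]
    t = [ord(ch) for ch in text]  # assumption: the text has no wildcards or NULs
    m = len(p)
    result = []
    for i in range(len(t) - m + 1):
        score = sum(p[j] * t[i + j] * (p[j] - t[i + j]) ** 2 for j in range(m))
        if score == 0:
            result.append(i)
    return result


print(wildcard_matches("abcabd", "ab?"))
```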
Submitted 5 April, 2017;
originally announced April 2017.
-
Exponential Separations in the Energy Complexity of Leader Election
Authors:
Yi-Jun Chang,
Tsvi Kopelowitz,
Seth Pettie,
Ruosong Wang,
Wei Zhan
Abstract:
Energy is often the most constrained resource for battery-powered wireless devices and the lion's share of energy is often spent on transceiver usage (sending/receiving packets), not on computation. In this paper we study the energy complexity of LeaderElection and ApproximateCounting in several models of wireless radio networks. It turns out that energy complexity is very sensitive to whether the devices can generate random bits and their ability to detect collisions. We consider four collision-detection models: Strong-CD (in which transmitters and listeners detect collisions), Sender-CD and Receiver-CD (in which only transmitters or only listeners detect collisions), and No-CD (in which no one detects collisions).
The take-away message of our results is quite surprising. For randomized LeaderElection algorithms, there is an exponential gap between the energy complexity of Sender-CD and Receiver-CD, and for deterministic LeaderElection algorithms there is another exponential gap, but in the reverse direction.
In particular, the randomized energy complexity of LeaderElection is $Θ(\log^* n)$ in Sender-CD but $Θ(\log(\log^* n))$ in Receiver-CD, where $n$ is the (unknown) number of devices. Its deterministic complexity is $Θ(\log N)$ in Receiver-CD but $Θ(\log\log N)$ in Sender-CD, where $N$ is the (known) size of the devices' ID space.
There is a tradeoff between time and energy. We give a new upper bound on the time-energy tradeoff curve for randomized LeaderElection and ApproximateCounting. A critical component of this algorithm is a new deterministic LeaderElection algorithm for dense instances, when $n=Θ(N)$, with inverse-Ackermann-type ($O(α(N))$) energy complexity.
Submitted 5 September, 2018; v1 submitted 27 September, 2016;
originally announced September 2016.
-
Fully Dynamic Connectivity in $O(\log n(\log\log n)^2)$ Amortized Expected Time
Authors:
Shang-En Huang,
Dawei Huang,
Tsvi Kopelowitz,
Seth Pettie,
Mikkel Thorup
Abstract:
Dynamic connectivity is one of the most fundamental problems in dynamic graph algorithms. We present a randomized Las Vegas dynamic connectivity data structure with $O(\log n(\log\log n)^2)$ amortized expected update time and $O(\log n/\log\log\log n)$ worst case query time, which comes very close to the cell probe lower bounds of Patrascu and Demaine (2006) and Patrascu and Thorup (2011).
Submitted 28 April, 2023; v1 submitted 19 September, 2016;
originally announced September 2016.
-
An Exponential Separation Between Randomized and Deterministic Complexity in the LOCAL Model
Authors:
Yi-Jun Chang,
Tsvi Kopelowitz,
Seth Pettie
Abstract:
Over the past 30 years numerous algorithms have been designed for symmetry breaking problems in the LOCAL model, such as maximal matching, MIS, vertex coloring, and edge-coloring. For most problems the best randomized algorithm is at least exponentially faster than the best deterministic algorithm. In this paper we prove that these exponential gaps are necessary and establish connections between the deterministic and randomized complexities in the LOCAL model. Each result has a very compelling take-away message:
1. Fast $Δ$-coloring of trees requires random bits: Building on the recent lower bounds of Brandt et al., we prove that the randomized complexity of $Δ$-coloring a tree with maximum degree $Δ\ge 55$ is $Θ(\log_Δ\log n)$, whereas its deterministic complexity is $Θ(\log_Δn)$ for any $Δ\ge 3$. This also establishes a large separation between the deterministic complexity of $Δ$-coloring and $(Δ+1)$-coloring trees.
2. Randomized lower bounds imply deterministic lower bounds: We prove that any deterministic algorithm for a natural class of problems that runs in $O(1)+o(\log_Δn)$ rounds can be transformed to run in $O(\log^*n-\log^*Δ+1)$ rounds. If the transformed algorithm violates a lower bound (even allowing randomization), then one can conclude that the problem requires $Ω(\log_Δn)$ time deterministically.
3. Deterministic lower bounds imply randomized lower bounds: We prove that the randomized complexity of any natural problem on instances of size $n$ is at least its deterministic complexity on instances of size $\sqrt{\log n}$. This shows that a deterministic $Ω(\log_Δn)$ lower bound for any problem implies a randomized $Ω(\log_Δ\log n)$ lower bound. It also illustrates that the graph shattering technique is absolutely essential to the LOCAL model.
Submitted 5 April, 2016; v1 submitted 25 February, 2016;
originally announced February 2016.
-
Breaking the Variance: Approximating the Hamming Distance in $\tilde O(1/ε)$ Time Per Alignment
Authors:
Tsvi Kopelowitz,
Ely Porat
Abstract:
The algorithmic task of computing the Hamming distance between a given pattern of length $m$ and each location in a text of length $n$ is one of the most fundamental tasks in string algorithms. Unfortunately, there is evidence that for a text $T$ of size $n$ and a pattern $P$ of size $m$, one cannot compute the exact Hamming distance for all locations in $T$ in less than $\tilde O(n\sqrt m)$ time. However, Karloff showed that if one is willing to accept a $1\pmε$ approximation, then the problem can be solved with high probability in $\tilde O(\frac n {ε^2})$ time.
Due to related lower bounds for computing the Hamming distance of two strings in the one-way communication complexity model, it is strongly believed that the approximation version cannot be solved much faster as a function of $\frac 1 ε$. We show here that this belief is false by introducing a new $\tilde O(\frac{n}ε)$-time algorithm that succeeds with high probability.
The main idea behind our algorithm, which is common in sparse recovery problems, is to reduce the variance of a specific randomized experiment by (approximately) separating heavy hitters from non-heavy hitters. However, while known sparse recovery techniques work very well on vectors, they do not seem to apply here, where we are dealing with mismatches between pairs of characters. We introduce two main algorithmic ingredients. The first is a new sparse recovery method that applies to pair inputs (such as in our setting). The second is a new construction of hash/projection functions, which allows one to count the number of projections that induce mismatches between two characters exponentially faster than brute force. We expect that these algorithmic techniques will be of independent interest.
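To make the text-to-pattern task concrete, here is a brute-force baseline that computes the exact Hamming distance at every alignment in $O(nm)$ time. It is only a reference point for the bounds discussed above, not the paper's $\tilde O(n/ε)$ approximation algorithm, and the function name is illustrative.

```python
def hamming_all_alignments(text: str, pattern: str) -> list[int]:
    """Exact Hamming distance between `pattern` and each length-m window of `text`.

    Brute force: O(n * m) character comparisons in total.
    """
    n, m = len(text), len(pattern)
    if m > n:
        return []  # the pattern does not fit at any alignment
    return [
        sum(1 for j in range(m) if text[i + j] != pattern[j])  # mismatches at alignment i
        for i in range(n - m + 1)
    ]


print(hamming_all_alignments("abracadabra", "abr"))
```

Entry $i$ of the result is the distance at alignment $i$; a $1\pmε$ approximation may return any value within that factor of each entry.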
Submitted 14 December, 2015;
originally announced December 2015.
-
Faster Worst Case Deterministic Dynamic Connectivity
Authors:
Casper Kejlberg-Rasmussen,
Tsvi Kopelowitz,
Seth Pettie,
Mikkel Thorup
Abstract:
We present a deterministic dynamic connectivity data structure for undirected graphs with worst case update time $O\left(\sqrt{\frac{n(\log\log n)^2}{\log n}}\right)$ and constant query time. This improves on the previous best deterministic worst case algorithm of Frederickson (STOC 1983) and Eppstein, Galil, Italiano, and Nissenzweig (J. ACM 1997), which had update time $O(\sqrt{n})$. All other algorithms for dynamic connectivity are either randomized (Monte Carlo) or have only amortized performance guarantees.
Submitted 3 November, 2015; v1 submitted 21 July, 2015;
originally announced July 2015.
-
Mind the Gap
Authors:
Amihood Amir,
Tsvi Kopelowitz,
Avivit Levy,
Seth Pettie,
Ely Porat,
B. Riva Shalom
Abstract:
We examine the complexity of the online Dictionary Matching with One Gap Problem (DMOG), defined as follows. Preprocess a dictionary $D$ of $d$ patterns, where each pattern contains a special gap symbol that can match any string, so that given a text that arrives online, a character at a time, we can report all of the patterns from $D$ that are suffixes of the text that has arrived so far, before the next character arrives. In more general versions, the gap symbols are associated with bounds determining the possible lengths of matching strings. Finding efficient algorithmic solutions for (online) DMOG has proven to be a difficult algorithmic challenge. We demonstrate that the difficulty in obtaining efficient solutions for the DMOG problem, even in the offline setting, can be traced back to the infamous 3SUM conjecture. Interestingly, our reduction deviates from the known reduction paths that follow from 3SUM. In particular, most reductions from 3SUM go through the set-disjointness problem, which corresponds to the problem of preprocessing a graph to answer edge-triangles queries. We use a new path of reductions by considering the complementary, although structurally very different, vertex-triangles queries. Using this new path we show a conditional lower bound of $Ω(δ(G_D)+op)$ time per text character, where $G_D$ is a bipartite graph that captures the structure of $D$, $δ(G_D)$ is the degeneracy of this graph, and $op$ is the output size. We also provide matching upper bounds (up to sub-polynomial factors) for the vertex-triangles problem, and then extend these techniques to the online DMOG problem. In particular, we introduce algorithms whose time cost depends linearly on $δ(G_D)$. Our algorithms make use of graph orientations, together with some additional techniques. Finally, when $δ(G_D)$ is large we are able to obtain even more efficient solutions.
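A naive online matcher makes the DMOG definition concrete. This sketch assumes each pattern is given as a pair `(p1, p2)` of non-empty strings around a single unbounded gap that may match any string, including the empty one. It rescans every pattern on each arriving character, so it has none of the degeneracy-based guarantees described above; the function name is hypothetical.

```python
from typing import Iterator


def dmog_online(patterns: list[tuple[str, str]], text: str) -> Iterator[tuple[int, int]]:
    """Yield (pos, k) whenever pattern k = (p1, p2), read as p1 <gap> p2,
    is a suffix of text[:pos + 1]."""
    for p1, p2 in patterns:
        if not p1 or not p2:
            raise ValueError("this sketch assumes both sides of the gap are non-empty")
    # earliest_p1_end[k]: smallest index at which some occurrence of p1 ends, or None.
    earliest_p1_end = [None] * len(patterns)
    seen = ""
    for pos, ch in enumerate(text):  # characters arrive one at a time
        seen += ch
        for k, (p1, p2) in enumerate(patterns):
            if earliest_p1_end[k] is None and seen.endswith(p1):
                earliest_p1_end[k] = pos
            # p2 would occupy seen[pos - len(p2) + 1 : pos + 1]; p1 must end before it starts.
            end = earliest_p1_end[k]
            if end is not None and seen.endswith(p2) and end <= pos - len(p2):
                yield pos, k  # reported before the next character is read
```

Keeping only the earliest end of `p1` suffices for an unbounded gap; bounded-gap variants would need to track more occurrences.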
Submitted 9 July, 2015; v1 submitted 25 March, 2015;
originally announced March 2015.
-
The Family Holiday Gathering Problem or Fair and Periodic Scheduling of Independent Sets
Authors:
Amihood Amir,
Oren Kapah,
Tsvi Kopelowitz,
Moni Naor,
Ely Porat
Abstract:
We introduce and examine the {\em Holiday Gathering Problem}, which models the difficulty that couples have when trying to decide with which parents they should spend the holiday. Our goal is to schedule the family gatherings so that parents will be {\em happy}, i.e.\ all their children will be home {\em simultaneously} for the holiday festivities, while minimizing the number of consecutive holidays in which parents are not happy.
The holiday gathering problem is closely related to several classical problems in computer science, such as the {\em dining philosophers problem} on a general graph and periodic scheduling, and has applications in scheduling of transmissions made by cellular radios. We also show interesting connections between periodic scheduling, coloring, and universal prefix-free encodings.
The combinatorial definition of the Holiday Gathering Problem is: given a graph $G$, find an infinite sequence of independent sets of $G$. The objective function is to minimize, for every node $v$, the maximal gap between two appearances of $v$. In good solutions this gap depends on local properties of the node (i.e., its degree) and the solution should be periodic, i.e.\ a node appears every fixed number of periods. We show a coloring-based construction where the period of each node colored with color $c$ is at most $2^{1+\log^*c}\cdot\prod_{i=0}^{\log^*c} \log^{(i)}c$ (where $\log^{(i)}$ means iterating the $\log$ function $i$ times). This is achieved via a connection with {\it prefix-free encodings}. We prove that this is the best possible for coloring-based solutions. We also show a construction with period at most $2d$ for a node of degree $d$.
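The link between coloring, periodicity, and prefix-free codes can be illustrated with a small sketch (our own illustration under stated assumptions, not necessarily the paper's construction). Assume a proper coloring with colors $c \ge 1$, give color $c$ the Elias gamma codeword of $c$, and let a node of color $c$ attend holiday $t$ exactly when the low-order bits of $t$, read from the least significant bit upward, spell that codeword. Because the code is prefix-free, two different colors are never active at the same holiday, so each holiday's attendees lie in one color class and form an independent set; a node of color $c$ recurs with period $2^{\ell}$, where $\ell$ is its codeword length. Function names are hypothetical.

```python
def elias_gamma(c: int) -> str:
    """Elias gamma codeword of a positive integer: len(bin(c)) - 1 zeros, then bin(c)."""
    if c < 1:
        raise ValueError("colors must be positive integers")
    binary = bin(c)[2:]
    return "0" * (len(binary) - 1) + binary


def is_active(color: int, t: int) -> bool:
    """A node of this color attends holiday t iff bits 0, 1, 2, ... of t spell its codeword."""
    code = elias_gamma(color)
    return all(((t >> i) & 1) == int(bit) for i, bit in enumerate(code))


def period(color: int) -> int:
    """Exact number of holidays between consecutive appearances of this color."""
    return 2 ** len(elias_gamma(color))


# Path a - b - c with a proper coloring; each holiday's attendees form an independent set.
coloring = {"a": 1, "b": 2, "c": 1}
for t in range(8):
    print(t, sorted(v for v, col in coloring.items() if is_active(col, t)))
```

Gamma codewords have length $2\lfloor\log_2 c\rfloor+1$, so this schedule gives period at most $2c^2$; swapping in a prefix-free code with shorter codewords (for example Elias delta) shortens the periods accordingly.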
Submitted 10 August, 2014;
originally announced August 2014.
-
Higher Lower Bounds from the 3SUM Conjecture
Authors:
Tsvi Kopelowitz,
Seth Pettie,
Ely Porat
Abstract:
The 3SUM conjecture has proven to be a valuable tool for proving conditional lower bounds on dynamic data structures and graph problems. This line of work was initiated by Pǎtraşcu (STOC 2010) who reduced 3SUM to an offline SetDisjointness problem. However, the reduction introduced by Pǎtraşcu suffers from several inefficiencies, making it difficult to obtain tight conditional lower bounds from the 3SUM conjecture.
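For reference, the 3SUM problem asks whether some three of $n$ given numbers sum to zero, and the 3SUM conjecture asserts, roughly, that it cannot be solved in $O(n^{2-ε})$ time for any constant $ε>0$. The textbook quadratic algorithm (sort, then sweep two pointers for each fixed element) is sketched below as background only; it is not part of the paper's reductions.

```python
def has_3sum(values: list[int]) -> bool:
    """Return True iff entries at three distinct positions sum to zero. O(n^2) time."""
    a = sorted(values)
    n = len(a)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1  # two pointers over the suffix after a[i]
        while lo < hi:
            s = a[i] + a[lo] + a[hi]
            if s == 0:
                return True
            if s < 0:
                lo += 1  # need a larger sum
            else:
                hi -= 1  # need a smaller sum
    return False
```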
In this paper we address many of the deficiencies of Pǎtraşcu's framework. We give new and efficient reductions from 3SUM to offline SetDisjointness and offline SetIntersection (the reporting version of SetDisjointness), which lead to polynomially higher lower bounds on several problems. Using our reductions, we are able to show the essential optimality of several algorithms, assuming the 3SUM conjecture.
- Chiba and Nishizeki's $O(mα)$-time algorithm (SICOMP 1985) for enumerating all triangles in a graph with arboricity/degeneracy $α$ is essentially optimal, for any $α$.
- Bjørklund, Pagh, Williams, and Zwick's algorithm (ICALP 2014) for listing $t$ triangles is essentially optimal (assuming the matrix multiplication exponent is $ω=2$).
- Any static data structure for SetDisjointness that answers queries in constant time must spend $Ω(N^{2-o(1)})$ time in preprocessing, where $N$ is the size of the set system.
These statements were unattainable via Pǎtraşcu's reductions.
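The $O(mα)$ triangle-enumeration bound of Chiba and Nishizeki is matched by a standard orientation-based approach, sketched here under our own naming (an illustration, not the original paper's procedure): peel vertices in degeneracy order, orient every edge from the earlier-peeled endpoint to the later one so that each out-degree is at most the degeneracy (which is below $2α$), and report each triangle exactly once by intersecting out-neighborhoods along each directed edge. The sketch assumes vertex labels are mutually comparable (used to break heap ties); the heap-based peeling adds an $O(m\log n)$ term.

```python
import heapq
from collections import defaultdict


def list_triangles(edges):
    """Return every triangle of a simple undirected graph exactly once, as a tuple."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Degeneracy (smallest-last) ordering via a heap with lazy deletion.
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    rank = {}
    while heap:
        d, v = heapq.heappop(heap)
        if v in rank or d != deg[v]:
            continue  # stale entry: v already peeled or its degree has dropped
        rank[v] = len(rank)
        for w in adj[v]:
            if w not in rank:
                deg[w] -= 1
                heapq.heappush(heap, (deg[w], w))

    # Orient each edge from the earlier-peeled endpoint; out-degree <= degeneracy.
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    triangles = []
    for u in out:
        for v in out[u]:
            # Set intersection iterates over the smaller set: O(degeneracy) per edge.
            for w in out[u] & out[v]:  # edges u->v, u->w, v->w
                triangles.append((u, v, w))
    return triangles
```

In an acyclic orientation every triangle has a unique source and a unique middle vertex, which is why it is reported once, from its edge source-to-middle.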
We also introduce several new reductions from 3SUM to pattern matching problems and dynamic graph problems. Of particular interest are new conditional lower bounds for dynamic versions of Maximum Cardinality Matching, which introduce a new technique for obtaining amortized lower bounds.
Submitted 12 January, 2019; v1 submitted 24 July, 2014;
originally announced July 2014.
-
Dynamic Set Intersection
Authors:
Tsvi Kopelowitz,
Seth Pettie,
Ely Porat
Abstract:
Consider the problem of maintaining a family $F$ of dynamic sets subject to insertions, deletions, and set-intersection reporting queries: given $S,S'\in F$, report every member of $S\cap S'$ in any order. We show that in the word RAM model, where $w$ is the word size, given a cap $d$ on the maximum size of any set, we can support set intersection queries in $O(\frac{d}{w/\log^2 w})$ expected time, and updates in $O(\log w)$ expected time. Using this algorithm we can list all $t$ triangles of a graph $G=(V,E)$ in $O(m+\frac{mα}{w/\log^2 w} +t)$ expected time, where $m=|E|$ and $α$ is the arboricity of $G$. This improves a 30-year-old triangle enumeration algorithm of Chiba and Nishizeki running in $O(m α)$ time.
We provide an incremental data structure on $F$ that supports intersection {\em witness} queries, where we only need to find {\em one} $e\in S\cap S'$. Both queries and insertions take $O\left(\sqrt{\frac{N}{w/\log^2 w}}\right)$ expected time, where $N=\sum_{S\in F} |S|$. Finally, we provide time/space tradeoffs for the fully dynamic set intersection reporting problem. Using $M$ words of space, each update costs $O(\sqrt {M \log N})$ expected time, each reporting query costs $O(\frac{N\sqrt{\log N}}{\sqrt M}\sqrt{op+1})$ expected time where $op$ is the size of the output, and each witness query costs $O(\frac{N\sqrt{\log N}}{\sqrt M} + \log N)$ expected time.
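As a point of comparison for these bounds, the simplest fully dynamic solution keeps each set in a hash set and answers a query by scanning the smaller set, costing $O(\min(|S|,|S'|))$ expected time per query and $O(1)$ expected time per update. The class below is such a baseline; its name and methods are hypothetical, and it has none of the word-level parallelism that yields the $w/\log^2 w$ speedup.

```python
class DynamicSetFamily:
    """Hypothetical baseline: named dynamic sets backed by Python hash sets."""

    def __init__(self):
        self._sets = {}

    def insert(self, name, x):
        self._sets.setdefault(name, set()).add(x)  # O(1) expected

    def delete(self, name, x):
        if name in self._sets:
            self._sets[name].discard(x)  # O(1) expected

    def _smaller_first(self, a, b):
        s, t = self._sets.get(a, set()), self._sets.get(b, set())
        return (s, t) if len(s) <= len(t) else (t, s)

    def report(self, a, b):
        """All members of S_a ∩ S_b, found by scanning the smaller set."""
        small, large = self._smaller_first(a, b)
        return [x for x in small if x in large]

    def witness(self, a, b):
        """One member of S_a ∩ S_b, or None if the intersection is empty."""
        small, large = self._smaller_first(a, b)
        return next((x for x in small if x in large), None)
```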
Submitted 4 May, 2015; v1 submitted 24 July, 2014;
originally announced July 2014.
-
Orienting Fully Dynamic Graphs with Worst-Case Time Bounds
Authors:
Tsvi Kopelowitz,
Robert Krauthgamer,
Ely Porat,
Shay Solomon
Abstract:
In edge orientations, the goal is usually to orient (direct) the edges of an undirected $n$-vertex graph $G$ such that all out-degrees are bounded. When the graph $G$ is fully dynamic, i.e., admits edge insertions and deletions, we wish to maintain such an orientation while keeping tabs on the update time. Low out-degree orientations turned out to be a surprisingly useful tool, with several algorithmic applications involving static or dynamic graphs.
Brodal and Fagerberg (1999) initiated the study of the edge orientation problem in terms of the graph's arboricity, which is very natural in this context. They provided a solution with constant out-degree and \emph{amortized} logarithmic update time for all graphs with constant arboricity, which include all planar and excluded-minor graphs. However, it remained an open question (first proposed by Brodal and Fagerberg, later by others) to obtain similar bounds with worst-case update time.
We resolve this 15-year-old question in the affirmative by providing a simple algorithm with worst-case bounds that nearly match the previous amortized bounds. Our algorithm is based on a new approach that uses a combinatorial invariant, and achieves a logarithmic out-degree with logarithmic worst-case update times. This result has applications in various dynamic graph problems such as maintaining a maximal matching, where we obtain $O(\log n)$ worst-case update time compared to the $O(\frac{\log n}{\log\log n})$ amortized update time of Neiman and Solomon (2013).
Submitted 4 December, 2013;
originally announced December 2013.
-
Suffix Trays and Suffix Trists: Structures for Faster Text Indexing
Authors:
Richard Cole,
Tsvi Kopelowitz,
Moshe Lewenstein
Abstract:
Suffix trees and suffix arrays are two of the most widely used data structures for text indexing. Each uses linear space and can be constructed in linear time for polynomially sized alphabets. However, when it comes to answering queries with worst-case deterministic time bounds, the former does so in $O(m\log|Σ|)$ time, where $m$ is the query size and $|Σ|$ is the alphabet size, and the latter does so in $O(m+\log n)$ time, where $n$ is the text size. If one wants to output all appearances of the query, an additive cost of $O(occ)$ time is sufficient, where $occ$ is the size of the output.
We propose a novel way of combining the two into what we call a {\em suffix tray}. The space and construction time remain linear and the query time improves to $O(m+\log|Σ|)$ for integer alphabets from a linear range, i.e. $Σ\subset \{1,\cdots, cn\}$, for an arbitrary constant $c$. The construction and query are deterministic. Here also an additive $O(occ)$ time is sufficient if one desires to output all appearances of the query.
We also consider the online version of indexing, where the text arrives online, one character at a time, and indexing queries are answered in tandem. In this variant we create a cross between a suffix tree and a suffix list (a dynamic variant of suffix array) to be called a {\em suffix trist}; it supports queries in $O(m+\log|Σ|)$ time. The suffix trist also uses linear space. Furthermore, if there exists an online construction for a linear-space suffix tree such that the cost of adding a character is worst-case deterministic $f(n,|Σ|)$ ($n$ is the size of the current text), then one can further update the suffix trist in $O(f(n,|Σ|)+\log |Σ|)$ time. The best currently known worst-case deterministic bound for $f(n,|Σ|)$ is $O(\log n)$ time.
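To ground the comparison between the two classic indexes, a plain suffix array can be built naively by sorting suffixes and searched with two binary searches that bracket the block of suffixes starting with the pattern. This sketch spends $O(m\log n)$ time per query because it keeps no LCP information, so it is slower than both the $O(m+\log n)$ suffix-array bound and the suffix tray; the function names are illustrative.

```python
def suffix_array(text: str) -> list[int]:
    """Starting positions of the suffixes of text in lexicographic order.

    Naive O(n^2 log n) construction by sorting; linear-time constructions exist.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """All starting positions of pattern in text, via two binary searches over sa."""
    m = len(pattern)

    def boundary(strict: bool) -> int:
        # First index whose m-prefix is >= pattern (or > pattern when strict).
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = text[sa[mid]:sa[mid] + m]  # O(m) per comparison
            if prefix < pattern or (strict and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    start, end = boundary(False), boundary(True)
    return sorted(sa[start:end])  # the occ matches, reported in text order
```

Truncating sorted suffixes to their first $m$ characters keeps them in non-decreasing order, which is what makes the two binary searches valid.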
Submitted 7 November, 2013;
originally announced November 2013.
-
Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing
Authors:
Amihood Amir,
Gianni Franceschini,
Roberto Grossi,
Tsvi Kopelowitz,
Moshe Lewenstein,
Noa Lewenstein
Abstract:
This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic and indivisible keys by constant-time comparisons into a data structure that handles unbounded-length keys whose comparison cost is not a constant. Examples of these keys are strings, multi-dimensional points, multiple-precision numbers, multi-key data (e.g., records), XML paths, URL addresses, etc. The technique is more general than what has been done in previous work, as no particular exploitation of the underlying structure of the keys is required. The only requirement is that the insertion of a key must identify its predecessor or its successor.
Using the proposed technique, an online suffix tree can be constructed in worst-case time $O(\log n)$ per input symbol (as opposed to amortized $O(\log n)$ time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves $O(\log n)$ worst-case time per input symbol. Searching for a pattern of length $m$ in the resulting suffix tree takes $O(\min(m\log |Σ|, m + \log n) + tocc)$ time, where $tocc$ is the number of occurrences of the pattern. The paper also describes more applications and shows how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors, and order maintenance.
Submitted 3 June, 2013;
originally announced June 2013.
-
Faster Clustering via Preprocessing
Authors:
Tsvi Kopelowitz,
Robert Krauthgamer
Abstract:
We examine the efficiency of clustering a set of points, when the encompassing metric space may be preprocessed in advance. In computational problems of this genre, there is a first stage of preprocessing, whose input is a collection of points $M$; the next stage receives as input a query set $Q\subset M$, and should report a clustering of $Q$ according to some objective, such as 1-median, in which case the answer is a point $a\in M$ minimizing $\sum_{q\in Q} d_M(a,q)$.
We design fast algorithms that approximately solve such problems under standard clustering objectives like $p$-center and $p$-median, when the metric $M$ has low doubling dimension. By leveraging the preprocessing stage, our algorithms achieve query time that is near-linear in the query size $n=|Q|$, and is (almost) independent of the total number of points $m=|M|$.
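The 1-median objective defined above can be written directly as a brute-force search over $M$, here assuming points in the Euclidean plane. It takes $O(|M|\cdot|Q|)$ distance evaluations, which is exactly the dependence on $m=|M|$ that the preprocessing-based algorithms avoid; the function name is illustrative.

```python
import math


def one_median(M, Q):
    """Return a point a in M minimizing sum_{q in Q} d(a, q) (Euclidean distance).

    Brute force over all candidates; ties go to the earliest point in M.
    """
    return min(M, key=lambda a: sum(math.dist(a, q) for q in Q))
```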
Submitted 26 August, 2012;
originally announced August 2012.
-
On-line Indexing for General Alphabets via Predecessor Queries on Subsets of an Ordered List
Authors:
Tsvi Kopelowitz
Abstract:
The problem of Text Indexing is a fundamental algorithmic problem in which one wishes to preprocess a text in order to quickly locate pattern queries within the text. In the ever-evolving world of dynamic and on-line data, there is also a need for developing solutions to index texts which arrive on-line, i.e. a character at a time, while still being able to quickly locate said patterns. In this paper, a new solution for on-line indexing is presented by providing an on-line suffix tree construction in $O(\log \log n + \log\log |Σ|)$ worst-case expected time per character, where $n$ is the size of the string, and $Σ$ is the alphabet. This improves upon all previously known on-line suffix tree constructions for general alphabets, at the cost of having the run time in expectation.
The main idea is to reduce the problem of constructing a suffix tree on-line to an interesting variant of the order maintenance problem, which may be of independent interest. In the famous order maintenance problem, one wishes to maintain a dynamic list $L$ of size $n$ under insertions, deletions, and order queries. In an order query, one is given two nodes from $L$ and must determine which node precedes the other in $L$. In the Predecessor search on Dynamic Subsets of an Ordered Dynamic List problem (POLP) it is also necessary to maintain dynamic subsets of $L$ such that given some $u\in L$ it will be possible to quickly locate the predecessor of $u$ in any subset. This paper provides an efficient data structure capable of solving the POLP with worst-case expected bounds that match the currently best known bounds for predecessor search in the RAM model, improving over a solution which may be implicitly obtained from Dietz [Die89].
Furthermore, this paper improves or simplifies bounds for several additional applications, including fully-persistent arrays and the Order-Maintenance Problem.
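A deliberately naive order-maintenance structure illustrates the interface described above: each list node carries an integer label, an insertion takes the midpoint of its neighbors' labels, and when no integer fits, the whole list is relabeled with evenly spaced labels, so an order query is a single comparison. Relabeling costs $O(n)$ in the worst case; the structures in the paper (and in Dietz's work) are far more efficient, and the class names and the `GAP` constant here are hypothetical.

```python
class Node:
    """A list element carrying an integer label; labels increase along the list."""

    def __init__(self, value, label):
        self.value, self.label = value, label
        self.prev = None
        self.next = None


class NaiveOrderList:
    """Order maintenance with gapped integer labels and global relabeling."""

    GAP = 1 << 20  # spacing used when (re)assigning labels

    def __init__(self, first_value):
        self.head = Node(first_value, 0)

    def insert_after(self, node, value):
        """Insert a new element immediately after node and return it."""
        if node.next is None:
            label = node.label + self.GAP
        else:
            if node.next.label - node.label < 2:  # no free integer between neighbors
                self._relabel()
            label = (node.label + node.next.label) // 2
        new = Node(value, label)
        new.prev, new.next = node, node.next
        if node.next is not None:
            node.next.prev = new
        node.next = new
        return new

    def delete(self, node):
        """Unlink node; the remaining labels stay increasing."""
        if node.prev is None:
            self.head = node.next
        else:
            node.prev.next = node.next
        if node.next is not None:
            node.next.prev = node.prev

    def _relabel(self):
        label, cur = 0, self.head
        while cur is not None:
            cur.label = label
            label += self.GAP
            cur = cur.next

    @staticmethod
    def precedes(x, y):
        """Order query: does x come before y in the list?"""
        return x.label < y.label
```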
Submitted 18 August, 2012;
originally announced August 2012.
-
Sparse Suffix Tree Construction with Small Space
Authors:
Philip Bille,
Inge Li Gørtz,
Tsvi Kopelowitz,
Benjamin Sach,
Hjalte Wedel Vildhøj
Abstract:
We consider the problem of constructing a sparse suffix tree (or suffix array) for $b$ suffixes of a given text $T$ of size $n$, using only $O(b)$ words of space during construction time. Breaking the naive bound of $Ω(nb)$ time for this problem has occupied many algorithmic researchers since a different structure, the (evenly spaced) sparse suffix tree, was introduced by Kärkkäinen and Ukkonen in 1996. While in the evenly spaced sparse suffix tree the suffixes considered must be evenly spaced in $T$, here there is no constraint on the locations of the suffixes.
We show that the sparse suffix tree can be constructed in $O(n\log^2b)$ time. To achieve this we develop a technique, which may be of independent interest, that allows one to efficiently answer $b$ longest common prefix queries on suffixes of $T$, using only $O(b)$ space. We expect that this technique will prove useful in many other applications in which space usage is a concern. Furthermore, additional tradeoffs between the space usage and the construction time are given.
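A common way to answer a longest-common-prefix query between two suffixes is to binary-search on the length using Karp-Rabin fingerprints; the sketch below shows that idea. It is an assumption-laden illustration rather than the paper's method: it stores fingerprints of all $n+1$ prefixes, so it uses $O(n)$ words rather than $O(b)$, and it is Monte Carlo, since a fingerprint collision can make it overestimate an LCP (the fixed `base` is for reproducibility; the usual guarantee needs a random base). The class name is hypothetical.

```python
class PrefixFingerprints:
    """Karp-Rabin fingerprints of every prefix of a text (O(n) words of space)."""

    MOD = (1 << 61) - 1  # a Mersenne prime

    def __init__(self, text: str, base: int = 911382323):
        self.n = len(text)
        self.h = [0] * (self.n + 1)   # h[i] = fingerprint of text[:i]
        self.pw = [1] * (self.n + 1)  # pw[i] = base**i mod MOD
        for i, ch in enumerate(text):
            self.h[i + 1] = (self.h[i] * base + ord(ch)) % self.MOD
            self.pw[i + 1] = (self.pw[i] * base) % self.MOD

    def fingerprint(self, i: int, length: int) -> int:
        """Fingerprint of text[i:i + length]."""
        return (self.h[i + length] - self.h[i] * self.pw[length]) % self.MOD

    def lcp(self, i: int, j: int) -> int:
        """LCP length of the suffixes starting at i and j, by binary search on length."""
        lo, hi = 0, self.n - max(i, j)
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if self.fingerprint(i, mid) == self.fingerprint(j, mid):
                lo = mid  # prefixes of length mid (probably) agree
            else:
                hi = mid - 1
        return lo
```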
Submitted 4 July, 2012;
originally announced July 2012.
-
Selection in the Presence of Memory Faults, with Applications to In-place Resilient Sorting
Authors:
Tsvi Kopelowitz,
Nimrod Talmon
Abstract:
The selection problem, where one wishes to locate the $k^{th}$ smallest element in an unsorted array of size $n$, is one of the basic problems studied in computer science. The main focus of this work is designing algorithms for solving the selection problem in the presence of memory faults. These can happen as the result of cosmic rays, alpha particles, or hardware failures.
Specifically, the computational model assumed here is a faulty variant of the RAM model (abbreviated as FRAM), which was introduced by Finocchi and Italiano. In this model, the content of memory cells might get corrupted adversarially during the execution, and the algorithm is given an upper bound $δ$ on the number of corruptions that may occur.
The main contribution of this work is a deterministic resilient selection algorithm with optimal O(n) worst-case running time. Interestingly, the running time does not depend on the number of faults, and the algorithm does not need to know $δ$.
The aforementioned resilient selection algorithm can be used to improve the complexity bounds for resilient $k$-d trees developed by Gieseke, Moruz and Vahrenhold. Specifically, the time complexity for constructing a $k$-d tree is improved from $O(n\log^2 n + δ^2)$ to $O(n \log n)$.
Besides the deterministic algorithm, a randomized resilient selection algorithm is developed, which is simpler than the deterministic one and has $O(n + α)$ expected time complexity, where $α$ is the number of faults that actually occur, and O(1) space complexity (i.e., is in-place). This algorithm is used to develop the first resilient sorting algorithm that is in-place and achieves optimal $O(n\log n + αδ)$ expected running time.
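For contrast with the resilient algorithms, this is the ordinary randomized in-place quickselect (Lomuto partitioning) on a fault-free RAM: $O(n)$ expected time for distinct keys and $O(1)$ extra space. It assumes memory is never corrupted; an adversarial change to a pivot or a partition boundary can make it return a wrong answer, which is the failure mode the FRAM algorithms above are designed to tolerate.

```python
import random


def quickselect(a: list, k: int):
    """Return the k-th smallest element (1-indexed) of a, reordering a in place.

    Randomized; O(n) expected time for distinct keys, O(1) extra space.
    NOT resilient: it trusts every value it reads from memory.
    """
    if not 1 <= k <= len(a):
        raise ValueError("k out of range")
    lo, hi, target = 0, len(a) - 1, k - 1
    while True:
        if lo == hi:
            return a[lo]
        p = random.randint(lo, hi)  # random pivot, moved to the end
        a[p], a[hi] = a[hi], a[p]
        pivot, store = a[hi], lo
        for i in range(lo, hi):  # a[lo:store] holds elements < pivot
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # pivot lands at its final rank
        if target == store:
            return a[store]
        if target < store:
            hi = store - 1
        else:
            lo = store + 1
```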
Submitted 28 August, 2012; v1 submitted 23 April, 2012;
originally announced April 2012.
-
Fast, precise and dynamic distance queries
Authors:
Yair Bartal,
Lee-Ad Gottlieb,
Tsvi Kopelowitz,
Moshe Lewenstein,
Liam Roditty
Abstract:
We present an approximate distance oracle for a point set S with n points and doubling dimension λ. For every ε>0, the oracle supports (1+ε)-approximate distance queries in (universal) constant time, occupies space [ε^{-O(λ)} + 2^{O(λ log λ)}]n, and can be constructed in [2^{O(λ)} log^3 n + ε^{-O(λ)} + 2^{O(λ log λ)}]n expected time. This improves upon the best previously known constructions, presented by Har-Peled and Mendel. Furthermore, the oracle can be made fully dynamic with expected O(1) query time and only 2^{O(λ)} log n + ε^{-O(λ)} + 2^{O(λ log λ)} update time. This is the first fully dynamic (1+ε)-distance oracle.
Submitted 9 August, 2010;
originally announced August 2010.