-
Periodicity in Data Streams with Wildcards
Authors:
Funda Ergün,
Elena Grigorescu,
Erfan Sadeqi Azer,
Samson Zhou
Abstract:
We investigate the problem of detecting periodic trends within a string $S$ of length $n$, arriving in the streaming model, containing at most $k$ wildcard characters, where $k=o(n)$. A wildcard character is a special character that can be assigned any other character. We say $S$ has wildcard-period $p$ if there exists an assignment to each of the wildcard characters so that in the resulting stream the length $n-p$ prefix equals the length $n-p$ suffix. We present a two-pass streaming algorithm that computes wildcard-periods of $S$ using $\mathcal{O}(k^3\,\mathsf{polylog}\,n)$ bits of space, while we also show that this problem cannot be solved in sublinear space in one pass. We then give a one-pass randomized streaming algorithm that computes all wildcard-periods $p$ of $S$ with $p<\frac{n}{2}$ and no wildcard characters appearing in the last $p$ symbols of $S$, using $\mathcal{O}(k^3\mathsf{polylog}\, n)$ space.
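To make the wildcard-period definition concrete, here is a minimal offline sketch in Python. It is not the streaming algorithm from the paper: it holds the whole string in memory and only illustrates the definition. The function name `has_wildcard_period` and the choice of `?` as the wildcard symbol are assumptions made for illustration. The check relies on the observation that the length $n-p$ prefix equals the length $n-p$ suffix exactly when all positions that agree modulo $p$ carry the same character, so an assignment exists if and only if each such class contains at most one distinct non-wildcard character.

```python
def has_wildcard_period(s: str, p: int, wildcard: str = "?") -> bool:
    """Offline check of the wildcard-period definition (illustrative only).

    Returns True if the wildcards in s can be assigned so that the
    length n-p prefix equals the length n-p suffix.
    """
    n = len(s)
    if not 0 < p <= n:
        raise ValueError("p must satisfy 0 < p <= len(s)")
    for r in range(p):
        # Positions r, r+p, r+2p, ... must all end up equal, so the
        # non-wildcard characters among them must already agree.
        fixed = {c for c in s[r::p] if c != wildcard}
        if len(fixed) > 1:
            return False
    return True
```

For example, `has_wildcard_period("a?b", 1)` is false even though each adjacent pair is individually compatible, because the single wildcard would have to equal both `a` and `b`.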
Submitted 3 March, 2018; v1 submitted 20 February, 2018;
originally announced February 2018.
-
Streaming Periodicity with Mismatches
Authors:
Funda Ergün,
Elena Grigorescu,
Erfan Sadeqi Azer,
Samson Zhou
Abstract:
We study the problem of finding all $k$-periods of a length-$n$ string $S$, presented as a data stream. $S$ is said to have $k$-period $p$ if its prefix of length $n-p$ differs from its suffix of length $n-p$ in at most $k$ locations.
We give a one-pass streaming algorithm that computes the $k$-periods of a string $S$ using $\text{poly}(k, \log n)$ bits of space, for $k$-periods of length at most $\frac{n}{2}$. We also present a two-pass streaming algorithm that computes $k$-periods of $S$ using $\text{poly}(k, \log n)$ bits of space, regardless of period length. We complement these results with comparable lower bounds.
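As an illustration of the $k$-period definition only, the following Python sketch enumerates $k$-periods offline by brute force. It uses linear space and quadratic time, so it is not the streaming algorithm described in the abstract. The function name `k_periods` is hypothetical, and restricting candidates to $1 \le p < n$ is an assumption of this sketch.

```python
def k_periods(s: str, k: int) -> list[int]:
    """Brute-force enumeration of k-periods (illustrative only).

    p is a k-period if the prefix of length n-p and the suffix of
    length n-p differ in at most k positions.
    """
    n = len(s)
    periods = []
    for p in range(1, n):
        # zip pairs s[i] with s[i+p] for every i < n-p, i.e. it aligns
        # the length n-p prefix with the length n-p suffix.
        mismatches = sum(a != b for a, b in zip(s, s[p:]))
        if mismatches <= k:
            periods.append(p)
    return periods
```

With `k = 0` this reduces to the exact periods of the string; for instance, `k_periods("abcabcabc", 0)` yields `[3, 6]`.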
Submitted 14 August, 2017;
originally announced August 2017.
-
Palindrome Recognition In The Streaming Model
Authors:
Petra Berenbrink,
Funda Ergün,
Frederik Mallmann-Trenn,
Erfan Sadeqi Azer
Abstract:
In the Palindrome Problem one tries to find all palindromes (palindromic substrings) in a given string. A palindrome is defined as a string which reads forwards the same as backwards, e.g., the string "racecar". A related problem is the Longest Palindromic Substring Problem in which finding an arbitrary one of the longest palindromes in the given string suffices. We regard the streaming version of both problems. In the streaming model the input arrives over time and at every point in time we are only allowed to use sublinear space. The main algorithms in this paper are the following: The first one is a one-pass randomized algorithm that solves the Palindrome Problem. It has an additive error and uses $O(\sqrt{n})$ space. The second algorithm is a two-pass algorithm which determines the exact locations of all longest palindromes. It uses the first algorithm as the first pass. The third algorithm is again a one-pass randomized algorithm, which solves the Longest Palindromic Substring Problem. It has a multiplicative error using only $O(\log(n))$ space. We also give two variants of the first algorithm which solve other related practical problems.
Submitted 28 January, 2016; v1 submitted 15 August, 2013;
originally announced August 2013.