-
Efficient Gauss Elimination for Near-Quadratic Matrices with One Short Random Block per Row, with Applications
Authors:
Martin Dietzfelbinger,
Stefan Walzer
Abstract:
In this paper we identify a new class of sparse near-quadratic random Boolean matrices that have full row rank over $\mathbb{F}_2=\{0,1\}$ with high probability and can be transformed into echelon form in almost linear time by a simple version of Gauss elimination. The random matrix with dimensions $n(1-\varepsilon) \times n$ is generated as follows: In each row, identify a block of length $L=O((\log n)/\varepsilon)$ at a random position. The entries outside the block are 0, the entries inside the block are given by fair coin tosses. Sorting the rows according to the positions of the blocks transforms the matrix into a kind of band matrix, on which, as it turns out, Gauss elimination works very efficiently with high probability. For the proof, the effects of Gauss elimination are interpreted as a ("coin-flipping") variant of Robin Hood hashing, whose behaviour can be captured in terms of a simple Markov model from queuing theory. Bounds for expected construction time and high success probability follow from results in this area.
By employing hashing, this matrix family leads to a new implementation of a retrieval data structure, which represents an arbitrary function $f\colon S \to \{0,1\}$ for some set $S$ of $m=(1-\varepsilon)n$ keys. It requires $m/(1-\varepsilon)$ bits of space, construction takes $O(m/\varepsilon^2)$ expected time on a word RAM, while queries take $O(1/\varepsilon)$ time and access only one contiguous segment of $O((\log m)/\varepsilon)$ bits in the representation. The method is competitive with state-of-the-art methods. By well-established methods the retrieval data structure leads to efficient constructions of (static) perfect hash functions and (static) Bloom filters with almost optimal space and very local storage access patterns for queries.
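As a rough illustration of the matrix family described in this abstract, the sketch below generates rows with one random block each, sorts them by block position, and computes the rank over $\mathbb{F}_2$ with a simple Gauss elimination. The function names, the bitmask representation, and the concrete values of `n`, `eps`, and `block_len` are assumptions made for this example; it is not the authors' implementation and does not reproduce their running-time guarantees.

```python
import random

def random_block_rows(m, n, block_len, rng):
    """Return m rows over n columns as (start, bitmask) pairs.

    Each row is zero outside one block of block_len columns at a random
    position; inside the block the bits are fair coin tosses.
    Bit i of the mask stands for column i.
    """
    rows = []
    for _ in range(m):
        start = rng.randrange(n - block_len + 1)
        pattern = rng.getrandbits(block_len)
        rows.append((start, pattern << start))
    return rows

def f2_rank(bitmasks):
    """Rank over F_2 via Gauss elimination; the pivot of a row is its lowest set bit."""
    pivots = {}  # pivot column -> reduced row with that lowest set bit
    for row in bitmasks:
        while row:
            col = (row & -row).bit_length() - 1
            if col not in pivots:
                pivots[col] = row
                break
            row ^= pivots[col]  # cancels bit col, lowest set bit moves right
    return len(pivots)

# Hypothetical parameters chosen only for the demonstration.
rng = random.Random(1)
n, eps, block_len = 1000, 0.1, 64
m = int((1 - eps) * n)
rows = sorted(random_block_rows(m, n, block_len, rng))  # sort by block position
rank = f2_rank(mask for _, mask in rows)
print(rank, "of", m, "rows are linearly independent")
```

Because the rows are sorted by block position, each row only meets pivots near its own block, which is the band structure the abstract refers to.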
Submitted 10 July, 2019;
originally announced July 2019.
-
Dense Peelable Random Uniform Hypergraphs
Authors:
Martin Dietzfelbinger,
Stefan Walzer
Abstract:
We describe a new family of $k$-uniform hypergraphs with independent random edges. The hypergraphs have a high probability of being peelable, i.e. to admit no sub-hypergraph of minimum degree $2$, even when the edge density (number of edges over vertices) is close to $1$. In our construction, the vertex set is partitioned into linearly arranged segments and each edge is incident to random vertices of $k$ consecutive segments. Quite surprisingly, the linear geometry allows our graphs to be peeled "from the outside in". The density thresholds $f_k$ for peelability of our hypergraphs ($f_3 \approx 0.918$, $f_4 \approx 0.977$, $f_5 \approx 0.992$, ...) are well beyond the corresponding thresholds ($c_3 \approx 0.818$, $c_4 \approx 0.772$, $c_5 \approx 0.702$, ...) of standard $k$-uniform random hypergraphs. To get a grip on $f_k$, we analyse an idealised peeling process on the random weak limit of our hypergraph family. The process can be described in terms of an operator on functions and $f_k$ can be linked to thresholds relating to the operator. These thresholds are then tractable with numerical methods.
Random hypergraphs underlie the construction of various data structures based on hashing. These data structures frequently rely on peelability of the hypergraph or peelability allows for simple linear time algorithms. To demonstrate the usefulness of our construction, we used our $3$-uniform hypergraphs as a drop-in replacement for the standard $3$-uniform hypergraphs in a retrieval data structure by Botelho et al. This reduces memory usage from $1.23m$ bits to $1.12m$ bits ($m$ being the input size) with almost no change in running time.
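To make the notion of peelability concrete, here is a minimal sketch of the standard peeling process (repeatedly remove an edge that contains a vertex of degree 1), together with a generator for edges that pick one random vertex in each of $k$ consecutive segments. The function names, the uniform choice of the starting segment, and all parameter values are assumptions for illustration; the exact distribution used in the paper may differ.

```python
import random
from collections import defaultdict

def is_peelable(edges):
    """Return True if repeatedly removing edges with a degree-1 vertex removes all edges."""
    incident = defaultdict(set)  # vertex -> indices of remaining incident edges
    for e, verts in enumerate(edges):
        for v in verts:
            incident[v].add(e)
    stack = [v for v, es in incident.items() if len(es) == 1]
    removed = 0
    while stack:
        v = stack.pop()
        if len(incident[v]) != 1:  # stale entry: degree changed since it was pushed
            continue
        (e,) = incident[v]
        removed += 1
        for u in edges[e]:
            incident[u].discard(e)
            if len(incident[u]) == 1:
                stack.append(u)
    return removed == len(edges)

def random_segment_edges(m, num_segments, segment_size, k, rng):
    """m edges, each with one random vertex in each of k consecutive segments."""
    edges = []
    for _ in range(m):
        first = rng.randrange(num_segments - k + 1)  # assumed: uniform start segment
        edges.append(tuple((first + i) * segment_size + rng.randrange(segment_size)
                           for i in range(k)))
    return edges

# Hypothetical parameters: 10000 vertices, edge density 0.8, k = 3.
rng = random.Random(7)
edges = random_segment_edges(8000, 100, 100, 3, rng)
print("peelable:", is_peelable(edges))
```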
Submitted 10 July, 2019;
originally announced July 2019.
-
A Subquadratic Algorithm for 3XOR
Authors:
Martin Dietzfelbinger,
Philipp Schlag,
Stefan Walzer
Abstract:
Given a set $X$ of $n$ binary words of equal length $w$, the 3XOR problem asks for three elements $a, b, c \in X$ such that $a \oplus b=c$, where $ \oplus$ denotes the bitwise XOR operation. The problem can be easily solved on a word RAM with word length $w$ in time $O(n^2 \log{n})$. Using Han's fast integer sorting algorithm (2002/2004) this can be reduced to $O(n^2 \log{\log{n}})$. With randomization or a sophisticated deterministic dictionary construction, creating a hash table for $X$ with constant lookup time leads to an algorithm with (expected) running time $O(n^2)$. At present, seemingly no faster algorithms are known. We present a surprisingly simple deterministic, quadratic time algorithm for 3XOR. Its core is a version of the Patricia trie for $X$, which makes it possible to traverse the set $a \oplus X$ in ascending order for arbitrary $a\in \{0, 1\}^{w}$ in linear time.
Furthermore, we describe a randomized algorithm for 3XOR with expected running time $O(n^2\cdot\min\{\log^3{w}/w, (\log\log{n})^2/\log^2 n\})$. The algorithm transfers techniques to our setting that were used by Baran, Demaine, and Pătraşcu (2005/2008) for solving the related int3SUM problem (the same problem with integer addition in place of binary XOR) in expected time $o(n^2)$. As suggested by Jafargholi and Viola (2016), linear hash functions are employed. The latter authors also showed that assuming 3XOR needs expected running time $n^{2-o(1)}$ one can prove conditional lower bounds for triangle enumeration just as with 3SUM. We demonstrate that 3XOR can be reduced to other problems as well, treating the examples offline SetDisjointness and offline SetIntersection, which were studied for 3SUM by Kopelowitz, Pettie, and Porat (2016).
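The hash-table baseline with expected running time $O(n^2)$ mentioned in this abstract can be sketched as follows. This is only the simple quadratic approach, not the Patricia-trie algorithm or the subquadratic algorithm of the paper; the function name and the requirement that $a$ and $b$ be distinct elements are assumptions of this sketch.

```python
from itertools import combinations

def find_3xor(words):
    """Return (a, b, c) from words with a ^ b == c and a != b, or None."""
    present = set(words)  # expected constant-time membership tests
    for a, b in combinations(sorted(present), 2):
        c = a ^ b
        if c in present:
            return a, b, c
    return None

print(find_3xor([0b0011, 0b0101, 0b0110, 0b1000]))
print(find_3xor([0b0001, 0b0010, 0b0100]))
```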
Submitted 30 April, 2018;
originally announced April 2018.
-
Dual-Pivot Quicksort: Optimality, Analysis and Zeros of Associated Lattice Paths
Authors:
Martin Aumüller,
Martin Dietzfelbinger,
Clemens Heuberger,
Daniel Krenn,
Helmut Prodinger
Abstract:
We present an average case analysis of a variant of dual-pivot quicksort. We show that the used algorithmic partitioning strategy is optimal, i.e., it minimizes the expected number of key comparisons. For the analysis, we calculate the expected number of comparisons exactly as well as asymptotically, in particular, we provide exact expressions for the linear, logarithmic, and constant terms.
An essential step is the analysis of zeros of lattice paths in a certain probability model. Along the way a combinatorial identity is proven.
Submitted 27 November, 2017; v1 submitted 1 November, 2016;
originally announced November 2016.
-
A Simple Hash Class with Strong Randomness Properties in Graphs and Hypergraphs
Authors:
Martin Aumüller,
Martin Dietzfelbinger,
Philipp Woelfel
Abstract:
We study randomness properties of graphs and hypergraphs generated by simple hash functions. Several hashing applications can be analyzed by studying the structure of $d$-uniform random ($d$-partite) hypergraphs obtained from a set $S$ of $n$ keys and $d$ randomly chosen hash functions $h_1,\dots,h_d$ by associating each key $x\in S$ with a hyperedge $\{h_1(x),\dots, h_d(x)\}$. Often it is assumed that $h_1,\dots,h_d$ exhibit a high degree of independence. We present a simple construction of a hash class whose hash functions have small constant evaluation time and can be stored in sublinear space. We devise general techniques to analyze the randomness properties of the graphs and hypergraphs generated by these hash functions, and we show that they can replace other, less efficient constructions in cuckoo hashing (with and without stash), the simulation of a uniform hash function, the construction of a perfect hash function, generalized cuckoo hashing and different load balancing scenarios.
Submitted 31 October, 2016;
originally announced November 2016.
-
Counting Zeros in Random Walks on the Integers and Analysis of Optimal Dual-Pivot Quicksort
Authors:
Martin Aumüller,
Martin Dietzfelbinger,
Clemens Heuberger,
Daniel Krenn,
Helmut Prodinger
Abstract:
We present an average case analysis of two variants of dual-pivot quicksort, one with a non-algorithmic comparison-optimal partitioning strategy, the other with a closely related algorithmic strategy. For both we calculate the expected number of comparisons exactly as well as asymptotically, in particular, we provide exact expressions for the linear, logarithmic, and constant terms. An essential step is the analysis of zeros of lattice paths in a certain probability model. Along the way a combinatorial identity is proven.
Submitted 11 May, 2016; v1 submitted 12 February, 2016;
originally announced February 2016.
-
How Good is Multi-Pivot Quicksort?
Authors:
Martin Aumüller,
Martin Dietzfelbinger,
Pascal Klaue
Abstract:
Multi-Pivot Quicksort refers to variants of classical quicksort where in the partitioning step $k$ pivots are used to split the input into $k + 1$ segments. For many years, multi-pivot quicksort was regarded as impractical, but in 2009 a 2-pivot approach by Yaroslavskiy, Bentley, and Bloch was chosen as the standard sorting algorithm in Sun's Java 7. In 2014 at ALENEX, Kushagra et al. introduced an even faster algorithm that uses three pivots. This paper studies what possible advantages multi-pivot quicksort might offer in general. The contributions are as follows: Natural comparison-optimal algorithms for multi-pivot quicksort are devised and analyzed. The analysis shows that the benefits of using multiple pivots with respect to the average comparison count are marginal and these strategies are inferior to simpler strategies such as the well known median-of-$k$ approach. A substantial part of the partitioning cost is caused by rearranging elements. A rigorous analysis of an algorithm for rearranging elements in the partitioning step is carried out, observing mainly how often array cells are accessed during partitioning. The algorithm behaves best if 3 to 5 pivots are used. Experiments show that this translates into good cache behavior and is closest to predicting observed running times of multi-pivot quicksort algorithms. Finally, it is studied how choosing pivots from a sample affects sorting cost. The study is theoretical in the sense that although the findings motivate design recommendations for multipivot quicksort algorithms that lead to running time improvements over known algorithms in an experimental setting, these improvements are small.
Submitted 31 May, 2016; v1 submitted 15 October, 2015;
originally announced October 2015.
-
On testing single connectedness in directed graphs and some related problems
Authors:
Martin Dietzfelbinger,
Raed Jaberi
Abstract:
Let $G=(V,E)$ be a directed graph with $n$ vertices and $m$ edges. The graph $G$ is called singly-connected if for each pair of vertices $v,w \in V$ there is at most one simple path from $v$ to $w$ in $G$. Buchsbaum and Carlisle (1993) gave an algorithm for testing whether $G$ is singly-connected in $O(n^{2})$ time. In this paper we describe a refined version of this algorithm with running time $O(s\cdot t+m)$, where $s$ and $t$ are the number of sources and sinks, respectively, in the reduced graph $G^{r}$ obtained by first contracting each strongly connected component of $G$ into one vertex and then eliminating vertices of indegree or outdegree $1$ by a contraction operation. Moreover, we show that the problem of finding a minimum cardinality edge subset $C\subseteq E$ (respectively, vertex subset $F\subseteq V$) whose removal from $G$ leaves a singly-connected graph is NP-hard.
Submitted 2 March, 2015; v1 submitted 4 December, 2014;
originally announced December 2014.
-
Tight Lower Bounds for Greedy Routing in Higher-Dimensional Small-World Grids
Authors:
Martin Dietzfelbinger,
Philipp Woelfel
Abstract:
We consider Kleinberg's celebrated small world graph model (Kleinberg, 2000), in which a D-dimensional grid {0,...,n-1}^D is augmented with a constant number of additional unidirectional edges leaving each node. These long range edges are determined at random according to a probability distribution (the augmenting distribution), which is the same for each node. Kleinberg suggested using the inverse D-th power distribution, in which node v is the long range contact of node u with a probability proportional to ||u-v||^(-D). He showed that such an augmenting distribution allows to route a message efficiently in the resulting random graph: The greedy algorithm, where in each intermediate node the message travels over a link that brings the message closest to the target w.r.t. the Manhattan distance, finds a path of expected length O(log^2 n) between any two nodes. In this paper we prove that greedy routing does not perform asymptotically better for any uniform and isotropic augmenting distribution, i.e., the probability that node u has a particular long range contact v is independent of the labels of u and v and only a function of ||u-v||.
In order to obtain the result, we introduce a novel proof technique: We define a budget game, in which a token travels over a game board, while the player manages a "probability budget". In each round, the player bets part of her remaining probability budget on step sizes. A step size is chosen at random according to a probability distribution of the player's bet. The token then makes progress as determined by the chosen step size, while some of the player's bet is removed from her probability budget. We prove a tight lower bound for such a budget game, and then obtain a lower bound for greedy routing in the D-dimensional grid by a reduction.
Submitted 6 May, 2013;
originally announced May 2013.
-
Optimal Partitioning for Dual-Pivot Quicksort
Authors:
Martin Aumüller,
Martin Dietzfelbinger
Abstract:
Dual-pivot quicksort refers to variants of classical quicksort where in the partitioning step two pivots are used to split the input into three segments. This can be done in different ways, giving rise to different algorithms. Recently, a dual-pivot algorithm proposed by Yaroslavskiy received much attention, because a variant of it replaced the well-engineered quicksort algorithm in Sun's Java 7 runtime library. Nebel and Wild (ESA 2012) analyzed this algorithm and showed that on average it uses 1.9n ln n + O(n) comparisons to sort an input of size n, beating standard quicksort, which uses 2n ln n + O(n) comparisons. We introduce a model that captures all dual-pivot algorithms, give a unified analysis, and identify new dual-pivot algorithms that minimize the average number of key comparisons among all possible algorithms up to a linear term. This minimum is 1.8n ln n + O(n). For the case that the pivots are chosen from a small sample, we include a comparison of dual-pivot quicksort and classical quicksort. Specifically, we show that dual-pivot quicksort benefits from a skewed choice of pivots. We experimentally evaluate our algorithms and compare them to Yaroslavskiy's algorithm and the recently described three-pivot quicksort algorithm of Kushagra et al. (ALENEX 2014).
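For readers unfamiliar with the scheme, the following is a minimal sketch of dual-pivot partitioning: two pivots $p \le q$ split the remaining elements into three segments. It compares every element with the smaller pivot first, which is just one of the many possible strategies covered by the model in the abstract; it is neither Yaroslavskiy's algorithm nor one of the comparison-optimal strategies identified in the paper. Taking the first and last element as pivots and building new lists instead of partitioning in place are simplifying assumptions.

```python
def dual_pivot_quicksort(items):
    """Return a sorted copy of items using two pivots per partitioning step."""
    if len(items) < 2:
        return list(items)
    p, q = sorted((items[0], items[-1]))  # pivots with p <= q
    small, middle, large = [], [], []
    for x in items[1:-1]:
        if x < p:        # always compare with the smaller pivot first
            small.append(x)
        elif x > q:
            large.append(x)
        else:            # p <= x <= q
            middle.append(x)
    return (dual_pivot_quicksort(small) + [p]
            + dual_pivot_quicksort(middle) + [q]
            + dual_pivot_quicksort(large))

print(dual_pivot_quicksort([5, 3, 9, 1, 5, 7, 2, 8]))
```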
Submitted 13 October, 2015; v1 submitted 21 March, 2013;
originally announced March 2013.
-
Explicit and Efficient Hash Families Suffice for Cuckoo Hashing with a Stash
Authors:
Martin Aumüller,
Martin Dietzfelbinger,
Philipp Woelfel
Abstract:
It is shown that for cuckoo hashing with a stash as proposed by Kirsch, Mitzenmacher, and Wieder (2008) families of very simple hash functions can be used, maintaining the favorable performance guarantees: with stash size $s$ the probability of a rehash is $O(1/n^{s+1})$, and the evaluation time is $O(s)$. Instead of the full randomness needed for the analysis of Kirsch et al. and of Kutzelnigg (2010) (resp. $Θ(\log n)$-wise independence for standard cuckoo hashing) the new approach even works with 2-wise independent hash families as building blocks. Both construction and analysis build upon the work of Dietzfelbinger and Woelfel (2003). The analysis, which can also be applied to the fully random case, utilizes a graph counting argument and is much simpler than previous proofs. As a byproduct, an algorithm for simulating uniform hashing is obtained. While it requires about twice as much space as the most space efficient solutions, it is attractive because of its simple and direct structure.
Submitted 19 April, 2012;
originally announced April 2012.
-
A More Reliable Greedy Heuristic for Maximum Matchings in Sparse Random Graphs
Authors:
Martin Dietzfelbinger,
Hendrik Peilke,
Michael Rink
Abstract:
We propose a new greedy algorithm for the maximum cardinality matching problem. We give experimental evidence that this algorithm is likely to find a maximum matching in random graphs with constant expected degree c>0, independent of the value of c. This is contrary to the behavior of commonly used greedy matching heuristics which are known to have some range of c where they probably fail to compute a maximum matching.
Submitted 19 March, 2012;
originally announced March 2012.
-
Towards Optimal Degree-distributions for Left-perfect Matchings in Random Bipartite Graphs
Authors:
Martin Dietzfelbinger,
Michael Rink
Abstract:
Consider a random bipartite multigraph $G$ with $n$ left nodes and $m \geq n \geq 2$ right nodes. Each left node $x$ has $d_x \geq 1$ random right neighbors. The average left degree $Δ$ is fixed, $Δ\geq 2$. We ask whether for the probability that $G$ has a left-perfect matching it is advantageous not to fix $d_x$ for each left node $x$ but rather choose it at random according to some (cleverly chosen) distribution. We show the following, provided that the degrees of the left nodes are independent: If $Δ$ is an integer then it is optimal to use a fixed degree of $Δ$ for all left nodes. If $Δ$ is non-integral then an optimal degree-distribution has the property that each left node $x$ has two possible degrees, $\lfloorΔ\rfloor$ and $\lceilΔ\rceil$, with probability $p_x$ and $1-p_x$, respectively, where $p_x$ is from the closed interval $[0,1]$ and the average over all $p_x$ equals $\lceilΔ\rceil-Δ$. Furthermore, if $n=c\cdot m$ and $Δ>2$ is constant, then each distribution of the left degrees that meets the conditions above determines the same threshold $c^*(Δ)$ that has the following property as $n$ goes to infinity: If $c<c^*(Δ)$ then there exists a left-perfect matching with high probability. If $c>c^*(Δ)$ then there exists no left-perfect matching with high probability. The threshold $c^*(Δ)$ is the same as the known threshold for offline $k$-ary cuckoo hashing for integral or non-integral $k=Δ$.
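A short check, not taken from the paper, of why the condition on the average of the $p_x$ is the natural one: for non-integral $Δ$ we have $\lceilΔ\rceil-\lfloorΔ\rfloor=1$, so the expected degree of a left node $x$ is
$$\mathbb{E}[d_x] = p_x\lfloorΔ\rfloor + (1-p_x)\lceilΔ\rceil = \lceilΔ\rceil - p_x ,$$
and averaging over all $n$ left nodes gives
$$\frac{1}{n}\sum_{x}\mathbb{E}[d_x] = \lceilΔ\rceil - \frac{1}{n}\sum_{x} p_x = \lceilΔ\rceil - (\lceilΔ\rceil - Δ) = Δ .$$
For example, $Δ=3.5$ with $p_x=0.5$ for every $x$ means each left node has degree 3 or 4 with equal probability.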
Submitted 27 April, 2012; v1 submitted 7 March, 2012;
originally announced March 2012.
-
Cuckoo Hashing with Pages
Authors:
Martin Dietzfelbinger,
Michael Mitzenmacher,
Michael Rink
Abstract:
Although cuckoo hashing has significant applications in both theoretical and practical settings, a relevant downside is that it requires lookups to multiple locations. In many settings, where lookups are expensive, cuckoo hashing becomes a less compelling alternative. One such standard setting is when memory is arranged in large pages, and a major cost is the number of page accesses. We propose the study of cuckoo hashing with pages, advocating approaches where each key has several possible locations, or cells, on a single page, and additional choices on a second backup page. We show experimentally that with k cell choices on one page and a single backup cell choice, one can achieve nearly the same loads as when each key has k+1 random cells to choose from, with most lookups requiring just one page access, even when keys are placed online using a simple algorithm. While our results are currently experimental, they suggest several interesting new open theoretical questions for cuckoo hashing with pages.
Submitted 27 April, 2011;
originally announced April 2011.
-
Tight Thresholds for Cuckoo Hashing via XORSAT
Authors:
Martin Dietzfelbinger,
Andreas Goerdt,
Michael Mitzenmacher,
Andrea Montanari,
Rasmus Pagh,
Michael Rink
Abstract:
We settle the question of tight thresholds for offline cuckoo hashing. The problem can be stated as follows: we have n keys to be hashed into m buckets each capable of holding a single key. Each key has k >= 3 (distinct) associated buckets chosen uniformly at random and independently of the choices of other keys. A hash table can be constructed successfully if each key can be placed into one of its buckets. We seek thresholds alpha_k such that, as n goes to infinity, if n/m <= alpha for some alpha < alpha_k then a hash table can be constructed successfully with high probability, and if n/m >= alpha for some alpha > alpha_k a hash table cannot be constructed successfully with high probability. Here we are considering the offline version of the problem, where all keys and hash values are given, so the problem is equivalent to previous models of multiple-choice hashing. We find the thresholds for all values of k > 2 by showing that they are in fact the same as the previously known thresholds for the random k-XORSAT problem. We then extend these results to the setting where keys can have differing number of choices, and provide evidence in the form of an algorithm for a conjecture extending this result to cuckoo hash tables that store multiple keys in a bucket.
Submitted 21 December, 2010; v1 submitted 1 December, 2009;
originally announced December 2009.
-
Succinct Data Structures for Retrieval and Approximate Membership
Authors:
Martin Dietzfelbinger,
Rasmus Pagh
Abstract:
The retrieval problem is the problem of associating data with keys in a set. Formally, the data structure must store a function f: U ->{0,1}^r that has specified values on the elements of a given set S, a subset of U, |S|=n, but may have any value on elements outside S. Minimal perfect hashing makes it possible to avoid storing the set S, but this induces a space overhead of Theta(n) bits in addition to the nr bits needed for function values. In this paper we show how to eliminate this overhead. Moreover, we show that for any k query time O(k) can be achieved using space that is within a factor 1+e^{-k} of optimal, asymptotically for large n. If we allow logarithmic evaluation time, the additive overhead can be reduced to O(log log n) bits whp. The time to construct the data structure is O(n), expected. A main technical ingredient is to utilize existing tight bounds on the probability of almost square random matrices with rows of low weight to have full row rank. In addition to direct constructions, we point out a close connection between retrieval structures and hash tables where keys are stored in an array and some kind of probing scheme is used. Further, we propose a general reduction that transfers the results on retrieval into analogous results on approximate membership, a problem traditionally addressed using Bloom filters. Again, we show how to eliminate the space overhead present in previously known methods, and get arbitrarily close to the lower bound. The evaluation procedures of our data structures are extremely simple (similar to a Bloom filter). For the results stated above we assume free access to fully random hash functions. However, we show how to justify this assumption using extra space o(n) to simulate full randomness on a RAM.
Submitted 26 March, 2008;
originally announced March 2008.
-
Tight Bounds for Blind Search on the Integers
Authors:
Martin Dietzfelbinger,
Jonathan E. Rowe,
Ingo Wegener,
Philipp Woelfel
Abstract:
We analyze a simple random process in which a token is moved in the interval $A=\{0,...,n\}$: Fix a probability distribution $μ$ over $\{1,...,n\}$. Initially, the token is placed in a random position in $A$. In round $t$, a random value $d$ is chosen according to $μ$. If the token is in position $a\geq d$, then it is moved to position $a-d$. Otherwise it stays put. Let $T$ be the number of rounds until the token reaches position 0. We show tight bounds for the expectation of $T$ for the optimal distribution $μ$. More precisely, we show that $\min_μ\{E_μ(T)\}=Θ((\log n)^2)$. For the proof, a novel potential function argument is introduced. The research is motivated by the problem of approximating the minimum of a continuous function over $[0,1]$ with a "blind" optimization strategy.
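The process is easy to simulate. The sketch below counts the rounds $T$ for a given starting position and step source, and then estimates $E(T)$ for a harmonic distribution $μ(d)\propto 1/d$. The uniform starting position, the harmonic choice of $μ$, and all names and parameter values are assumptions for illustration; the harmonic distribution is not claimed here to be the optimal one.

```python
import random

def rounds_to_zero(start, draw_step):
    """Number of rounds until the token reaches 0; draw_step() returns the next d."""
    position, rounds = start, 0
    while position > 0:
        d = draw_step()
        rounds += 1
        if position >= d:   # move only if the step does not overshoot 0
            position -= d
    return rounds

def harmonic_sampler(n, rng):
    """Sampler for mu(d) proportional to 1/d on {1,...,n} (assumed distribution)."""
    steps = list(range(1, n + 1))
    weights = [1.0 / d for d in steps]
    return lambda: rng.choices(steps, weights=weights)[0]

rng = random.Random(0)
n = 1000
draw = harmonic_sampler(n, rng)
trials = [rounds_to_zero(rng.randint(0, n), draw) for _ in range(200)]
print("estimated E(T):", sum(trials) / len(trials))
```

Since $d=1$ has positive probability under this $μ$, the simulation terminates with probability 1.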
Submitted 20 February, 2008;
originally announced February 2008.