-
Optimal Uncoordinated Unique IDs
Authors:
Peter C. Dillinger,
Martín Farach-Colton,
Guido Tagliavini,
Stefan Walzer
Abstract:
In the Uncoordinated Unique Identifiers Problem (UUIDP) there are $n$ independent instances of an algorithm $\mathcal{A}$ that generates IDs from a universe $\{1, \dots, m\}$, and there is an adversary that requests IDs from these instances. The goal is to design $\mathcal{A}$ such that it minimizes the probability that the same ID is ever generated twice across all instances, that is, minimizes the collision probability. Crucially, no communication between the instances of $\mathcal{A}$ is possible. Solutions to the UUIDP are often used as mechanisms for surrogate key generation in distributed databases and key-value stores. In spite of its practical relevance, we know of no prior theoretical work on the UUIDP.
In this paper we initiate the systematic study of the UUIDP. We analyze both existing and novel algorithms for this problem, and evaluate their collision probability using worst-case analysis and competitive analysis, against oblivious and adaptive adversaries. In particular, we present an algorithm that is optimal in the worst case against oblivious adversaries, an algorithm that is at most a logarithmic factor away from optimal in the worst case against adaptive adversaries, and an algorithm that is optimal in the competitive sense against both oblivious and adaptive adversaries.
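As a point of reference for the collision probability discussed above, the sketch below estimates it for the simplest conceivable strategy: every instance draws each ID uniformly at random from $\{1, \dots, m\}$. This is an illustrative baseline only, not one of the algorithms the paper proposes; the function names and the fixed request schedule (an oblivious adversary that decides in advance how many IDs each instance must produce) are assumptions made for the example.

```python
import math
import random


def exact_collision_probability(k: int, m: int) -> float:
    """Probability that k independent uniform draws from m values are not all distinct."""
    if k > m:
        return 1.0
    p_unique = 1.0
    for i in range(k):
        p_unique *= (m - i) / m
    return 1.0 - p_unique


def birthday_bound(k: int, m: int) -> float:
    """Standard birthday approximation 1 - exp(-k(k-1)/(2m)) of the same probability."""
    return 1.0 - math.exp(-k * (k - 1) / (2.0 * m))


def one_run_collides(requests_per_instance, m, rng) -> bool:
    """Simulate one run: instance i must produce requests_per_instance[i] IDs.

    Instances never communicate; a single RNG is shared here only because
    independent uniform draws are statistically the same either way.
    """
    seen = set()
    for count in requests_per_instance:
        for _ in range(count):
            new_id = rng.randint(1, m)  # uniform over {1, ..., m}, inclusive
            if new_id in seen:
                return True
            seen.add(new_id)
    return False


def simulate(requests_per_instance, m, trials, seed=0) -> float:
    """Monte Carlo estimate of the collision probability for a fixed request schedule."""
    rng = random.Random(seed)
    hits = sum(one_run_collides(requests_per_instance, m, rng) for _ in range(trials))
    return hits / trials
```

For uniform random IDs only the total number of requests matters, so `simulate([5, 5, 5], m, trials)` and `simulate([15], m, trials)` estimate the same quantity. Strategies that hand out IDs in a structured way inside each instance can behave differently, which is the design space the abstract refers to.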
Submitted 14 April, 2023;
originally announced April 2023.
-
Approximate Membership Query Filters with a False Positive Free Set
Authors:
Pedro Reviriego,
Alfonso Sánchez-Macián,
Stefan Walzer,
Peter C. Dillinger
Abstract:
In the last decade, significant efforts have been made to reduce the false positive rate of approximate membership checking structures. This has led to the development of new structures such as cuckoo filters and xor filters. Adaptive filters that can react to false positives as they occur to avoid them for future queries to the same elements have also been recently developed. In this paper, we propose a new type of static filter that completely avoids false positives for a given set of negative elements and show how such filters can be efficiently implemented using xor probing filters. Several constructions of these filters with a false positive free set are proposed that minimize the memory and speed overheads introduced by avoiding false positives. The proposed filters have been extensively evaluated to validate their functionality and show that in many cases both the memory and speed overheads are negligible. We also discuss several use cases to illustrate the potential benefits of the proposed filters in practical applications.
Submitted 12 November, 2021;
originally announced November 2021.
-
Fast Succinct Retrieval and Approximate Membership using Ribbon
Authors:
Peter C. Dillinger,
Lorenz Hübschle-Schneider,
Peter Sanders,
Stefan Walzer
Abstract:
A retrieval data structure for a static function $f:S\rightarrow \{0,1\}^r$ supports queries that return $f(x)$ for any $x \in S$. Retrieval data structures can be used to implement a static approximate membership query data structure (AMQ), i.e., a Bloom filter alternative, with false positive rate $2^{-r}$. The information-theoretic lower bound for both tasks is $r|S|$ bits. While succinct theoretical constructions using $(1+o(1))r|S|$ bits were known, these could not achieve very small overheads in practice because they have an unfavorable space--time tradeoff hidden in the asymptotic costs or because small overheads would only be reached for physically impossible input sizes. With bumped ribbon retrieval (BuRR), we present the first practical succinct retrieval data structure. In an extensive experimental evaluation BuRR achieves space overheads well below 1\,\% while being faster than most previously used retrieval data structures (typically with space overheads at least an order of magnitude larger) and faster than classical Bloom filters (with space overhead $\geq 44\,\%$). This efficiency, including favorable constants, stems from a combination of simplicity, word parallelism, and high locality. We additionally describe homogeneous ribbon filter AMQs, which are even simpler and faster at the price of slightly larger space overhead.
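The reduction from retrieval to approximate membership mentioned above can be sketched in a few lines: store an $r$-bit fingerprint of every key in a retrieval structure, and answer "possibly present" when the retrieved value equals the fingerprint of the queried key. The toy below is not BuRR and not a ribbon construction. It assumes dense random rows over GF(2) solved by plain Gaussian elimination, which is far slower than a banded system, and all names (`hash_bits`, `build_retrieval`, `might_contain`, the `slack` parameter) are hypothetical.

```python
import hashlib


def hash_bits(key: str, salt: str, bits: int) -> int:
    """Deterministic pseudo-random integer of `bits` bits derived from key and salt."""
    raw = hashlib.shake_256(f"{salt}|{key}".encode()).digest((bits + 7) // 8)
    return int.from_bytes(raw, "big") & ((1 << bits) - 1)


def xor_selected(table, row: int) -> int:
    """XOR together the table entries whose index is a set bit of `row`."""
    acc = 0
    while row:
        col = row.bit_length() - 1
        acc ^= table[col]
        row ^= 1 << col
    return acc


def build_retrieval(values: dict, slack: int = 16):
    """Build a table with xor_selected(table, row(key)) == values[key] for stored keys."""
    m = len(values) + slack  # a few extra columns make the system solvable w.h.p.
    pivots = {}  # pivot column -> (row bitmask whose highest bit is that column, rhs)
    for key, value in values.items():
        row, rhs = hash_bits(key, "row", m), value
        while row:
            col = row.bit_length() - 1
            if col not in pivots:
                pivots[col] = (row, rhs)
                break
            pivot_row, pivot_rhs = pivots[col]
            row ^= pivot_row
            rhs ^= pivot_rhs
        else:  # row reduced to zero: redundant equation, or a contradiction
            if rhs:
                raise ValueError("inconsistent system; retry with more slack")
    table = [0] * m  # free variables stay 0
    for col in sorted(pivots):  # ascending order: lower columns are already final
        row, rhs = pivots[col]
        table[col] = rhs ^ xor_selected(table, row ^ (1 << col))
    return table


def retrieve(table, key: str) -> int:
    """Return f(key) for stored keys; an arbitrary value for any other key."""
    return xor_selected(table, hash_bits(key, "row", len(table)))


def build_filter(keys, r: int):
    """AMQ via retrieval: store an r-bit fingerprint per key."""
    return build_retrieval({k: hash_bits(k, "fp", r) for k in keys})


def might_contain(table, key: str, r: int) -> bool:
    """Always True for stored keys; True with probability about 2**-r otherwise."""
    return retrieve(table, key) == hash_bits(key, "fp", r)
```

The table holds `len(values) + slack` entries of $r$ bits each, so this toy is close to the $r|S|$ lower bound in space but pays for it with roughly quadratic construction work and queries that touch about half the table. Keeping the space small while making construction and queries fast is the tradeoff the abstract is about.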
Submitted 5 February, 2022; v1 submitted 4 September, 2021;
originally announced September 2021.
-
Ribbon filter: practically smaller than Bloom and Xor
Authors:
Peter C. Dillinger,
Stefan Walzer
Abstract:
Filter data structures over-approximate a set of hashable keys, i.e. set membership queries may incorrectly come out positive. A filter with false positive rate $f \in (0,1]$ is known to require $\ge \log_2(1/f)$ bits per key. At least for larger $f \ge 2^{-4}$, existing practical filters require a space overhead of at least 20% with respect to this information-theoretic bound.
We introduce the Ribbon filter: a new filter for static sets with a broad range of configurable space overheads and false positive rates with competitive speed over that range, especially for larger $f \ge 2^{-7}$. In many cases, Ribbon is faster than existing filters for the same space overhead, or can achieve space overhead below 10% with some additional CPU time. An experimental Ribbon design with load balancing can even achieve space overheads below 1%.
A Ribbon filter resembles an Xor filter modified to maximize locality and is constructed by solving a band-like linear system over Boolean variables. In previous work, Dietzfelbinger and Walzer describe this linear system and an efficient Gaussian solver. We present and analyze a faster, more adaptable solving process we call "Rapid Incremental Boolean Banding ON the fly," which resembles hash table construction. We also present and analyze an attractive Ribbon variant based on making the linear system homogeneous, and describe several more practical enhancements.
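The space figures quoted above follow from simple arithmetic on the $\log_2(1/f)$ bound. The sketch below computes bits per key for a given false positive rate and relative space overhead. The helper names are hypothetical, and the Bloom filter figure uses the textbook result that an optimally tuned Bloom filter needs $\log_2(1/f)/\ln 2$ bits per key; that value is a standard fact rather than something taken from this abstract.

```python
import math


def min_bits_per_key(f: float) -> float:
    """Information-theoretic lower bound log2(1/f) for false positive rate f."""
    if not 0.0 < f <= 1.0:
        raise ValueError("false positive rate must lie in (0, 1]")
    return math.log2(1.0 / f)


def bits_per_key(f: float, overhead: float) -> float:
    """Bits per key for a filter whose space exceeds the bound by `overhead` (0.10 = 10%)."""
    return (1.0 + overhead) * min_bits_per_key(f)


def bloom_overhead() -> float:
    """Overhead of an optimally tuned Bloom filter: 1/ln(2) - 1, about 44%."""
    return 1.0 / math.log(2.0) - 1.0


def filter_size_bytes(num_keys: int, f: float, overhead: float) -> int:
    """Total size in bytes, rounded up, for num_keys keys."""
    return math.ceil(num_keys * bits_per_key(f, overhead) / 8)
```

Comparing `filter_size_bytes(n, f, bloom_overhead())` with `filter_size_bytes(n, f, 0.10)` or `filter_size_bytes(n, f, 0.01)` shows what the overhead levels named in the abstract would mean in absolute terms for a given key count.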
Submitted 8 March, 2021; v1 submitted 3 March, 2021;
originally announced March 2021.
-
Data Definitions in the ACL2 Sedan
Authors:
Harsh Raju Chamarthi,
Peter C. Dillinger,
Panagiotis Manolios
Abstract:
We present a data definition framework that enables the convenient specification of data types in ACL2s, the ACL2 Sedan. Our primary motivation for developing the data definition framework was pedagogical. We were teaching undergraduate students how to reason about programs using ACL2s and wanted to provide them with an effective method for defining, testing, and reasoning about data types in the context of an untyped theorem prover. Our framework is now routinely used not only for pedagogical purposes, but also by advanced users.
Our framework concisely supports common data definition patterns, e.g. list types, map types, and record types. It also provides support for polymorphic functions. A distinguishing feature of our approach is that we maintain both a predicative and an enumerative characterization of data definitions.
In this paper we present our data definition framework via a sequence of examples. We give a complete characterization in terms of tau rules of the inclusion/exclusion relations a data definition induces, under suitable restrictions. The data definition framework is a key component of counterexample generation support in ACL2s, but can be independently used in ACL2, and is available as a community book.
Submitted 5 June, 2014;
originally announced June 2014.
-
Integrating Testing and Interactive Theorem Proving
Authors:
Harsh Raju Chamarthi,
Peter C. Dillinger,
Matt Kaufmann,
Panagiotis Manolios
Abstract:
Using an interactive theorem prover to reason about programs involves a sequence of interactions where the user challenges the theorem prover with conjectures. Invariably, many of the conjectures posed are in fact false, and users often spend considerable effort examining the theorem prover's output before realizing this. We present a synergistic integration of testing with theorem proving, implemented in the ACL2 Sedan (ACL2s), for automatically generating concrete counterexamples. Our method uses the full power of the theorem prover and associated libraries to simplify conjectures; this simplification can transform conjectures for which finding counterexamples is hard into conjectures where finding counterexamples is trivial. In fact, our approach even leads to better theorem proving, e.g. if testing shows that a generalization step leads to a false conjecture, we force the theorem prover to backtrack, allowing it to pursue more fruitful options that may yield a proof. The focus of the paper is on the engineering of a synergistic integration of testing with interactive theorem proving; this includes extending ACL2 with new functionality that we expect to be of general interest. We also discuss our experience in using ACL2s to teach freshman students how to reason about their programs.
Submitted 20 October, 2011; v1 submitted 22 May, 2011;
originally announced May 2011.