-
Persistent Cache-oblivious Streaming Indexes
Authors:
Andrew Twigg
Abstract:
In [SPAA2007], Bender et al. define a streaming B-tree (or index) as one that supports updates in amortized $o(1)$ IOs, and present a structure achieving amortized $O((\log N)/B)$ IOs and queries in $O(\log N)$ IOs. We extend their result to the partially-persistent case. For a version $v$, let $N_v$ be the number of keys accessible at $v$ and $N$ be the total number of updates. We give a data structure using space $O(N)$, supporting updates to a leaf version $v$ with $O((\log N_{v})/B)$ amortized IOs and answering range queries returning $Z$ elements with $O(\log N_{v} + Z/B)$ IOs on average (where the average is over all queries covering disjoint key ranges at a given version). This is the first persistent `streaming' index we are aware of, i.e. that supports updates in $o(1)$ IOs and supports efficient range queries.
Submitted 25 July, 2017;
originally announced July 2017.
-
Locality-preserving allocations Problems and coloured Bin Packing
Authors:
Andrew Twigg,
Eduardo C. Xavier
Abstract:
We study the following problem, introduced by Chung et al. in 2006. We are given, online or offline, a set of coloured items of different sizes, and wish to pack them into bins of equal size so that we use few bins in total (at most $α$ times optimal), and that the items of each colour span few bins (at most $β$ times optimal). We call such allocations $(α, β)$-approximate. As usual in bin packing problems, we allow additive constants and consider $(α,β)$ as the asymptotic performance ratios. We prove that for $\eps>0$, if we desire small $α$, no scheme can beat $(1+\eps, Ω(1/\eps))$-approximate allocations and similarly, if we desire small $β$, no scheme can beat $(1.69103, 1+\eps)$-approximate allocations. We give offline schemes that come very close to achieving these lower bounds. For the online case, we prove that no scheme can even achieve $(O(1),O(1))$-approximate allocations. However, a small restriction on item sizes permits a simple online scheme that computes $(2+\eps, 1.7)$-approximate allocations.
Submitted 17 August, 2015;
originally announced August 2015.
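The two quantities that the $(α, β)$ ratios compare against optimal can be made concrete with a small sketch: the total number of bins used, and for each colour the number of bins its items span. The packing heuristic below (group items by colour, then first-fit) is only an illustrative assumption and is not one of the schemes from the paper; the function names `first_fit_by_colour` and `colour_spans` are hypothetical.

```python
from collections import defaultdict


def first_fit_by_colour(items, capacity):
    """Pack (colour, size) items into bins of the given capacity.

    Illustrative heuristic only: items are grouped by colour, then each
    item goes into the first bin with enough free room.
    Returns a list of bins, each a list of (colour, size) pairs.
    """
    bins, free = [], []
    for colour, size in sorted(items, key=lambda item: item[0]):
        for i, room in enumerate(free):
            if size <= room:
                bins[i].append((colour, size))
                free[i] -= size
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([(colour, size)])
            free.append(capacity - size)
    return bins


def colour_spans(bins):
    """Return, for each colour, the number of bins containing that colour."""
    spans = defaultdict(int)
    for contents in bins:
        for colour in {c for c, _ in contents}:
            spans[colour] += 1
    return dict(spans)


items = [("red", 4), ("blue", 3), ("red", 4), ("blue", 3), ("green", 2)]
bins = first_fit_by_colour(items, capacity=10)
print(len(bins), colour_spans(bins))
```

An allocation is judged on both numbers at once: `len(bins)` is compared with the optimal bin count, and each entry of `colour_spans(bins)` with the fewest bins that colour alone would need.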
-
Stratified B-trees and versioning dictionaries
Authors:
Andy Twigg,
Andrew Byde,
Grzegorz Milos,
Tim Moreton,
John Wilkes,
Tom Wilkie
Abstract:
A classic versioned data structure in storage and computer science is the copy-on-write (CoW) B-tree -- it underlies many of today's file systems and databases, including WAFL, ZFS, Btrfs and more. Unfortunately, it doesn't inherit the B-tree's optimality properties; it has poor space utilization, cannot offer fast updates, and relies on random IO to scale. Yet, nothing better has been developed since. We describe the `stratified B-tree', which beats all known semi-external memory versioned B-trees, including the CoW B-tree. In particular, it is the first versioned dictionary to achieve optimal tradeoffs between space, query and update performance.
Submitted 30 March, 2011; v1 submitted 22 March, 2011;
originally announced March 2011.
-
Optimal query/update tradeoffs in versioned dictionaries
Authors:
Andrew Byde,
Andy Twigg
Abstract:
External-memory dictionaries are a fundamental data structure in file systems and databases. Versioned (or fully-persistent) dictionaries have an associated version tree where queries can be performed at any version, updates can be performed on leaf versions, and any version can be `cloned' by adding a child. Various query/update tradeoffs are known for unversioned dictionaries, many of them with matching upper and lower bounds. No fully-versioned external-memory dictionaries are known with optimal space/query/update tradeoffs. In particular, no versioned constructions are known that offer updates in $o(1)$ I/Os using $O(N)$ space. We present the first cache-oblivious and cache-aware constructions that achieve a wide range of optimal points on this tradeoff.
Submitted 12 April, 2011; v1 submitted 13 March, 2011;
originally announced March 2011.
-
Worst-case time decremental connectivity and k-edge witness
Authors:
Andrew Twigg
Abstract:
We give a simple algorithm for decremental graph connectivity that handles edge deletions in worst-case time $O(k \log n)$ and connectivity queries in $O(\log k)$, where $k$ is the number of edges deleted so far, and uses worst-case space $O(m^2)$. We use this to give an algorithm for $k$-edge witness (``does the removal of a given set of $k$ edges disconnect two vertices $u,v$?'') with worst-case time $O(k^2 \log n)$ and space $O(k^2 n^2)$. For $k = o(\sqrt{n})$ these improve the worst-case $O(\sqrt{n})$ bound for deletion due to Eppstein et al. We also give a decremental connectivity algorithm using $O(n^2 \log n / \log \log n)$ space, whose time complexity depends on the toughness and independence number of the input graph. Finally, we show how to construct a distributed data structure for \kvw by giving a labeling scheme. This is the first data structure for \kvw that can be efficiently distributed without just giving each vertex a copy of the whole structure. Its complexity depends on being able to construct a linear layout with good properties.
Submitted 30 October, 2008;
originally announced October 2008.
-
Lower bounds for distributed markov chain problems
Authors:
Rahul Sami,
Andy Twigg
Abstract:
We study the worst-case communication complexity of distributed algorithms computing a path problem based on stationary distributions of random walks in a network $G$, with the caveat that $G$ is also the communication network. The problem is a natural generalization of shortest path lengths to expected path lengths, and represents a model used in many practical applications such as PageRank and EigenTrust as well as other problems involving Markov chains defined by networks.
For the problem of computing a single stationary probability, we prove an $Ω(n^2 \log n)$ bits lower bound; the trivial centralized algorithm costs $O(n^3)$ bits and no known algorithm beats this. We also prove lower bounds for the related problems of approximately computing the stationary probabilities, computing only the ranking of the nodes, and computing the node with maximal rank. As a corollary, we obtain lower bounds for labelling schemes for the hitting time between two nodes.
Submitted 29 October, 2008;
originally announced October 2008.
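For orientation, the quantity at the centre of this abstract, the stationary distribution of a random walk on $G$, can be computed centrally once the whole graph is known. The sketch below is an assumption-laden illustration and not the paper's distributed model or its lower-bound argument: it uses power iteration on a lazy walk (stay put with probability 1/2, which avoids periodicity and leaves the stationary distribution unchanged) over an undirected graph given as a hypothetical adjacency dictionary `adj`.

```python
def stationary_distribution(adj, iterations=1000):
    """Approximate the stationary distribution of a simple random walk.

    adj maps each node to a list of its neighbours (undirected, connected
    graph with no isolated nodes). Uses power iteration on the lazy walk.
    """
    nodes = sorted(adj)
    pi = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iterations):
        # Half of each node's mass stays where it is (the lazy step).
        nxt = {v: 0.5 * pi[v] for v in nodes}
        # The other half is split evenly among the node's neighbours.
        for u in nodes:
            share = 0.5 * pi[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        pi = nxt
    return pi


# Path graph 0 - 1 - 2.
path = {0: [1], 1: [0, 2], 2: [1]}
print(stationary_distribution(path))
```

For a connected undirected graph the result should approach $\deg(v)/2m$ at each node $v$, where $m$ is the number of edges, which gives a quick sanity check on small inputs.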