-
Streaming algorithms for products of probabilities
Authors:
Markus Lohrey,
Leon Rische,
Louisa Seelbach Benkner,
Julio Xochitemol
Abstract:
We consider streaming algorithms for approximating a product of input probabilities up to multiplicative error of $1-ε$. It is shown that every randomized streaming algorithm for this problem needs space $Ω(\log n + \log b - \log ε) - \mathcal{O}(1)$, where $n$ is the length of the input stream and $b$ is the bit length of the input numbers. This matches an upper bound from Alur et al. up to a constant multiplicative factor. Moreover, we consider the threshold problem, which asks whether the product of the input probabilities is below a given threshold. It is shown that every randomized streaming algorithm for this problem needs space $Ω(n \cdot b)$.
Submitted 1 October, 2025; v1 submitted 23 April, 2025;
originally announced April 2025.
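To make the space bound in the abstract above concrete, here is a minimal sketch of a floating-point-style streaming approximation. It is not the algorithm of Alur et al.; it is an illustration under the assumption that each input probability is a dyadic rational given as a pair `(a, b)` meaning $a/2^b$. The function name `approx_product` and the input convention are hypothetical. The state is a mantissa of about $\log(n/ε)$ bits and an exponent of at most $n \cdot b$, so it fits in $\mathcal{O}(\log n + \log b + \log(1/ε))$ bits.

```python
import math
from fractions import Fraction

def approx_product(stream, n, eps):
    """Approximate the product of probabilities a / 2**b read from `stream`.

    Assumed input convention (hypothetical): each item is a pair (a, b)
    with 0 <= a <= 2**b, and the stream has at most n items.
    Returns a Fraction in the interval [(1 - eps) * P, P], where P is the
    exact product.
    """
    # Mantissa width k: one truncation loses a relative 2**-(k-1) at most,
    # so n truncations lose at most n * 2**-(k-1) <= eps in total.
    k = math.ceil(math.log2(n / float(eps))) + 2
    m, e = 1, 0  # the current estimate is m * 2**(-e)
    for a, b in stream:
        m *= a
        e += b
        excess = m.bit_length() - k
        if excess > 0:
            # Keep only the k leading bits; this always rounds down.
            m >>= excess
            e -= excess
    return Fraction(m, 2 ** e)
```

Because truncation only ever rounds down, the estimate never exceeds the exact product, and the choice of `k` keeps the accumulated relative error below `eps`.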
-
Upper Bounds on the Average Height of Random Binary Trees
Authors:
Louisa Seelbach Benkner
Abstract:
We study the average height of random trees generated by leaf-centric binary tree sources as introduced by Zhang, Yang and Kieffer. A leaf-centric binary tree source induces for every $n \geq 2$ a probability distribution on the set of binary trees with $n$ leaves. Our results generalize a result by Devroye, according to which the average height of a random binary search tree of size $n$ is in $\mathcal{O}(\log n)$.
Submitted 28 May, 2024;
originally announced May 2024.
-
Distinct Fringe Subtrees in Random Trees
Authors:
Louisa Seelbach Benkner,
Stephan Wagner
Abstract:
A fringe subtree of a rooted tree is a subtree induced by one of the vertices and all its descendants. We consider the problem of estimating the number of distinct fringe subtrees in two types of random trees: simply generated trees and families of increasing trees (recursive trees, $d$-ary increasing trees and generalized plane-oriented recursive trees). We prove that the order of magnitude of the number of distinct fringe subtrees (under rather mild assumptions on what 'distinct' means) in random trees with $n$ vertices is $n/\sqrt{\log n}$ for simply generated trees and $n/\log n$ for increasing trees.
Submitted 10 May, 2021;
originally announced May 2021.
-
Hypersuccinct Trees -- New universal tree source codes for optimal compressed tree data structures and range minima
Authors:
J. Ian Munro,
Patrick K. Nicholson,
Louisa Seelbach Benkner,
Sebastian Wild
Abstract:
We present a new universal source code for distributions of unlabeled binary and ordinal trees that achieves optimal compression to within lower order terms for all tree sources covered by existing universal codes. At the same time, it supports answering many navigational queries on the compressed representation in constant time on the word-RAM; this is not known to be possible for any existing tree compression method. The resulting data structures, "hypersuccinct trees", hence combine the compression achieved by the best known universal codes with the operation support of the best succinct tree data structures. We apply hypersuccinct trees to obtain a universal compressed data structure for range-minimum queries. It has constant query time and the optimal worst-case space usage of $2n+o(n)$ bits, but the space drops to $1.736n + o(n)$ bits on average for random permutations of $n$ elements, and $2\lg\binom nr + o(n)$ for arrays with $r$ increasing runs, respectively. Both results are optimal; the former answers an open problem of Davoodi et al. (2014) and Golin et al. (2016). Compared to prior work on succinct data structures, we do not have to tailor our data structure to specific applications; hypersuccinct trees automatically adapt to the trees at hand. We show that they simultaneously achieve the optimal space usage to within lower order terms for a wide range of distributions over tree shapes, including: binary search trees (BSTs) generated by insertions in random order / Cartesian trees of random arrays, random fringe-balanced BSTs, binary trees with a given number of binary/unary/leaf nodes, random binary tries generated from memoryless sources, full binary trees, unary paths, as well as uniformly chosen weight-balanced BSTs, AVL trees, and left-leaning red-black trees.
Submitted 3 September, 2021; v1 submitted 27 April, 2021;
originally announced April 2021.
-
A Comparison of Empirical Tree Entropies
Authors:
Danny Hucke,
Markus Lohrey,
Louisa Seelbach Benkner
Abstract:
Whereas for strings, higher-order empirical entropy is the standard entropy measure, several different notions of empirical entropy for trees have been proposed in the past, notably label entropy, degree entropy, conditional versions of the latter two, and empirical entropy of trees (here, called label-shape entropy). In this paper, we carry out a systematic comparison of these entropy measures. We underpin our theoretical investigations by experimental results with real XML data.
Submitted 1 June, 2020;
originally announced June 2020.
-
On the Collection of Fringe Subtrees in Random Binary Trees
Authors:
Louisa Seelbach Benkner,
Stephan Wagner
Abstract:
A fringe subtree of a rooted tree is a subtree consisting of one of the nodes and all its descendants. In this paper, we are specifically interested in the number of non-isomorphic trees that appear in the collection of all fringe subtrees of a binary tree. This number is analysed under two different random models: uniformly random binary trees and random binary search trees.
In the case of uniformly random binary trees, we show that the number of non-isomorphic fringe subtrees lies between $c_1n/\sqrt{\ln n}(1+o(1))$ and $c_2n/\sqrt{\ln n}(1+o(1))$ for two constants $c_1 \approx 1.0591261434$ and $c_2 \approx 1.0761505454$, both in expectation and with high probability, where $n$ denotes the size (number of leaves) of the uniformly random binary tree. A similar result is proven for random binary search trees, but the order of magnitude is $n/\ln n$ in this case.
Our proof technique can also be used to strengthen known results on the number of distinct fringe subtrees (distinct in the sense of ordered trees). This quantity is of the same order of magnitude in both cases, but with slightly different constants in the upper and lower bounds.
Submitted 6 March, 2020;
originally announced March 2020.
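The two notions of distinctness in the abstract above (non-isomorphic versus distinct as ordered trees) can be illustrated with a small counting sketch. It assumes a hypothetical encoding in which a leaf is `None` and an internal node is a pair `(left, right)`; the function name is likewise an assumption, and the recursion is only suitable for trees of moderate height.

```python
def count_distinct_fringe_subtrees(tree, ordered=True):
    """Count distinct fringe subtrees of a binary tree.

    A leaf is None and an internal node is a pair (left, right)
    (an encoding chosen for this sketch). With ordered=True, subtrees
    are compared as ordered trees; with ordered=False, the two children
    of a node may be swapped, so isomorphic shapes are identified.
    """
    ids = {}  # canonical key of a subtree shape -> small integer id

    def visit(t):
        if t is None:
            key = ()  # all leaves share one shape
        else:
            left, right = visit(t[0]), visit(t[1])
            if not ordered and left > right:
                left, right = right, left  # canonical child order
            key = (left, right)
        # Assign a fresh id the first time a shape is seen.
        return ids.setdefault(key, len(ids))

    visit(tree)
    return len(ids)
```

For the tree `((None, (None, None)), ((None, None), None))`, the ordered count is 5 and the unordered count is 4, because the two children of the root are mirror images of each other.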
-
Practical Random Access to SLP-Compressed Texts
Authors:
Travis Gagie,
Tomohiro I,
Giovanni Manzini,
Gonzalo Navarro,
Hiroshi Sakamoto,
Louisa Seelbach Benkner,
Yoshimasa Takabatake
Abstract:
Grammar-based compression is a popular and powerful approach to compressing repetitive texts but until recently its relatively poor time-space trade-offs during real-life construction made it impractical for truly massive datasets such as genomic databases. In a recent paper (SPIRE 2019) we showed how simple pre-processing can dramatically improve those trade-offs, and in this paper we turn our attention to one of the features that make grammar-based compression so attractive: the possibility of supporting fast random access. This is an essential primitive in many algorithms that process grammar-compressed texts without decompressing them and so many theoretical bounds have been published about it, but experimentation has lagged behind. We give a new encoding of grammars that is about as small as the practical state of the art (Maruyama et al., SPIRE 2013) but with significantly faster queries.
Submitted 19 July, 2020; v1 submitted 15 October, 2019;
originally announced October 2019.
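As background for the random-access primitive mentioned in the abstract above, the following is a minimal textbook-style sketch of access on a straight-line program (SLP). It is not the encoding proposed in the paper. The grammar representation (a dict mapping each nonterminal to either a single character or a pair of nonterminals) and the function names are assumptions made for illustration.

```python
def expansion_lengths(rules):
    """Return the length of the expansion of every nonterminal.

    `rules` maps a nonterminal to a one-character string (terminal rule)
    or to a pair (left, right) of nonterminals (binary rule).
    """
    lengths = {}

    def length(x):
        if x not in lengths:
            rhs = rules[x]
            if isinstance(rhs, str):
                lengths[x] = 1
            else:
                lengths[x] = length(rhs[0]) + length(rhs[1])
        return lengths[x]

    for x in rules:
        length(x)
    return lengths

def access(rules, lengths, start, i):
    """Return character i (0-based) of the text generated by `start`
    without decompressing it, by walking down the derivation tree."""
    if not 0 <= i < lengths[start]:
        raise IndexError(i)
    x = start
    while not isinstance(rules[x], str):
        left, right = rules[x]
        if i < lengths[left]:
            x = left
        else:
            i -= lengths[left]  # skip the whole left expansion
            x = right
    return rules[x]
```

Each query follows one root-to-leaf path, so its cost is proportional to the height of the grammar; the stored expansion lengths are what make the descent possible without expanding any rule.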
-
Entropy Bounds for Grammar-Based Tree Compressors
Authors:
Danny Hucke,
Markus Lohrey,
Louisa Seelbach Benkner
Abstract:
The definition of $k^{th}$-order empirical entropy of strings is extended to node-labelled binary trees. A suitable binary encoding of tree straight-line programs (which have been used for grammar-based tree compression before) is shown to yield binary tree encodings of size bounded by the $k^{th}$-order empirical entropy plus some lower-order terms. This generalizes recent results for grammar-based string compression to grammar-based tree compression.
Submitted 20 May, 2020; v1 submitted 10 January, 2019;
originally announced January 2019.
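For readers unfamiliar with the string notion that the abstract above extends to trees, here is a short sketch of the usual $k^{th}$-order empirical entropy of a string, normalized per symbol. The function name is hypothetical, and the handling of the first $k$ symbols (they are skipped because they have no full context) is one common convention, which may differ from the paper's.

```python
from collections import Counter, defaultdict
from math import log2

def empirical_entropy(s, k=0):
    """Per-symbol k-th order empirical entropy of the string s, in bits.

    Each symbol is grouped by the k symbols that precede it; the result
    is the sum of the zeroth-order entropies of these groups, divided
    by len(s). The first k symbols have no full context and are skipped.
    """
    n = len(s)
    if n == 0:
        return 0.0
    followers = defaultdict(Counter)  # context -> counts of next symbols
    for i in range(k, n):
        followers[s[i - k:i]][s[i]] += 1
    total = 0.0
    for counts in followers.values():
        m = sum(counts.values())
        total += sum(c * log2(m / c) for c in counts.values())
    return total / n
```

With `k=0` the single context is the empty string, so `empirical_entropy("aabb")` is 1 bit per symbol, while `empirical_entropy("abab", k=1)` is 0 because each symbol is fully determined by its predecessor.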
-
Tunneling on Wheeler Graphs
Authors:
Jarno Alanko,
Travis Gagie,
Gonzalo Navarro,
Louisa Seelbach Benkner
Abstract:
The Burrows-Wheeler Transform (BWT) is an important technique both in data compression and in the design of compact indexing data structures. It has been generalized from single strings to collections of strings and some classes of labeled directed graphs, such as tries and de Bruijn graphs. The BWTs of repetitive datasets are often compressible using run-length compression, but recently Baier (CPM 2018) described how they could be even further compressed using an idea he called tunneling. In this paper we show that tunneled BWTs can still be used for indexing and extend tunneling to the BWTs of Wheeler graphs, a framework that includes all the generalizations mentioned above.
Submitted 29 May, 2019; v1 submitted 6 November, 2018;
originally announced November 2018.
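The abstract above builds on the BWT of a single string and on its run-length compressibility. The sketch below shows only those two basics, using the naive sorted-rotations construction; it does not implement tunneling or Wheeler graphs. It assumes a sentinel character that does not occur in the text and sorts before every other character, and the function names are hypothetical.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler Transform via sorted rotations (quadratic space).

    Assumes `sentinel` does not occur in `text` and compares smaller
    than every character of `text`.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

def run_count(s):
    """Number of maximal runs of equal characters in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])
```

For example, `bwt("banana")` yields `"annb$aa"`, which consists of 5 runs; on repetitive inputs the number of runs is typically much smaller than the text length, which is what run-length compression exploits.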
-
Average Case Analysis of Leaf-Centric Binary Tree Sources
Authors:
Louisa Seelbach Benkner,
Markus Lohrey,
Stephan Wagner
Abstract:
We study the average number of distinct fringe subtrees in random trees generated by leaf-centric binary tree sources as introduced by Zhang, Yang and Kieffer. A leaf-centric binary tree source induces for every $n \geq 2$ a probability distribution on the set of binary trees with $n$ leaves. We generalize a result by Flajolet, Gourdon, Martinez and Devroye, according to which the average number of distinct fringe subtrees in a random binary search tree of size $n$ is in $Θ(n/\log n)$, as well as a result by Flajolet, Sipala and Steyaert, according to which the number of distinct fringe subtrees in a uniformly random binary tree of size $n$ is in $Θ(n/\sqrt{\log n})$.
Submitted 29 April, 2025; v1 submitted 27 April, 2018;
originally announced April 2018.