Streaming algorithms for products of probabilities

Authors: Markus Lohrey, Leon Rische, Louisa Seelbach Benkner, Julio Xochitemol

Abstract: We consider streaming algorithms for approximating a product of input probabilities up to multiplicative error of $1-ε$. It is shown that every randomized streaming algorithm for this problem needs space $Ω(\log n + \log b - \log ε) - \mathcal{O}(1)$, where $n$ is the length of the input stream and $b$ is the bit length of the input numbers. This matches an upper bound from Alur et al. up to a constant multiplicative factor. Moreover, we consider the threshold problem, where it is asked whether the product of the input probabilities is below a given threshold. It is shown that every randomized streaming algorithm for this problem needs space $Ω(n \cdot b)$.

Submitted 1 October, 2025; v1 submitted 23 April, 2025; originally announced April 2025.

arXiv:2405.17952 [pdf, ps, other]

Upper Bounds on the Average Height of Random Binary Trees

Authors: Louisa Seelbach Benkner

Abstract: We study the average height of random trees generated by leaf-centric binary tree sources as introduced by Zhang, Yang and Kieffer. A leaf-centric binary tree source induces for every $n \geq 2$ a probability distribution on the set of binary trees with $n$ leaves. Our results generalize a result by Devroye, according to which the average height of a random binary search tree of size $n$ is in $\mathcal{O}(\log n)$.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2105.04231 [pdf, ps, other]

Distinct Fringe Subtrees in Random Trees

Authors: Louisa Seelbach Benkner, Stephan Wagner

Abstract: A fringe subtree of a rooted tree is a subtree induced by one of the vertices and all its descendants. We consider the problem of estimating the number of distinct fringe subtrees in two types of random trees: simply generated trees and families of increasing trees (recursive trees, $d$-ary increasing trees and generalized plane-oriented recursive trees). We prove that the order of magnitude of the number of distinct fringe subtrees (under rather mild assumptions on what `distinct' means) in random trees with $n$ vertices is $n/\sqrt{\log n}$ for simply generated trees and $n/\log n$ for increasing trees.

Submitted 10 May, 2021; originally announced May 2021.

arXiv:2104.13457 [pdf, other]

doi 10.4230/LIPIcs.ESA.2021.70

Hypersuccinct Trees -- New universal tree source codes for optimal compressed tree data structures and range minima

Authors: J. Ian Munro, Patrick K. Nicholson, Louisa Seelbach Benkner, Sebastian Wild

Abstract: We present a new universal source code for distributions of unlabeled binary and ordinal trees that achieves optimal compression to within lower order terms for all tree sources covered by existing universal codes. At the same time, it supports answering many navigational queries on the compressed representation in constant time on the word-RAM; this is not known to be possible for any existing tree compression method. The resulting data structures, "hypersuccinct trees", hence combine the compression achieved by the best known universal codes with the operation support of the best succinct tree data structures. We apply hypersuccinct trees to obtain a universal compressed data structure for range-minimum queries. It has constant query time and the optimal worst-case space usage of $2n+o(n)$ bits, but the space drops to $1.736n + o(n)$ bits on average for random permutations of $n$ elements, and $2\lg\binom nr + o(n)$ for arrays with $r$ increasing runs, respectively. Both results are optimal; the former answers an open problem of Davoodi et al. (2014) and Golin et al. (2016). Compared to prior work on succinct data structures, we do not have to tailor our data structure to specific applications; hypersuccinct trees automatically adapt to the trees at hand.
We show that they simultaneously achieve the optimal space usage to within lower order terms for a wide range of distributions over tree shapes, including: binary search trees (BSTs) generated by insertions in random order / Cartesian trees of random arrays, random fringe-balanced BSTs, binary trees with a given number of binary/unary/leaf nodes, random binary tries generated from memoryless sources, full binary trees, unary paths, as well as uniformly chosen weight-balanced BSTs, AVL trees, and left-leaning red-black trees.

Submitted 3 September, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

Comments: part of ESA 2021

arXiv:2006.01695 [pdf, ps, other]

A Comparison of Empirical Tree Entropies

Authors: Danny Hucke, Markus Lohrey, Louisa Seelbach Benkner

Abstract: Whereas for strings, higher-order empirical entropy is the standard entropy measure, several different notions of empirical entropy for trees have been proposed in the past, notably label entropy, degree entropy, conditional versions of the latter two, and empirical entropy of trees (here, called label-shape entropy). In this paper, we carry out a systematic comparison of these entropy measures. We underpin our theoretical investigations by experimental results with real XML data.

Submitted 1 June, 2020; originally announced June 2020.

MSC Class: 94A17

arXiv:2003.03323 [pdf, ps, other]

On the Collection of Fringe Subtrees in Random Binary Trees

Authors: Louisa Seelbach Benkner, Stephan Wagner

Abstract: A fringe subtree of a rooted tree is a subtree consisting of one of the nodes and all its descendants. In this paper, we are specifically interested in the number of non-isomorphic trees that appear in the collection of all fringe subtrees of a binary tree. This number is analysed under two different random models: uniformly random binary trees and random binary search trees. In the case of uniformly random binary trees, we show that the number of non-isomorphic fringe subtrees lies between $c_1n/\sqrt{\ln n}(1+o(1))$ and $c_2n/\sqrt{\ln n}(1+o(1))$ for two constants $c_1 \approx 1.0591261434$ and $c_2 \approx 1.0761505454$, both in expectation and with high probability, where $n$ denotes the size (number of leaves) of the uniformly random binary tree. A similar result is proven for random binary search trees, but the order of magnitude is $n/\ln n$ in this case. Our proof technique can also be used to strengthen known results on the number of distinct fringe subtrees (distinct in the sense of ordered trees). This quantity is of the same order of magnitude in both cases, but with slightly different constants in the upper and lower bounds.

Submitted 6 March, 2020; originally announced March 2020.
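
Illustration (not from the paper): the two quantities discussed in the abstract above can be computed on a small example. The sketch below is a minimal one under stated assumptions: a binary tree is modelled as `None` for a leaf or a pair `(left, right)` for an inner node, "distinct" means different as ordered trees, and "non-isomorphic" is taken to mean different up to swapping left and right children. The function name `count_fringe_subtrees` and the tree encoding are hypothetical choices made for this example only.

```python
def count_fringe_subtrees(tree):
    """Return (ordered, unordered): the number of distinct fringe subtrees
    as ordered trees, and the number of non-isomorphic fringe subtrees
    (assumed here to mean: equal up to swapping children)."""
    ordered = set()
    unordered = set()

    def visit(t):
        # Build one canonical string per notion of equality.
        if t is None:
            o = u = "L"  # a single leaf
        else:
            left_o, left_u = visit(t[0])
            right_o, right_u = visit(t[1])
            # Ordered trees: child order matters.
            o = "(" + left_o + right_o + ")"
            # Unordered trees: sort the children's canonical forms.
            a, b = sorted((left_u, right_u))
            u = "(" + a + b + ")"
        ordered.add(o)
        unordered.add(u)
        return o, u

    visit(tree)
    return len(ordered), len(unordered)


# A tree with 6 leaves whose two halves are mirror images of each other.
leaf = None
cherry = (leaf, leaf)
example = ((leaf, cherry), (cherry, leaf))
print(count_fringe_subtrees(example))  # (5, 4)
```

The mirrored halves count as two distinct ordered fringe subtrees but as a single isomorphism class, which is why the second count is smaller.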

arXiv:1910.07145 [pdf, other]

Practical Random Access to SLP-Compressed Texts

Authors: Travis Gagie, Tomohiro I, Giovanni Manzini, Gonzalo Navarro, Hiroshi Sakamoto, Louisa Seelbach Benkner, Yoshimasa Takabatake

Abstract: Grammar-based compression is a popular and powerful approach to compressing repetitive texts but until recently its relatively poor time-space trade-offs during real-life construction made it impractical for truly massive datasets such as genomic databases. In a recent paper (SPIRE 2019) we showed how simple pre-processing can dramatically improve those trade-offs, and in this paper we turn our attention to one of the features that make grammar-based compression so attractive: the possibility of supporting fast random access. This is an essential primitive in many algorithms that process grammar-compressed texts without decompressing them and so many theoretical bounds have been published about it, but experimentation has lagged behind. We give a new encoding of grammars that is about as small as the practical state of the art (Maruyama et al., SPIRE 2013) but with significantly faster queries.

Submitted 19 July, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

Comments: Accepted to SPIRE 2020

arXiv:1901.03155 [pdf, ps, other]

Entropy Bounds for Grammar-Based Tree Compressors

Authors: Danny Hucke, Markus Lohrey, Louisa Seelbach Benkner

Abstract: The definition of $k^{th}$-order empirical entropy of strings is extended to node labelled binary trees. A suitable binary encoding of tree straight-line programs (that have been used for grammar-based tree compression before) is shown to yield binary tree encodings of size bounded by the $k^{th}$-order empirical entropy plus some lower order terms. This generalizes recent results for grammar-based string compression to grammar-based tree compression.

Submitted 20 May, 2020; v1 submitted 10 January, 2019; originally announced January 2019.

Comments: A short version of this paper appeared in the IEEE Proceedings of ISIT 2019

MSC Class: 68P30

arXiv:1811.02457 [pdf, ps, other]

Tunneling on Wheeler Graphs

Authors: Jarno Alanko, Travis Gagie, Gonzalo Navarro, Louisa Seelbach Benkner

Abstract: The Burrows-Wheeler Transform (BWT) is an important technique both in data compression and in the design of compact indexing data structures. It has been generalized from single strings to collections of strings and some classes of labeled directed graphs, such as tries and de Bruijn graphs. The BWTs of repetitive datasets are often compressible using run-length compression, but recently Baier (CPM 2018) described how they could be even further compressed using an idea he called tunneling. In this paper we show that tunneled BWTs can still be used for indexing and extend tunneling to the BWTs of Wheeler graphs, a framework that includes all the generalizations mentioned above.

Submitted 29 May, 2019; v1 submitted 6 November, 2018; originally announced November 2018.

Comments: 11 Pages, 1 figure. This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941

arXiv:1804.10396 [pdf, ps, other]

Average Case Analysis of Leaf-Centric Binary Tree Sources

Authors: Louisa Seelbach Benkner, Markus Lohrey, Stephan Wagner

Abstract: We study the average number of distinct fringe subtrees in random trees generated by leaf-centric binary tree sources as introduced by Zhang, Yang and Kieffer. A leaf-centric binary tree source induces for every $n \geq 2$ a probability distribution on the set of binary trees with $n$ leaves. We generalize a result by Flajolet, Gourdon, Martinez and Devroye, according to which the average number of distinct fringe subtrees in a random binary search tree of size $n$ is in $Θ(n/\log n)$, as well as a result by Flajolet, Sipala and Steyaert, according to which the number of distinct fringe subtrees in a uniformly random binary tree of size $n$ is in $Θ(n/\sqrt{\log n})$.

Submitted 29 April, 2025; v1 submitted 27 April, 2018; originally announced April 2018.

MSC Class: 68P30

Showing 1–10 of 10 results for author: Benkner, L S