-
Efficient Online Random Sampling via Randomness Recycling
Authors:
Thomas L. Draper,
Feras A. Saad
Abstract:
``Randomness recycling'' is a powerful algorithmic technique for reusing a fraction of the random information consumed by a randomized algorithm to reduce its entropy requirements. This article presents a family of efficient randomness recycling algorithms for sampling a sequence $X_1, X_2, X_3, \dots$ of discrete random variables whose joint distribution follows an arbitrary stochastic process. W…
▽ More
``Randomness recycling'' is a powerful algorithmic technique for reusing a fraction of the random information consumed by a randomized algorithm to reduce its entropy requirements. This article presents a family of efficient randomness recycling algorithms for sampling a sequence $X_1, X_2, X_3, \dots$ of discrete random variables whose joint distribution follows an arbitrary stochastic process. We develop randomness recycling strategies to reduce the entropy cost of a variety of prominent sampling algorithms, which include uniform sampling, inverse transform sampling, lookup table sampling, alias sampling, and discrete distribution generating (DDG) tree sampling. Our method achieves an expected amortized entropy cost of $H(X_1,\dots,X_k)/k + \varepsilon$ input random bits per output sample using $O(\log(1/\varepsilon))$ space, which is arbitrarily close to the optimal Shannon entropy rate. The combination of space, time, and entropy properties of our method improve upon the Han and Hoshi interval algorithm and Knuth and Yao entropy-optimal algorithm for sampling a discrete random sequence. An empirical evaluation of the algorithm shows that it achieves state-of-the-art runtime performance on the Fisher-Yates shuffle when using a cryptographically secure pseudorandom number generator. Accompanying the manuscript is a performant random sampling library in the C programming language.
△ Less
Submitted 24 May, 2025;
originally announced May 2025.
-
Efficient Rejection Sampling in the Entropy-Optimal Range
Authors:
Thomas L. Draper,
Feras A. Saad
Abstract:
The problem of generating a random variate $X$ from a finite discrete probability distribution $P$ using an entropy source of independent unbiased coin flips is considered. The Knuth and Yao complexity theory of nonuniform random number generation furnishes a family of "entropy-optimal" sampling algorithms that consume between $H(P)$ and $H(P)+2$ coin flips per generated output, where $H$ is the S…
▽ More
The problem of generating a random variate $X$ from a finite discrete probability distribution $P$ using an entropy source of independent unbiased coin flips is considered. The Knuth and Yao complexity theory of nonuniform random number generation furnishes a family of "entropy-optimal" sampling algorithms that consume between $H(P)$ and $H(P)+2$ coin flips per generated output, where $H$ is the Shannon entropy function. However, the space complexity of entropy-optimal samplers scales exponentially with the number of bits required to encode $P$. This article introduces a family of efficient rejection samplers and characterizes their entropy, space, and time complexity. Within this family is a distinguished sampling algorithm that requires linearithmic space and preprocessing time, and whose expected entropy cost always falls in the entropy-optimal range $[H(P), H(P)+2)$. No previous sampler for discrete probability distributions is known to achieve these characteristics. Numerical experiments demonstrate performance improvements in runtime and entropy of the proposed algorithm compared to the celebrated alias method.
△ Less
Submitted 5 April, 2025;
originally announced April 2025.
-
The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions
Authors:
Feras A. Saad,
Cameron E. Freer,
Martin C. Rinard,
Vikash K. Mansinghka
Abstract:
This paper introduces a new algorithm for the fundamental problem of generating a random integer from a discrete probability distribution using a source of independent and unbiased random coin flips. We prove that this algorithm, which we call the Fast Loaded Dice Roller (FLDR), is highly efficient in both space and time: (i) the size of the sampler is guaranteed to be linear in the number of bits…
▽ More
This paper introduces a new algorithm for the fundamental problem of generating a random integer from a discrete probability distribution using a source of independent and unbiased random coin flips. We prove that this algorithm, which we call the Fast Loaded Dice Roller (FLDR), is highly efficient in both space and time: (i) the size of the sampler is guaranteed to be linear in the number of bits needed to encode the input distribution; and (ii) the expected number of bits of entropy it consumes per sample is at most 6 bits more than the information-theoretically optimal rate. We present fast implementations of the linear-time preprocessing and near-optimal sampling algorithms using unsigned integer arithmetic. Empirical evaluations on a broad set of probability distributions establish that FLDR is 2x-10x faster in both preprocessing and sampling than multiple baseline algorithms, including the widely-used alias and interval samplers. It also uses up to 10000x less space than the information-theoretically optimal sampler, at the expense of less than 1.5x runtime overhead.
△ Less
Submitted 1 June, 2020; v1 submitted 8 March, 2020;
originally announced March 2020.
-
Optimal Approximate Sampling from Discrete Probability Distributions
Authors:
Feras A. Saad,
Cameron E. Freer,
Martin C. Rinard,
Vikash K. Mansinghka
Abstract:
This paper addresses a fundamental problem in random variate generation: given access to a random source that emits a stream of independent fair bits, what is the most accurate and entropy-efficient algorithm for sampling from a discrete probability distribution $(p_1, \dots, p_n)$, where the probabilities of the output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ of the sampling algorithm must be…
▽ More
This paper addresses a fundamental problem in random variate generation: given access to a random source that emits a stream of independent fair bits, what is the most accurate and entropy-efficient algorithm for sampling from a discrete probability distribution $(p_1, \dots, p_n)$, where the probabilities of the output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ of the sampling algorithm must be specified using at most $k$ bits of precision? We present a theoretical framework for formulating this problem and provide new techniques for finding sampling algorithms that are optimal both statistically (in the sense of sampling accuracy) and information-theoretically (in the sense of entropy consumption). We leverage these results to build a system that, for a broad family of measures of statistical accuracy, delivers a sampling algorithm whose expected entropy usage is minimal among those that induce the same distribution (i.e., is "entropy-optimal") and whose output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ is a closest approximation to the target distribution $(p_1, \dots, p_n)$ among all entropy-optimal sampling algorithms that operate within the specified $k$-bit precision. This optimal approximate sampler is also a closer approximation than any (possibly entropy-suboptimal) sampler that consumes a bounded amount of entropy with the specified precision, a class which includes floating-point implementations of inversion sampling and related methods found in many software libraries. We evaluate the accuracy, entropy consumption, precision requirements, and wall-clock runtime of our optimal approximate sampling algorithms on a broad set of distributions, demonstrating the ways that they are superior to existing approximate samplers and establishing that they often consume significantly fewer resources than are needed by exact samplers.
△ Less
Submitted 13 January, 2020;
originally announced January 2020.
-
A Family of Exact Goodness-of-Fit Tests for High-Dimensional Discrete Distributions
Authors:
Feras A. Saad,
Cameron E. Freer,
Nathanael L. Ackerman,
Vikash K. Mansinghka
Abstract:
The objective of goodness-of-fit testing is to assess whether a dataset of observations is likely to have been drawn from a candidate probability distribution. This paper presents a rank-based family of goodness-of-fit tests that is specialized to discrete distributions on high-dimensional domains. The test is readily implemented using a simulation-based, linear-time procedure. The testing procedu…
▽ More
The objective of goodness-of-fit testing is to assess whether a dataset of observations is likely to have been drawn from a candidate probability distribution. This paper presents a rank-based family of goodness-of-fit tests that is specialized to discrete distributions on high-dimensional domains. The test is readily implemented using a simulation-based, linear-time procedure. The testing procedure can be customized by the practitioner using knowledge of the underlying data domain. Unlike most existing test statistics, the proposed test statistic is distribution-free and its exact (non-asymptotic) sampling distribution is known in closed form. We establish consistency of the test against all alternatives by showing that the test statistic is distributed as a discrete uniform if and only if the samples were drawn from the candidate distribution. We illustrate its efficacy for assessing the sample quality of approximate sampling algorithms over combinatorially large spaces with intractable probabilities, including random partitions in Dirichlet process mixture models and random lattices in Ising models.
△ Less
Submitted 26 February, 2019;
originally announced February 2019.