Showing 1–11 of 11 results for author: Saad, F A

  1. arXiv:2505.18879

    cs.DS cs.DM cs.IT math.PR stat.CO

    Efficient Online Random Sampling via Randomness Recycling

    Authors: Thomas L. Draper, Feras A. Saad

    Abstract: "Randomness recycling" is a powerful algorithmic technique for reusing a fraction of the random information consumed by a randomized algorithm to reduce its entropy requirements. This article presents a family of efficient randomness recycling algorithms for sampling a sequence $X_1, X_2, X_3, \dots$ of discrete random variables whose joint distribution follows an arbitrary stochastic process. W…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: 26 pages, 5 figures, 1 table, 13 algorithms

  2. arXiv:2504.04267

    cs.DS cs.DM cs.IT math.PR stat.CO

    Efficient Rejection Sampling in the Entropy-Optimal Range

    Authors: Thomas L. Draper, Feras A. Saad

    Abstract: The problem of generating a random variate $X$ from a finite discrete probability distribution $P$ using an entropy source of independent unbiased coin flips is considered. The Knuth and Yao complexity theory of nonuniform random number generation furnishes a family of "entropy-optimal" sampling algorithms that consume between $H(P)$ and $H(P)+2$ coin flips per generated output, where $H$ is the S…

    Submitted 5 April, 2025; originally announced April 2025.

    Comments: 37 pages, 11 figures, 4 tables, 4 algorithms

  3. arXiv:2307.09607

    cs.LG cs.AI stat.ME stat.ML

    Sequential Monte Carlo Learning for Time Series Structure Discovery

    Authors: Feras A. Saad, Brian J. Patton, Matthew D. Hoffman, Rif A. Saurous, Vikash K. Mansinghka

    Abstract: This paper presents a new approach to automatically discovering accurate models of complex time series data. Working within a Bayesian nonparametric prior over a symbolic space of Gaussian process time series models, we present a novel structure learning algorithm that integrates sequential Monte Carlo (SMC) and involutive MCMC for highly effective posterior inference. Our method can be used both…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: 17 pages, 8 figures, 2 tables. Appearing in ICML 2023

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:29473-29489, 2023

  4. arXiv:2202.12363

    stat.ML cs.LG stat.CO stat.ME

    Estimators of Entropy and Information via Inference in Probabilistic Models

    Authors: Feras A. Saad, Marco Cusumano-Towner, Vikash K. Mansinghka

    Abstract: Estimating information-theoretic quantities such as entropy and mutual information is central to many problems in statistics and machine learning, but challenging in high dimensions. This paper presents estimators of entropy via inference (EEVI), which deliver upper and lower bounds on many information quantities for arbitrary variables in a probabilistic generative model. These estimators use imp…

    Submitted 12 December, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

    Comments: 18 pages, 8 figures. Appearing in AISTATS 2022

    Journal ref: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:5604-5621, 2022

  5. arXiv:2108.07208

    cs.LG cs.AI stat.ME stat.ML

    Hierarchical Infinite Relational Model

    Authors: Feras A. Saad, Vikash K. Mansinghka

    Abstract: This paper describes the hierarchical infinite relational model (HIRM), a new probabilistic generative model for noisy, sparse, and heterogeneous relational data. Given a set of relations defined over a collection of domains, the model first infers multiple non-overlapping clusters of relations using a top-level Chinese restaurant process. Within each cluster of relations, a Dirichlet process mixt…

    Submitted 16 August, 2021; originally announced August 2021.

    Comments: 11 pages, 6 figures, 4 tables. Appearing in UAI 2021

    Journal ref: Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence, PMLR 161:1067-1077, 2021

  6. arXiv:2010.03485

    cs.PL cs.LG cs.SC stat.CO stat.ML

    SPPL: Probabilistic Programming with Fast Exact Symbolic Inference

    Authors: Feras A. Saad, Martin C. Rinard, Vikash K. Mansinghka

    Abstract: We present the Sum-Product Probabilistic Language (SPPL), a new probabilistic programming language that automatically delivers exact solutions to a broad range of probabilistic inference queries. SPPL translates probabilistic programs into sum-product expressions, a new symbolic representation and associated semantic domain that extends standard sum-product networks to support mixed-type distribut…

    Submitted 11 June, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI '21), June 20-25, 2021, Virtual, Canada. ACM, New York, NY, USA

  7. arXiv:2003.03830

    stat.CO cs.DM cs.DS cs.IT math.PR

    The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions

    Authors: Feras A. Saad, Cameron E. Freer, Martin C. Rinard, Vikash K. Mansinghka

    Abstract: This paper introduces a new algorithm for the fundamental problem of generating a random integer from a discrete probability distribution using a source of independent and unbiased random coin flips. We prove that this algorithm, which we call the Fast Loaded Dice Roller (FLDR), is highly efficient in both space and time: (i) the size of the sampler is guaranteed to be linear in the number of bits…

    Submitted 1 June, 2020; v1 submitted 8 March, 2020; originally announced March 2020.

    Comments: 12 pages, 5 figures, 1 table. Appearing in AISTATS 2020

    Journal ref: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, PMLR 108:1036-1046, 2020

  8. arXiv:2001.04555

    cs.DS cs.DM cs.IT math.PR stat.CO

    Optimal Approximate Sampling from Discrete Probability Distributions

    Authors: Feras A. Saad, Cameron E. Freer, Martin C. Rinard, Vikash K. Mansinghka

    Abstract: This paper addresses a fundamental problem in random variate generation: given access to a random source that emits a stream of independent fair bits, what is the most accurate and entropy-efficient algorithm for sampling from a discrete probability distribution $(p_1, \dots, p_n)$, where the probabilities of the output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ of the sampling algorithm must be…

    Submitted 13 January, 2020; originally announced January 2020.

    Journal ref: Proc. ACM Program. Lang. 4, POPL, Article 36 (January 2020)

  9. arXiv:1907.06249

    cs.PL cs.AI cs.LG stat.CO

    Bayesian Synthesis of Probabilistic Programs for Automatic Data Modeling

    Authors: Feras A. Saad, Marco F. Cusumano-Towner, Ulrich Schaechtle, Martin C. Rinard, Vikash K. Mansinghka

    Abstract: We present new techniques for automatically constructing probabilistic programs for data analysis, interpretation, and prediction. These techniques work with probabilistic domain-specific data modeling languages that capture key properties of a broad class of data generating processes, using Bayesian inference to synthesize probabilistic programs in these modeling languages given observed data. We…

    Submitted 14 July, 2019; originally announced July 2019.

    Journal ref: Proc. ACM Program. Lang. 3, POPL, Article 37 (January 2019)

  10. arXiv:1902.10142

    math.ST cs.LG stat.ME

    A Family of Exact Goodness-of-Fit Tests for High-Dimensional Discrete Distributions

    Authors: Feras A. Saad, Cameron E. Freer, Nathanael L. Ackerman, Vikash K. Mansinghka

    Abstract: The objective of goodness-of-fit testing is to assess whether a dataset of observations is likely to have been drawn from a candidate probability distribution. This paper presents a rank-based family of goodness-of-fit tests that is specialized to discrete distributions on high-dimensional domains. The test is readily implemented using a simulation-based, linear-time procedure. The testing procedu…

    Submitted 26 February, 2019; originally announced February 2019.

    Comments: 20 pages, 6 figures. Appearing in AISTATS 2019

    Journal ref: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, PMLR 89:1640-1649, 2019

  11. arXiv:1710.06900

    stat.ME cs.LG stat.ML

    Temporally-Reweighted Chinese Restaurant Process Mixtures for Clustering, Imputing, and Forecasting Multivariate Time Series

    Authors: Feras A. Saad, Vikash K. Mansinghka

    Abstract: This article proposes a Bayesian nonparametric method for forecasting, imputation, and clustering in sparsely observed, multivariate time series data. The method is appropriate for jointly modeling hundreds of time series with widely varying, non-stationary dynamics. Given a collection of $N$ time series, the Bayesian model first partitions them into independent clusters using a Chinese restaurant…

    Submitted 1 April, 2018; v1 submitted 18 October, 2017; originally announced October 2017.

    Comments: 19 pages, 10 figures, 2 tables. Appearing in AISTATS 2018

    Journal ref: Proceedings of the 21st International Conference on Artificial Intelligence and Statistics, PMLR 84:755-764, 2018