Showing 1–23 of 23 results for author: Beraha, M

  1. arXiv:2506.09437  [pdf, ps, other]

    stat.ME math.PR math.ST

    Sufficient digits and density estimation: A Bayesian nonparametric approach using generalized finite Pólya trees

    Authors: Mario Beraha, Jesper Møller

    Abstract: This paper proposes a novel approach for statistical modelling of a continuous random variable $X$ on $[0, 1)$, based on its digit representation $X=.X_1X_2\ldots$. In general, $X$ can be coupled with a random variable $N$ so that if a prior of $N$ is imposed, $(X_1,\ldots,X_N)$ becomes a sufficient statistic and $.X_{N+1}X_{N+2}\ldots$ is uniformly distributed. In line with this fact, and focusi…

    Submitted 11 June, 2025; originally announced June 2025.

  2. arXiv:2505.19643  [pdf, ps, other]

    stat.AP

    Online activity prediction via generalized Indian buffet process models

    Authors: Mario Beraha, Lorenzo Masoero, Stefano Favaro, Thomas S. Richardson

    Abstract: Online A/B experiments generate millions of user-activity records each day, yet experimenters need timely forecasts to guide roll-outs and safeguard user experience. Motivated by the problem of activity prediction for A/B tests at Amazon, we introduce a Bayesian nonparametric model for predicting both first-time and repeat triggers in web experiments. The model is based on the stable beta-scaled p…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: This paper supersedes the two technical reports by the same authors arXiv:2401.14722 and arXiv:2402.03231

  3. arXiv:2502.10257  [pdf, other]

    math.ST stat.ME

    Bayesian calculus and predictive characterizations of extended feature allocation models

    Authors: Mario Beraha, Federico Camerlenghi, Lorenzo Ghilotti

    Abstract: We introduce and study a unified Bayesian framework for extended feature allocations which flexibly captures interactions -- such as repulsion or attraction -- among features and their associated weights. We provide a complete Bayesian analysis of the proposed model and specialize our general theory to noteworthy classes of priors. This includes a novel prior based on determinantal point processes…

    Submitted 3 March, 2025; v1 submitted 14 February, 2025; originally announced February 2025.

  4. arXiv:2402.03231  [pdf, other]

    stat.ME cs.LG stat.AP

    Improved prediction of future user activity in online A/B testing

    Authors: Lorenzo Masoero, Mario Beraha, Thomas Richardson, Stefano Favaro

    Abstract: In online randomized experiments or A/B tests, accurate predictions of participant inclusion rates are of paramount importance. These predictions not only guide experimenters in optimizing the experiment's duration but also enhance the precision of treatment effect estimates. In this paper we present a novel, straightforward, and scalable Bayesian nonparametric approach for predicting the rate at…

    Submitted 5 February, 2024; originally announced February 2024.

  5. arXiv:2401.14722  [pdf, other]

    stat.ME cs.LG stat.AP stat.ML

    A Nonparametric Bayes Approach to Online Activity Prediction

    Authors: Mario Beraha, Lorenzo Masoero, Stefano Favaro, Thomas S. Richardson

    Abstract: Accurately predicting the onset of specific activities within defined timeframes holds significant importance in several applied contexts. In particular, accurate prediction of the number of future users that will be exposed to an intervention is an important piece of information for experimenters running online experiments (A/B tests). In this work, we propose a novel approach to predict the numb…

    Submitted 26 January, 2024; originally announced January 2024.

  6. arXiv:2312.13992  [pdf, other]

    stat.ME

    Bayesian nonparametric boundary detection for income areal data

    Authors: Matteo Gianella, Mario Beraha, Alessandra Guglielmi

    Abstract: Recent discussions on the future of metropolitan cities underscore the pivotal role of (social) equity, driven by demographic and economic trends. More equal policies can foster and contribute to a city's economic success and social stability. In this work, we focus on identifying metropolitan areas with distinct economic and social levels in the greater Los Angeles area, one of the most diverse y…

    Submitted 29 January, 2025; v1 submitted 21 December, 2023; originally announced December 2023.

  7. arXiv:2310.09818  [pdf, other]

    stat.CO stat.ME

    MCMC for Bayesian nonparametric mixture modeling under differential privacy

    Authors: Mario Beraha, Stefano Favaro, Vinayak Rao

    Abstract: Estimating the probability density of a population while preserving the privacy of individuals in that population is an important and challenging problem that has received considerable attention in recent years. While the previous literature focused on frequentist approaches, in this paper, we propose a Bayesian nonparametric mixture model under differential privacy (DP) and present two Markov cha…

    Submitted 21 May, 2024; v1 submitted 15 October, 2023; originally announced October 2023.

  8. arXiv:2309.15408  [pdf, other]

    stat.ME cs.DS cs.IR math.ST

    A smoothed-Bayesian approach to frequency recovery from sketched data

    Authors: Mario Beraha, Stefano Favaro, Matteo Sesia

    Abstract: We provide a novel statistical perspective on a classical problem at the intersection of computer science and information theory: recovering the empirical frequency of a symbol in a large discrete dataset using only a compressed representation, or sketch, obtained via random hashing. Departing from traditional algorithmic approaches, recent works have proposed Bayesian nonparametric (BNP) methods…

    Submitted 10 April, 2025; v1 submitted 27 September, 2023; originally announced September 2023.

  9. arXiv:2304.02402  [pdf, other]

    stat.ME math.PR math.ST

    Wasserstein Principal Component Analysis for Circular Measures

    Authors: Mario Beraha, Matteo Pegoraro

    Abstract: We consider the 2-Wasserstein space of probability measures supported on the unit-circle, and propose a framework for Principal Component Analysis (PCA) for data living in such a space. We build on a detailed investigation of the optimal transportation problem for measures on the unit-circle which might be of independent interest. In particular, we derive an expression for optimal transport maps i…

    Submitted 5 April, 2023; originally announced April 2023.

  10. arXiv:2303.17844  [pdf, other]

    stat.ME

    Transform-scaled process priors for trait allocations in Bayesian nonparametrics

    Authors: Mario Beraha, Stefano Favaro

    Abstract: Completely random measures (CRMs) provide a broad class of priors, arguably, the most popular, for Bayesian nonparametric (BNP) analysis of trait allocations. As a peculiar property, CRM priors lead to predictive distributions that share the following common structure: for fixed prior's parameters, a new data point exhibits a Poisson (random) number of ``new'' traits, i.e., not appearing in the sa…

    Submitted 31 March, 2023; originally announced March 2023.

  11. arXiv:2303.15029  [pdf, other]

    math.ST stat.ME

    Random measure priors in Bayesian recovery from sketches

    Authors: Mario Beraha, Stefano Favaro, Matteo Sesia

    Abstract: This paper introduces a Bayesian nonparametric approach to frequency recovery from lossy-compressed discrete data, leveraging all information contained in a sketch obtained through random hashing. By modeling the data points as random samples from an unknown discrete distribution endowed with a Poisson-Kingman prior, we derive the posterior distribution of a symbol's empirical frequency given the…

    Submitted 4 June, 2024; v1 submitted 27 March, 2023; originally announced March 2023.

  12. arXiv:2303.02438  [pdf, other]

    stat.ME

    Bayesian clustering of high-dimensional data via latent repulsive mixtures

    Authors: Lorenzo Ghilotti, Mario Beraha, Alessandra Guglielmi

    Abstract: Model-based clustering of moderate or large dimensional data is notoriously difficult. We propose a model for simultaneous dimensionality reduction and clustering by assuming a mixture model for a set of latent scores, which are then linked to the observations via a Gaussian latent factor model. This approach was recently investigated by Chandra et al. (2023). The authors use a factor-analytic rep…

    Submitted 1 June, 2024; v1 submitted 4 March, 2023; originally announced March 2023.

  13. arXiv:2302.09034  [pdf, other]

    math.ST math.PR stat.ME

    Bayesian Mixtures Models with Repulsive and Attractive Atoms

    Authors: Mario Beraha, Raffaele Argiento, Federico Camerlenghi, Alessandra Guglielmi

    Abstract: The study of almost surely discrete random probability measures is an active line of research in Bayesian nonparametrics. The idea of assuming interaction across the atoms of the random probability measure has recently spurred significant interest in the context of Bayesian mixture models. This allows the definition of priors that encourage well-separated and interpretable clusters. In this work,…

    Submitted 24 April, 2025; v1 submitted 17 February, 2023; originally announced February 2023.

    MSC Class: 60G57; 62G05; 62F15; 62H30

  14. arXiv:2205.15654  [pdf, other]

    stat.ME

    Normalized Latent Measure Factor Models

    Authors: Mario Beraha, Jim E. Griffin

    Abstract: We propose a methodology for modeling and comparing probability distributions within a Bayesian nonparametric framework. Building on dependent normalized random measures, we consider a prior distribution for a collection of discrete random measures where each measure is a linear combination of a set of latent measures, interpretable as characteristic traits shared by different distributions, with…

    Submitted 31 May, 2022; originally announced May 2022.

  15. arXiv:2205.08144  [pdf, other]

    stat.CO stat.OT

    BayesMix: Bayesian Mixture Models in C++

    Authors: Mario Beraha, Bruno Guindani, Matteo Gianella, Alessandra Guglielmi

    Abstract: We describe BayesMix, a C++ library for MCMC posterior simulation for general Bayesian mixture models. The goal of BayesMix is to provide a self-contained ecosystem to perform inference for mixture models to computer scientists, statisticians and practitioners. The key idea of this library is extensibility, as we wish the users to easily adapt our software to their specific Bayesian mixture models…

    Submitted 17 May, 2022; originally announced May 2022.

  16. arXiv:2203.12280  [pdf, other]

    stat.AP

    Bayesian Nonparametric Vector Autoregressive Models via a Logit Stick-breaking Prior: an Application to Child Obesity

    Authors: Mario Beraha, Alessandra Guglielmi, Fernando A. Quintana, Maria de Iorio, Johan Gunnar Eriksson, Fabian Yap

    Abstract: Overweight and obesity in adults are known to be associated with risks of metabolic and cardiovascular diseases. Because obesity is an epidemic, increasingly affecting children, it is important to understand if this condition persists from early life to childhood and if different patterns of obesity growth can be detected. Our motivation starts from a study of obesity over time in children from So…

    Submitted 23 March, 2022; originally announced March 2022.

  17. arXiv:2112.10393  [pdf, other]

    stat.CO stat.ME

    Bayesian nonparametric model based clustering with intractable distributions: an ABC approach

    Authors: Mario Beraha, Riccardo Corradin

    Abstract: Bayesian nonparametric mixture models offer a rich framework for model based clustering. We consider the situation where the kernel of the mixture is available only up to an intractable normalizing constant. In this case, most of the commonly used Markov chain Monte Carlo (MCMC) methods are not suitable. We propose an approximate Bayesian computational (ABC) strategy, whereby we approximate the po…

    Submitted 20 December, 2021; originally announced December 2021.

    Comments: 20 pages, 4 figures

  18. arXiv:2107.09357  [pdf, other]

    stat.CO stat.AP

    JAGS, NIMBLE, Stan: a detailed comparison among Bayesian MCMC software

    Authors: Mario Beraha, Daniele Falco, Alessandra Guglielmi

    Abstract: The aim of this work is the comparison of the performance of the three popular software platforms JAGS, NIMBLE and Stan. These probabilistic programming languages are able to automatically generate samples from the posterior distribution of interest using MCMC algorithms, starting from the specification of a Bayesian model, i.e. the likelihood and the prior. The final goal is to present a detailed…

    Submitted 20 July, 2021; originally announced July 2021.

  19. arXiv:2101.09039  [pdf, other]

    stat.ME stat.ML

    Projected Statistical Methods for Distributional Data on the Real Line with the Wasserstein Metric

    Authors: Matteo Pegoraro, Mario Beraha

    Abstract: We present a novel class of projected methods, to perform statistical analysis on a data set of probability distributions on the real line, with the 2-Wasserstein metric. We focus in particular on Principal Component Analysis (PCA) and regression. To define these models, we exploit a representation of the Wasserstein space closely related to its weak Riemannian structure, by mapping the data to a…

    Submitted 29 November, 2021; v1 submitted 22 January, 2021; originally announced January 2021.

  20. arXiv:2011.06444  [pdf, other]

    stat.ME stat.CO

    MCMC computations for Bayesian mixture models using repulsive point processes

    Authors: Mario Beraha, Raffaele Argiento, Jesper Møller, Alessandra Guglielmi

    Abstract: Repulsive mixture models have recently gained popularity for Bayesian cluster detection. Compared to more traditional mixture models, repulsive mixture models produce a smaller number of well separated clusters. The most commonly used methods for posterior inference either require to fix a priori the number of components or are based on reversible jump MCMC computation. We present a general framew…

    Submitted 19 April, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

  21. arXiv:2007.14961  [pdf, other]

    stat.ME stat.AP

    Spatially dependent mixture models via the Logistic Multivariate CAR prior

    Authors: Mario Beraha, Matteo Pegoraro, Riccardo Peli, Alessandra Guglielmi

    Abstract: We consider the problem of spatially dependent areal data, where for each area independent observations are available, and propose to model the density of each area through a finite mixture of Gaussian distributions. The spatial dependence is introduced via a novel joint distribution for a collection of vectors in the simplex, that we term logisticMCAR. We show that salient features of the logisti…

    Submitted 8 June, 2021; v1 submitted 29 July, 2020; originally announced July 2020.

  22. arXiv:2005.10287  [pdf, other]

    stat.ME

    The semi-hierarchical Dirichlet Process and its application to clustering homogeneous distributions

    Authors: Mario Beraha, Alessandra Guglielmi, Fernando A. Quintana

    Abstract: Assessing homogeneity of distributions is an old problem that has received considerable attention, especially in the nonparametric Bayesian literature. To this effect, we propose the semi-hierarchical Dirichlet process, a novel hierarchical prior that extends the hierarchical Dirichlet process of Teh et al. (2006) and that avoids the degeneracy issues of nested processes recently described by Came…

    Submitted 16 June, 2021; v1 submitted 20 May, 2020; originally announced May 2020.

  23. arXiv:1907.07384  [pdf, other]

    cs.LG stat.ML

    Feature Selection via Mutual Information: New Theoretical Insights

    Authors: Mario Beraha, Alberto Maria Metelli, Matteo Papini, Andrea Tirinzoni, Marcello Restelli

    Abstract: Mutual information has been successfully adopted in filter feature-selection methods to assess both the relevancy of a subset of features in predicting the target variable and the redundancy with respect to other variables. However, existing algorithms are mostly heuristic and do not offer any guarantee on the proposed solution. In this paper, we provide novel theoretical results showing that cond…

    Submitted 17 July, 2019; originally announced July 2019.

    Comments: Accepted for presentation at the International Joint Conference on Neural Networks (IJCNN) 2019