-
Dependent Random Partitions by Shrinking Toward an Anchor
Authors:
David B. Dahl,
Richard L. Warr,
Thomas P. Jensen
Abstract:
Although exchangeable processes from Bayesian nonparametrics have been used as a generating mechanism for random partition models, we deviate from this paradigm to explicitly incorporate clustering information in the formulation of our random partition model. Our shrinkage partition distribution takes any partition distribution and shrinks its probability mass toward an anchor partition. We show how this provides a framework to model hierarchically-dependent and temporally-dependent random partitions. The shrinkage parameter controls the degree of dependence, accommodating at its extremes both independence and complete equality. Since a priori knowledge of items may vary, our formulation allows the degree of shrinkage toward the anchor to be item-specific. Our random partition model has a tractable normalizing constant which allows for standard Markov chain Monte Carlo algorithms for posterior sampling. We prove intuitive theoretical properties for our distribution and compare it to related partition distributions. We show that our model provides better out-of-sample fit in a real data application.
Submitted 24 October, 2024; v1 submitted 29 December, 2023;
originally announced December 2023.
-
Teaching Bayes' Rule using Mosaic Plots
Authors:
Edward D. White,
Richard L. Warr
Abstract:
Students taking statistical courses oriented toward business or economics often find the standard presentation of Bayes' Rule challenging. This key concept involves understanding multiple conditional probabilities and how they constitute an unconditional sample space. Many textbooks try to aid the comprehension of Bayes' Rule by illustrating these probabilities with tree diagrams. In our opinion, these diagrams fall short of fully helping students visualize Bayes' Rule. In this article, we demonstrate a graphical approach that we have successfully used in the classroom but that is neglected in introductory texts. This approach uses mosaic plots to show the weighting of the conditional probabilities and greatly aids the student in understanding the sample space and its associated probabilities.
Submitted 30 November, 2021;
originally announced December 2021.
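As a rough illustration of the mosaic-plot idea in the entry above: each tile of a mosaic plot has a width equal to a prior probability and a height equal to a conditional probability, so its area is a joint probability, and Bayes' Rule is a ratio of tile areas. The sketch below is not taken from the article; the diagnostic-test numbers and the names `mosaic_tiles` and `posterior` are hypothetical, chosen only to make the arithmetic concrete.

```python
from fractions import Fraction

# Hypothetical numbers: prior column widths and conditional tile heights.
prior = {"disease": Fraction(1, 100), "healthy": Fraction(99, 100)}
p_positive = {"disease": Fraction(9, 10), "healthy": Fraction(1, 20)}


def mosaic_tiles(prior, p_positive):
    """Return the area (joint probability) of every tile in the mosaic."""
    tiles = {}
    for group, width in prior.items():
        tiles[(group, "positive")] = width * p_positive[group]
        tiles[(group, "negative")] = width * (1 - p_positive[group])
    return tiles


def posterior(tiles, group, outcome):
    """Bayes' Rule as a ratio: one tile's area over all tiles with that outcome."""
    same_outcome = sum(area for (g, o), area in tiles.items() if o == outcome)
    return tiles[(group, outcome)] / same_outcome


tiles = mosaic_tiles(prior, p_positive)
print(posterior(tiles, "disease", "positive"))  # 2/13
```

Under these assumed numbers the "disease and positive" tile has area 9/1000 and the "healthy and positive" tile has area 99/2000, so the posterior is 2/13 even though the test is 90% sensitive.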
-
Bootstrapping Through Discrete Convolutional Methods
Authors:
Jared M. Clark,
Richard L. Warr
Abstract:
Bootstrapping was designed to randomly resample data from a fixed sample using Monte Carlo techniques. However, the original sample itself defines a discrete distribution. Convolutional methods are well suited for discrete distributions, and we show the advantages of utilizing these techniques for bootstrapping. The discrete convolutional approach can provide exact numerical solutions for bootstrap quantities, or at least mathematical error bounds. In contrast, Monte Carlo bootstrap methods can only provide confidence intervals which converge slowly. Additionally, for some problems the computation time of the convolutional approach can be dramatically less than that of Monte Carlo resampling. This article provides several examples of bootstrapping using the proposed convolutional technique and compares the results to those of the Monte Carlo bootstrap, and to those of the competing saddlepoint method.
Submitted 16 July, 2021;
originally announced July 2021.
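The convolutional idea in the entry above can be sketched for the simplest bootstrap quantity, the resampled sum. A single bootstrap draw has the empirical distribution of the sample, so the sum of n independent draws has the n-fold convolution of that distribution. The code below is a minimal sketch under that assumption, not the authors' implementation; the function names and the three-point sample are hypothetical, and exact rational arithmetic is used only to keep the example checkable.

```python
from collections import Counter
from fractions import Fraction


def single_draw_pmf(sample):
    """Empirical distribution of one bootstrap draw: mass 1/n per observation."""
    n = len(sample)
    return {value: Fraction(count, n) for value, count in Counter(sample).items()}


def convolve(p, q):
    """PMF of X + Y for independent X ~ p and Y ~ q (direct discrete convolution)."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0) + px * qy
    return out


def bootstrap_sum_pmf(sample):
    """Exact distribution of the bootstrap sum: n-fold convolution of the draw PMF."""
    base = single_draw_pmf(sample)
    total = {0: Fraction(1)}  # point mass at zero, the convolution identity
    for _ in range(len(sample)):
        total = convolve(total, base)
    return total


sample = [1, 2, 4]  # hypothetical data
pmf = bootstrap_sum_pmf(sample)
print(pmf[7])  # 2/9
```

For this sample, a bootstrap sum of 7 arises only from the six orderings of (1, 2, 4) out of 27 equally likely resamples, giving 2/9 with no Monte Carlo error. The direct double loop is quadratic in the support size; a fast Fourier transform would be the usual replacement for larger problems.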
-
The Attraction Indian Buffet Distribution
Authors:
Richard L. Warr,
David B. Dahl,
Jeremy M. Meyer,
Arthur Lui
Abstract:
We propose the attraction Indian buffet distribution (AIBD), a distribution for binary feature matrices influenced by pairwise similarity information. Binary feature matrices are used in Bayesian models to uncover latent variables (i.e., features) that explain observed data. The Indian buffet process (IBP) is a popular exchangeable prior distribution for latent feature matrices. In the presence of additional information, however, the exchangeability assumption is not reasonable or desirable. The AIBD can incorporate pairwise similarity information, yet it preserves many properties of the IBP, including the distribution of the total number of features. Thus, much of the interpretation and intuition that one has for the IBP directly carries over to the AIBD. A temperature parameter controls the degree to which the similarity information affects feature-sharing between observations. Unlike other nonexchangeable distributions for feature allocations, the probability mass function of the AIBD has a tractable normalizing constant, making posterior inference on hyperparameters straightforward using standard MCMC methods. A novel posterior sampling algorithm is proposed for the IBP and the AIBD. We demonstrate the feasibility of the AIBD as a prior distribution in feature allocation models and compare the performance of competing methods in simulations and an application.
Submitted 16 July, 2021; v1 submitted 9 June, 2021;
originally announced June 2021.
-
Exact Confidence Intervals for Linear Combinations of Multinomial Probabilities
Authors:
Katherine A. Batterton,
Christine M. Schubert,
Richard L. Warr
Abstract:
Linear combinations of multinomial probabilities, such as those resulting from contingency tables, are of use when evaluating classification system performance. While large sample inference methods for these combinations exist, small sample methods exist only for regions on the multinomial parameter space instead of the linear combinations. However, in medical classification problems it is common to have small samples necessitating a small sample confidence interval on linear combinations of multinomial probabilities. Therefore, in this paper we derive an exact confidence interval, through the use of fiducial inference, for linear combinations of multinomial probabilities. Simulation demonstrates the presented interval's adherence to exact coverage. Additionally, an adjustment to the exact interval is provided, giving shorter lengths while still achieving better coverage than large sample methods. Computational efficiencies in estimation of the exact interval are achieved through the application of a fast Fourier transform and combining a numerical solver and stochastic optimizer to find solutions. The exact confidence interval presented in this paper allows for comparisons between diagnostic methods previously unavailable, demonstrated through an example of diagnosing chronic allograft nephropathy in post-kidney-transplant patients.
Submitted 19 April, 2021;
originally announced April 2021.
-
A Note on Using Discretized Simulated Data to Estimate Implicit Likelihoods in Bayesian Analyses
Authors:
M. S. Hamada,
T. L. Graves,
N. W. Hengartner,
D. M. Higdon,
A. V. Huzurbazar,
E. C. Lawrence,
C. D. Linkletter,
C. S. Reese,
D. W. Scott,
R. R. Sitter,
R. L. Warr,
B. J. Williams
Abstract:
This article presents a Bayesian inferential method where the likelihood for a model is unknown but where data can easily be simulated from the model. We discretize simulated (continuous) data to estimate the implicit likelihood in a Bayesian analysis employing a Markov chain Monte Carlo algorithm. Three examples are presented as well as a small study on some of the method's properties.
Submitted 6 August, 2020;
originally announced August 2020.
-
A Bayesian Nonparametric System Reliability Model which Integrates Multiple Sources of Lifetime Information
Authors:
Richard L. Warr,
Jeremy M. Meyer,
Jackson T. Curtis
Abstract:
We present a Bayesian nonparametric system reliability model which scales well and provides a great deal of flexibility in modeling. The Bayesian approach naturally handles the disparate amounts of component and subsystem data that may exist. However, traditional Bayesian reliability models are quite computationally complex, relying on MCMC techniques. Our approach utilizes the conjugate properties of the beta-Stacy process, which is the fundamental building block of our model. These individual models are linked together using a method of moments estimation approach. This model is computationally fast, allows for right-censored data, and is used for estimating and predicting system reliability.
Submitted 21 March, 2022; v1 submitted 13 December, 2014;
originally announced December 2014.
-
Numerical Approximation of Probability Mass Functions Via the Inverse Discrete Fourier Transform
Authors:
Richard L. Warr
Abstract:
First passage distributions of semi-Markov processes are of interest in fields such as reliability, survival analysis, and many others. The problem of finding or computing first passage distributions is, in general, quite challenging. We take the approach of using characteristic functions (or Fourier transforms) and inverting them to numerically calculate the first passage distribution. Numerical inversion of characteristic functions can be numerically unstable for a general probability measure; however, we show that for lattice distributions they can be quickly calculated using the inverse discrete Fourier transform. Using the fast Fourier transform algorithm, these computations can be extremely fast. In addition to the speed of this approach, we are able to prove a few useful bounds for the numerical inversion error of the characteristic functions. These error bounds rely on the existence of a first or second moment of the distribution, or on an eventual monotonicity condition. We demonstrate these techniques in an example and include R code.
Submitted 28 December, 2012;
originally announced December 2012.
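The inversion step described in the entry above can be illustrated on a case where the answer is known. For a distribution supported on the integers 0, ..., N-1, sampling the characteristic function at the N frequencies 2πk/N and applying the inverse discrete Fourier transform returns the probability mass function. The sketch below assumes a Binomial(4, 0.3) distribution purely as a test case; it is written in Python rather than the paper's R, the function names are hypothetical, and a plain O(N²) inverse DFT stands in for the fast Fourier transform.

```python
import cmath
import math


def binomial_cf(t, n, p):
    """Characteristic function of a Binomial(n, p) random variable."""
    return (1 - p + p * cmath.exp(1j * t)) ** n


def pmf_from_cf(cf, size):
    """Recover P(X = j), j = 0..size-1, from a lattice characteristic function.

    Samples cf at 2*pi*k/size and applies a direct inverse DFT. The result is
    exact (up to rounding) only if the support lies inside 0..size-1;
    otherwise mass outside that range is aliased onto it.
    """
    values = [cf(2 * math.pi * k / size) for k in range(size)]
    pmf = []
    for j in range(size):
        total = sum(
            values[k] * cmath.exp(-2j * math.pi * j * k / size)
            for k in range(size)
        )
        pmf.append((total / size).real)
    return pmf


n, p = 4, 0.3  # assumed test case, not from the paper
recovered = pmf_from_cf(lambda t: binomial_cf(t, n, p), size=8)
exact = [math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)]
```

Here `recovered[0:5]` should agree with `exact` to floating-point precision and the remaining entries should be numerically zero. For a distribution with unbounded support, the truncation to `size` points introduces the kind of inversion error that the abstract's bounds address.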
-
A Comprehensive Method for Solving Finite-State Semi-Markov Processes
Authors:
Richard L. Warr,
David H. Collins
Abstract:
Semi-Markov processes (SMPs) provide a rich framework for many real-world problems. However, due to difficulty implementing practical solutions they are rarely used with their full capability. The theory of SMPs is quite mature but was mainly developed at a time when computational resources were not widely available. With the exception of some of the simplest cases, solutions to SMPs are inherently numerical, and SMPs have been underutilized by practitioners because of difficulty implementing the theory in applications. This paper demonstrates the theory and computational methods needed to implement SMP models in practical settings. Methods are illustrated with an application modeling the movement of coronary patients in a hospital. Our aim is to allow practitioners to use richer SMP models without being burdened with the rigorous mathematical theory.
Submitted 17 May, 2021; v1 submitted 6 December, 2012;
originally announced December 2012.