-
Maximum Agreement Linear Prediction via the Concordance Correlation Coefficient
Authors:
Taeho Kim,
George Luta,
Matteo Bottai,
Pierre Chausse,
Gheorghe Doros,
Edsel A. Pena
Abstract:
This paper examines distributional properties and predictive performance of the estimated maximum agreement linear predictor (MALP) introduced by Bottai, Kim, Lieberman, Luta, and Pena (2022) in The American Statistician, which is the linear predictor maximizing Lin's concordance correlation coefficient (CCC) between the predictor and the predictand. It is compared and contrasted, theoretically and through computer experiments, with the estimated least-squares linear predictor (LSLP). Finite-sample and asymptotic properties are obtained, and confidence intervals are also presented. The predictors are illustrated using two real data sets: an eye data set and a bodyfat data set. The results indicate that the estimated MALP is a viable alternative to the estimated LSLP if one desires a predictor whose predicted values possess higher agreement with the predictand values, as measured by the CCC.
Submitted 10 February, 2024; v1 submitted 9 April, 2023;
originally announced April 2023.
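A minimal sketch of the comparison described in the abstract above, for a single predictor. Lin's CCC is computed from its standard moment formula. The "variance-matched" line (slope sign(r)·s_y/s_x, with means matched) is an assumption of this sketch about what a CCC-maximizing linear predictor looks like in the one-predictor case; it is not taken from the abstract. The data are synthetic, and all function names are hypothetical.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def moments(x, y):
    """Return means, variances and covariance using 1/n (population) moments."""
    mx, my = mean(x), mean(y)
    sxx = mean([(a - mx) ** 2 for a in x])
    syy = mean([(b - my) ** 2 for b in y])
    sxy = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return mx, my, sxx, syy, sxy


def ccc(x, y):
    """Lin's concordance correlation coefficient between two samples."""
    mx, my, sxx, syy, sxy = moments(x, y)
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def least_squares_line(x, y):
    """Intercept and slope of the ordinary least-squares line of y on x."""
    mx, my, sxx, _, sxy = moments(x, y)
    slope = sxy / sxx
    return my - slope * mx, slope


def variance_matched_line(x, y):
    """Line whose fitted values share the mean and variance of y.

    Assumed single-predictor form of a maximum-agreement line (illustrative).
    """
    mx, my, sxx, syy, sxy = moments(x, y)
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return my - slope * mx, slope


# Synthetic data, purely illustrative.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [1.0 + 0.6 * a + rng.gauss(0, 1) for a in x]

a_ls, b_ls = least_squares_line(x, y)
a_vm, b_vm = variance_matched_line(x, y)
pred_ls = [a_ls + b_ls * a for a in x]
pred_vm = [a_vm + b_vm * a for a in x]

ccc_ls = ccc(y, pred_ls)
ccc_vm = ccc(y, pred_vm)
print(f"CCC of least-squares fit:    {ccc_ls:.3f}")
print(f"CCC of variance-matched fit: {ccc_vm:.3f}")
```

In-sample, the variance-matched fit attains a CCC equal to |r|, while the least-squares fit attains 2r²/(1 + r²), which is never larger. The trade-off is that the least-squares line has the smaller mean squared error.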
-
Improved Multiple Confidence Intervals via Thresholding Informed by Prior Information
Authors:
Taeho Kim,
Edsel A. Pena
Abstract:
Consider a statistical problem where a set of parameters is of interest to a researcher. Then multiple confidence intervals can be constructed to infer the set of parameters simultaneously. The constructed multiple confidence intervals are the realization of a multiple interval estimator (MIE), the main focus of this study. In particular, a thresholding approach is introduced to improve the performance of the MIE. The developed thresholds require additional information, so a prior distribution is assumed for this purpose. The MIE procedure is then evaluated by two performance measures: a global coverage probability and a global expected content, which are averages with respect to the prior distribution. The procedure defined by the performance measures will be called a Bayes MIE with thresholding (BMIE Thres). In this study, a normal-normal model is utilized to build up the BMIE Thres for a set of location parameters. Then, the behaviors of BMIE Thres are investigated in terms of the performance measures, which approach those of the corresponding z-based MIE as the thresholding parameter, C, goes to infinity. In addition, an optimization procedure is introduced to obtain the best thresholding parameter C. For illustration, in-season baseball batting average data and leukemia gene expression data are used to demonstrate the procedure for the known and unknown standard deviation situations, respectively. In the ensuing simulations, the target parameters are generated from different true generating distributions to consider the misspecified prior situation. The simulation also involves Bayes credible MIEs, and the effectiveness of the different MIEs is compared with respect to the performance measures. In general, the thresholding procedure helps to achieve a meaningful reduction in the global expected content while maintaining a nominal level of the global coverage probability.
Submitted 8 December, 2019;
originally announced December 2019.
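The two performance measures named in the abstract above can be illustrated by simulation for the baseline z-based MIE only. This sketch does not implement the thresholding procedure, whose details are not given in the abstract. It assumes a normal-normal model with known standard deviation, and it takes "content" to mean interval length; the parameter names `tau` and `sigma` and both function names are hypothetical.

```python
import random
from statistics import NormalDist


def z_intervals(observations, sigma, alpha=0.05):
    """Two-sided z-based intervals, one per observation, with known sigma."""
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma
    return [(obs - half_width, obs + half_width) for obs in observations]


def simulate(n_params=20000, tau=1.0, sigma=1.0, alpha=0.05, seed=1):
    """Estimate prior-averaged coverage and average interval length.

    Assumed model: theta_i ~ N(0, tau^2), X_i | theta_i ~ N(theta_i, sigma^2).
    """
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, tau) for _ in range(n_params)]
    observations = [rng.gauss(theta, sigma) for theta in thetas]
    intervals = z_intervals(observations, sigma, alpha)

    covered = sum(lo <= theta <= hi for theta, (lo, hi) in zip(thetas, intervals))
    coverage = covered / n_params
    avg_length = sum(hi - lo for lo, hi in intervals) / n_params
    return coverage, avg_length


coverage, avg_length = simulate()
print(f"estimated global coverage: {coverage:.3f}")
print(f"average interval length:   {avg_length:.3f}")
```

Under these assumptions the z-based intervals have a fixed length and coverage close to the nominal level, which gives a reference point against which a procedure that shortens intervals while preserving coverage could be judged.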
-
Bayesian quantile additive regression trees
Authors:
Bereket P. Kindo,
Hao Wang,
Timothy Hanson,
Edsel A. Peña
Abstract:
Ensembles of regression trees have become popular statistical tools for estimating the conditional mean given a set of predictors. However, quantile regression trees and their ensembles have not yet garnered much attention despite the increasing popularity of the linear quantile regression model. This work proposes a Bayesian quantile additive regression trees model that shows very good predictive performance, as illustrated by simulation studies and real data applications. A further extension to tackle binary classification problems is also considered.
Submitted 9 July, 2016;
originally announced July 2016.
-
MPBART - Multinomial Probit Bayesian Additive Regression Trees
Authors:
Bereket P. Kindo,
Hao Wang,
Edsel A. Peña
Abstract:
This article proposes Multinomial Probit Bayesian Additive Regression Trees (MPBART) as a multinomial probit extension of BART - Bayesian Additive Regression Trees (Chipman et al. (2010)). MPBART is flexible enough to allow the inclusion of predictors that describe the observed units as well as the available choice alternatives. Through two simulation studies and four real data examples, we show that MPBART exhibits very good predictive performance in comparison to other discrete choice and multiclass classification methods. To implement MPBART, we have developed an R package, mpbart, freely available from CRAN repositories.
Submitted 6 February, 2016; v1 submitted 30 September, 2013;
originally announced September 2013.
-
Compound p-Value Statistics for Multiple Testing Procedures
Authors:
Joshua D. Habiger,
Edsel A. Pena
Abstract:
Many multiple testing procedures make use of the p-values from the individual pairs of hypothesis tests, and are valid if the p-value statistics are independent and uniformly distributed under the null hypotheses. However, it has recently been shown that these types of multiple testing procedures are inefficient since such p-values do not depend upon all of the available data. This paper provides tools for constructing compound p-value statistics, which are those that depend upon all of the available data, but still satisfy the conditions of independence and uniformity under the null hypotheses. As an example, a class of compound p-value statistics for testing for location shifts is developed. It is demonstrated, both analytically and through simulations, that multiple testing procedures tend to reject more false null hypotheses when applied to these compound p-values rather than the usual p-values, and at the same time still guarantee the desired type I error rate control. The compound p-values, in conjunction with two different multiple testing methods, are used to analyze a real microarray data set. Applying either multiple testing method to the compound p-values, instead of the usual p-values, enhances its power.
Submitted 24 August, 2011;
originally announced August 2011.
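As an illustration of the kind of p-value-based multiple testing procedure the abstract above refers to, here is a sketch of the Benjamini-Hochberg step-up procedure. The abstract does not name the two methods used in the paper, so choosing Benjamini-Hochberg is an assumption of this sketch. The p-values are made up for illustration, and the construction of compound p-values is not shown.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Step-up rule: find the largest rank k with p_(k) <= k * q / m and
    reject the hypotheses with the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * q / m:
            largest_rank = rank

    rejected = [False] * m
    for index in order[:largest_rank]:
        rejected[index] = True
    return rejected


# Made-up p-values, purely illustrative.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(p_values, q=0.05)
print(sum(decisions), "hypotheses rejected")
```

Because such a procedure takes only a list of p-values as input, the same function could be applied to either the usual p-values or compound p-values, provided they satisfy the conditions the procedure requires.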
-
Dynamic Modeling and Statistical Analysis of Event Times
Authors:
Edsel A. Peña
Abstract:
This review article provides an overview of recent work in the modeling and analysis of recurrent events arising in engineering, reliability, public health, biomedicine and other areas. Recurrent event modeling possesses unique facets making it different and more difficult to handle than single event settings. For instance, the impact of an increasing number of event occurrences needs to be taken into account, the effects of covariates should be considered, potential association among the interevent times within a unit cannot be ignored, and the effects of performed interventions after each event occurrence need to be factored in. A recent general class of models for recurrent events which simultaneously accommodates these aspects is described. Statistical inference methods for this class of models are presented and illustrated through applications to real data sets. Some existing open research problems are described.
Submitted 2 August, 2007;
originally announced August 2007.