-
A comparison of the discrimination performance of lasso and maximum likelihood estimation in logistic regression model
Authors:
Gilberto P. Alcântara Junior,
Gustavo H. A. Pereira
Abstract:
Logistic regression is widely used in many areas of knowledge. Several works compare the performance of lasso and maximum likelihood estimation in logistic regression. However, some of these works do not perform simulation studies, and the remaining ones do not consider scenarios in which the ratio of the number of covariates to the sample size is high. In this work, we compare the discrimination performance of lasso and maximum likelihood estimation in logistic regression using simulation studies and applications. Variable selection is performed by the lasso itself and, when maximum likelihood estimation is used, by stepwise selection. We consider a wide range of values for the ratio of the number of covariates to the sample size. The main conclusion of the work is that lasso has better discrimination performance than maximum likelihood estimation when the ratio of the number of covariates to the sample size is high.
Submitted 26 April, 2024;
originally announced April 2024.
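As a rough illustration of the kind of comparison described above, the following pure-Python sketch fits a logistic regression either without a penalty (targeting the maximum likelihood estimate) or with an L1 (lasso) penalty, using proximal gradient descent, and measures discrimination with the area under the ROC curve (AUC). The simulation design in `simulate` (two active covariates with coefficients 1.5 and -1.5), the step size, the iteration count and the penalty value are hypothetical choices made for this example only. Stepwise selection is not implemented, and nothing here reproduces the paper's simulation settings or results.

```python
import math
import random

def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrink z towards zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_logistic(X, y, lam=0.0, step=0.5, iters=500):
    """Logistic regression fitted by proximal gradient descent (ISTA).

    Minimizes the mean negative log-likelihood plus lam * sum(|beta_j|);
    the intercept is not penalized. lam=0 targets the unpenalized maximum
    likelihood estimate, which may diverge when the classes are separable.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # r_i = fitted probability minus observed label
        r = [sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
             for xi, yi in zip(X, y)]
        grad = [sum(ri * xi[j] for ri, xi in zip(r, X)) / n for j in range(p)]
        b0 -= step * sum(r) / n
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return b0, beta

def predict(b0, beta, X):
    """Fitted probabilities for the rows of X."""
    return [sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) for xi in X]

def auc(scores, labels):
    """Area under the ROC curve: P(score of a 1 > score of a 0), ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def simulate(n, p, seed):
    """Hypothetical design: p standard normal covariates, only two of them active."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [int(rng.random() < sigmoid(1.5 * xi[0] - 1.5 * xi[1])) for xi in X]
    return X, y

if __name__ == "__main__":
    X_train, y_train = simulate(n=50, p=25, seed=1)   # covariates/sample size = 0.5
    X_test, y_test = simulate(n=500, p=25, seed=2)
    for lam in (0.0, 0.05):
        b0, beta = fit_logistic(X_train, y_train, lam=lam)
        print(lam, round(auc(predict(b0, beta, X_test), y_test), 3))
```

The script prints a test-set AUC for each penalty value. A single toy run like this says nothing general about which estimator is better; it only shows where the ratio of covariates to sample size enters the setup.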
-
A class of bootstrap based residuals for compositional data
Authors:
Gustavo H. A. Pereira,
Jianwen Cai
Abstract:
Regression models for compositional data are common in several areas of knowledge. As in other classes of regression models, it is desirable to perform diagnostic analysis in these models using residuals that are approximately standard normally distributed. However, no multivariate residual that meets this requirement has been available for regression models for compositional data. In this work, we introduce a class of bootstrap-based residuals for compositional data that are asymptotically standard normally distributed. Monte Carlo simulation studies indicate that the distributions of the residuals in this class are well approximated by the standard normal distribution in small samples. An application to simulated data also suggests that one of the residuals in the proposed class is better at identifying model misspecification than its competitors. Finally, the usefulness of the best residual in the proposed class is illustrated through an application to sleep stage data. The class of residuals proposed here can also be used in other classes of multivariate regression models.
Submitted 20 March, 2024;
originally announced March 2024.
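The abstract does not spell out how the bootstrap residuals are constructed. Purely to illustrate the general idea of turning simulation from a fitted model into an approximately standard normal residual, here is a hypothetical sketch that is not the authors' construction: it assumes a Dirichlet model for the composition, uses the Aitchison distance between the observed and fitted compositions as the discrepancy, ranks the observed discrepancy among discrepancies of responses simulated from the fitted model, and maps that rank to the normal scale. It ignores covariates and the uncertainty from estimating `alpha_hat`; all function names are illustrative.

```python
import math
import random
from statistics import NormalDist

def dirichlet_draw(alpha, rng):
    """One Dirichlet(alpha) composition, obtained by normalizing gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [gi / total for gi in g]

def clr(x):
    """Centered log-ratio transform; every part of x must be strictly positive."""
    logs = [math.log(v) for v in x]
    centre = sum(logs) / len(logs)
    return [v - centre for v in logs]

def aitchison_distance(x, y):
    """Aitchison distance: Euclidean distance between clr coordinates."""
    return math.dist(clr(x), clr(y))

def bootstrap_residual(y_obs, alpha_hat, n_boot=199, rng=None):
    """Hypothetical parametric-bootstrap residual, NOT the authors' construction.

    The observed discrepancy is ranked among n_boot discrepancies simulated from
    the fitted Dirichlet(alpha_hat) model, and the rank is mapped to the
    standard normal scale. Under a correct model the result is approximately N(0, 1).
    """
    rng = rng or random.Random()
    total = sum(alpha_hat)
    fitted = [a / total for a in alpha_hat]  # Dirichlet mean composition
    d_obs = aitchison_distance(y_obs, fitted)
    below = sum(
        aitchison_distance(dirichlet_draw(alpha_hat, rng), fitted) < d_obs
        for _ in range(n_boot)
    )
    u = (below + 0.5) / (n_boot + 1)  # always strictly inside (0, 1)
    return NormalDist().inv_cdf(u)
```

Because the discrepancy is a distance, large positive residuals flag compositions far from the fitted mean; the sketch assumes the Dirichlet parameters are not so small that simulated parts underflow to zero.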
-
A residual for outlier identification in zero adjusted regression models
Authors:
Gustavo H. A. Pereira,
Juliana S. Rodrigues,
Manoel Santos Neto,
Denise A. Botter,
Mônica C. Sandoval
Abstract:
Zero-adjusted regression models are used to fit variables that are discrete at zero and continuous on some interval of the positive real numbers. Diagnostic analysis in these models is usually performed using the randomized quantile residual, which is useful for checking the overall adequacy of a zero-adjusted regression model. However, it may fail to identify some outliers. In this work, we introduce a residual for outlier identification in zero-adjusted regression models. Monte Carlo simulation studies and an application suggest that the residual introduced here has good properties and detects outliers that the randomized quantile residual does not identify.
Submitted 18 December, 2018;
originally announced December 2018.
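For reference, the randomized quantile residual of Dunn and Smyth, which the abstract takes as the usual diagnostic tool, handles the point mass at zero by drawing a uniform value over the jump of the CDF and otherwise applies the inverse standard normal CDF to F(y). The sketch below assumes, for illustration only, an exponential distribution for the positive part (the continuous component used in the paper may differ). It computes the randomized quantile residual, not the new residual proposed in this work; the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def za_exp_residual(y, p0, mean, rng):
    """Randomized quantile residual for a hypothetical zero-adjusted exponential model.

    Y = 0 with probability p0 (0 < p0 < 1); otherwise Y ~ Exponential(mean).
    F(y) = p0 + (1 - p0) * (1 - exp(-y / mean)) for y > 0, with a jump of size p0 at 0.
    """
    if y == 0:
        # zero sits on the jump of the CDF: draw u uniformly on (0, p0]
        u = p0 * (1.0 - rng.random())
        return STD_NORMAL.inv_cdf(u)
    # positive y: Phi^{-1}(F(y)) = -Phi^{-1}(1 - F(y)) avoids rounding F(y) to 1
    survival = (1.0 - p0) * math.exp(-y / mean)
    return -STD_NORMAL.inv_cdf(survival)

def simulate_za_exp(n, p0, mean, rng):
    """Draw n responses from the zero-adjusted exponential model."""
    return [0.0 if rng.random() < p0 else rng.expovariate(1.0 / mean) for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(11)
    ys = simulate_za_exp(10, p0=0.3, mean=2.0, rng=rng)
    print([round(za_exp_residual(y, 0.3, 2.0, rng), 2) for y in ys])
```

When the model is correctly specified these residuals are exactly standard normal, which is why they are good for checking overall fit; the abstract's point is that this does not guarantee that every outlier stands out.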
-
Zero-adjusted Birnbaum-Saunders regression model
Authors:
Vera Tomazella,
Juvêncio S. Nobre,
Gustavo H. A. Pereira,
Manoel Santos-Neto
Abstract:
In this paper, we introduce the zero-adjusted Birnbaum-Saunders regression model. This new model generalizes at least seven Birnbaum-Saunders regression models. The idea of this model is to mix a degenerate distribution at zero with a Birnbaum-Saunders distribution. Besides accounting for excess zeros, the zero-adjusted Birnbaum-Saunders distribution provides an attractive modeling structure for right-skewed data. In this model, the mean and precision parameters of the Birnbaum-Saunders distribution and the probability of zeros can be related to linear and/or nonlinear predictors through link functions. We derive a type of residual to perform diagnostic analysis and a perturbation scheme for identifying observations that exert unusual influence on the estimation process. Finally, two applications to real data show the potential of the model.
Submitted 1 February, 2018;
originally announced February 2018.
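A minimal sketch of the mixture described above, without covariates or link functions: a point mass p0 at zero mixed with a Birnbaum-Saunders variate. It uses the classic shape/scale parameterization BS(alpha, beta), with CDF Phi((sqrt(t/beta) - sqrt(beta/t)) / alpha) and mean beta * (1 + alpha^2 / 2), rather than the mean/precision parameterization mentioned in the abstract; that choice and all function names are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def bs_cdf(t, alpha, beta):
    """Birnbaum-Saunders CDF for t > 0 (classic shape alpha, scale beta)."""
    xi = math.sqrt(t / beta) - math.sqrt(beta / t)
    return STD_NORMAL.cdf(xi / alpha)

def bs_logpdf(t, alpha, beta):
    """Log-density: log phi(xi / alpha) plus the log-Jacobian of t -> xi / alpha."""
    z = (math.sqrt(t / beta) - math.sqrt(beta / t)) / alpha
    jac = (math.sqrt(beta / t) + (beta / t) ** 1.5) / (2.0 * alpha * beta)
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) + math.log(jac)

def zabs_cdf(y, p0, alpha, beta):
    """P(Y <= y): point mass p0 at zero plus (1 - p0) times the BS CDF."""
    if y < 0:
        return 0.0
    if y == 0:
        return p0
    return p0 + (1.0 - p0) * bs_cdf(y, alpha, beta)

def zabs_loglik(ys, p0, alpha, beta):
    """Log-likelihood of an i.i.d. zero-adjusted BS sample (no covariates)."""
    ll = 0.0
    for y in ys:
        ll += math.log(p0) if y == 0 else math.log(1.0 - p0) + bs_logpdf(y, alpha, beta)
    return ll

def zabs_draw(p0, alpha, beta, rng):
    """Zero with probability p0, otherwise a BS variate built from a standard normal Z."""
    if rng.random() < p0:
        return 0.0
    w = alpha * rng.gauss(0.0, 1.0) / 2.0
    return beta * (w + math.sqrt(w * w + 1.0)) ** 2

def zabs_mean(p0, alpha, beta):
    # E[Y] = (1 - p0) * E[T], with E[T] = beta * (1 + alpha^2 / 2) for T ~ BS(alpha, beta)
    return (1.0 - p0) * beta * (1.0 + alpha ** 2 / 2.0)
```

In a regression version, p0 and the two BS parameters would each be linked to predictors and the log-likelihood above would be maximized over the regression coefficients.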
-
Adjusted quantile residual for generalized linear models
Authors:
Juliana Scudilio,
Gustavo H. A. Pereira
Abstract:
Generalized linear models are widely used in many areas of knowledge. As in other classes of regression models, it is desirable to perform diagnostic analysis in generalized linear models using residuals that are approximately standard normally distributed. Diagnostic analysis in this class of models is usually performed using the standardized Pearson residual or the standardized deviance residual. The former has a skewed distribution and the latter has a negative mean, especially when the variance of the response variable is high. In this work, we introduce the adjusted quantile residual for generalized linear models. Using Monte Carlo simulation techniques and two applications, we compare this residual with the standardized Pearson residual, the standardized deviance residual and two other residuals. Overall, the results suggest that the adjusted quantile residual is a better tool for diagnostic analysis in generalized linear models.
Submitted 30 October, 2017;
originally announced October 2017.
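The abstract's remarks about the Pearson and deviance residuals can be checked numerically in a simple case. The sketch below assumes a Poisson response with known mean mu and computes the exact mean, variance and skewness of each residual by summing over the Poisson probability mass function. It omits the leverage adjustment (the standardized versions further divide by sqrt(phi * (1 - h_ii)), where h_ii is the leverage and phi = 1 for the Poisson), and it does not implement the adjusted quantile residual, whose definition is not given in the abstract.

```python
import math

def poisson_pmf_table(mu, ymax):
    """P(Y = y) for y = 0..ymax, built recursively to avoid large factorials."""
    probs = [math.exp(-mu)]
    for y in range(ymax):
        probs.append(probs[-1] * mu / (y + 1))
    return probs

def pearson_residual(y, mu):
    # (y - mu) / sqrt(V(mu)), with Poisson variance function V(mu) = mu
    return (y - mu) / math.sqrt(mu)

def deviance_residual(y, mu):
    # sign(y - mu) * sqrt(unit deviance); y * log(y / mu) is taken as 0 when y = 0
    term = y * math.log(y / mu) if y > 0 else 0.0
    dev = 2.0 * (term - (y - mu))
    return math.copysign(math.sqrt(max(dev, 0.0)), y - mu)

def exact_moments(residual, mu, ymax=200):
    """Mean, variance and skewness of residual(Y, mu) when Y ~ Poisson(mu).

    The support is truncated at ymax, which is harmless for small mu.
    """
    probs = poisson_pmf_table(mu, ymax)
    vals = [residual(y, mu) for y in range(ymax + 1)]
    mean = sum(p * v for p, v in zip(probs, vals))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, vals))
    third = sum(p * (v - mean) ** 3 for p, v in zip(probs, vals))
    return mean, var, third / var ** 1.5

if __name__ == "__main__":
    for name, res in (("Pearson", pearson_residual), ("deviance", deviance_residual)):
        print(name, [round(m, 3) for m in exact_moments(res, mu=1.0)])
```

For mu = 1 the Pearson residual has mean 0, variance 1 and skewness 1 (in general 1/sqrt(mu)), while the deviance residual has a mean of about -0.21, which matches the qualitative behaviour the abstract attributes to each residual.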
-
On quantile residuals in beta regression
Authors:
Gustavo H. A. Pereira
Abstract:
Beta regression is often used to model the relationship between a dependent variable that assumes values on the open interval (0,1) and a set of predictor variables. An important challenge in beta regression is to find residuals whose distribution is well approximated by the standard normal distribution. Two previous works compared residuals in beta regression, but the authors did not include the quantile residual. Using Monte Carlo simulation techniques, this paper studies the behavior of certain residuals in beta regression in several scenarios. Overall, the results suggest that the distribution of the quantile residual is better approximated by the standard normal distribution than that of the other residuals in most scenarios. Three applications illustrate the effectiveness of the quantile residual.
Submitted 10 April, 2017;
originally announced April 2017.
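In beta regression, the quantile residual is the inverse standard normal CDF applied to F(y; mu, phi), where F is the fitted beta CDF. The sketch assumes the usual mean/precision parameterization y ~ Beta(mu * phi, (1 - mu) * phi) and evaluates the regularized incomplete beta function with a continued-fraction expansion (modified Lentz method) so that it needs only the standard library; in practice a library routine such as `scipy.stats.beta.cdf` would normally be used instead. The function names are illustrative.

```python
import math
from statistics import NormalDist

def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def beta_cdf(y, a, b):
    """Regularized incomplete beta I_y(a, b), i.e. the Beta(a, b) CDF at y in (0, 1)."""
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(y) + b * math.log1p(-y))
    front = math.exp(log_front)
    if y < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, y) / a
    # use the symmetry I_y(a, b) = 1 - I_{1-y}(b, a) where the fraction converges faster
    return 1.0 - front * _betacf(b, a, 1.0 - y) / b

def beta_quantile_residual(y, mu, phi):
    """Quantile residual Phi^{-1}(F(y; mu, phi)) for y ~ Beta(mu*phi, (1-mu)*phi)."""
    return NormalDist().inv_cdf(beta_cdf(y, mu * phi, (1.0 - mu) * phi))
```

Because the beta distribution is continuous, no randomization is needed: if the model is correct, these residuals are exactly standard normal at the true parameters, and the simulations in the abstract concern how well that holds with estimated parameters.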