-
A goodness-of-fit diagnostic for count data derived from half-normal plots with a simulated envelope
Authors:
Darshana Jayakumari,
Jochen Einbeck,
John Hinde,
Julien Mainguy,
Rafael de Andrade Moral
Abstract:
Traditional methods of model diagnostics may include a plethora of graphical techniques based on residual analysis, as well as formal tests (e.g. Shapiro-Wilk test for normality and Bartlett test for homogeneity of variance). In this paper we derive a new distance metric based on the half-normal plot with a simulation envelope, a graphical model evaluation method, and investigate its properties through simulation studies. The proposed metric can help to assess the fit of a given model, and also act as a model selection criterion by being comparable across models, whether based or not on a true likelihood. More specifically, it quantitatively encompasses the model evaluation principles and removes the subjective bias when closely related models are involved. We validate the technique by means of an extensive simulation study carried out using count data, and illustrate with two case studies in ecology and fisheries research.
Submitted 8 May, 2024;
originally announced May 2024.
-
Modelling excess zeros in count data: A new perspective on modelling approaches
Authors:
John Haslett,
Andrew C. Parnell,
John Hinde,
Rafael A. Moral
Abstract:
We consider the analysis of count data in which the observed frequency of zero counts is unusually large, typically with respect to the Poisson distribution. We focus on two alternative modelling approaches: Over-Dispersion (OD) models, and Zero-Inflation (ZI) models, both of which can be seen as generalisations of the Poisson distribution; we refer to these as Implicit and Explicit ZI models, respectively. Although sometimes seen as competing approaches, they can be complementary; OD is a consequence of ZI modelling, and ZI is a by-product of OD modelling. The central objective in such analyses is often concerned with inference on the effect of covariates on the mean, in light of the apparent excess of zeros in the counts. Typically the modelling of the excess zeros per se is a secondary objective and there are choices to be made between, and within, the OD and ZI approaches. The contribution of this paper is primarily conceptual. We contrast, descriptively, the impact on zeros of the two approaches. We further offer a novel descriptive characterisation of alternative ZI models, including the classic hurdle and mixture models, by providing a unifying theoretical framework for their comparison. This in turn leads to a novel and technically simpler ZI model. We develop the underlying theory for univariate counts and touch on its implication for multivariate count data.
Submitted 29 July, 2021; v1 submitted 8 July, 2020;
originally announced July 2020.
-
Reparametrization of COM-Poisson Regression Models with Applications in the Analysis of Experimental Data
Authors:
Eduardo E. Ribeiro Jr,
Walmes M. Zeviani,
Wagner H. Bonat,
Clarice G. B. Demétrio,
John Hinde
Abstract:
In the analysis of count data often the equidispersion assumption is not suitable, hence the Poisson regression model is inappropriate. As a generalization of the Poisson distribution, the COM-Poisson distribution can deal with under-, equi- and overdispersed count data. It is a member of the exponential family of distributions and has well known special cases. In spite of the nice properties of the COM-Poisson distribution, its location parameter does not correspond to the expectation, which complicates the interpretation of regression models. In this paper, we propose a straightforward reparametrization of the COM-Poisson distribution based on an approximation to the expectation of this distribution. The main advantage of our new parametrization is the straightforward interpretation of the regression coefficients in terms of the expectation, as usual in the context of generalized linear models. Furthermore, the estimation and inference for the new COM-Poisson regression model can be done based on the likelihood paradigm. We carried out simulation studies to verify the finite sample properties of the maximum likelihood estimators. The results from our simulation study show that the maximum likelihood estimators are unbiased and consistent for both regression and dispersion parameters. We observed that the empirical correlation between the regression and dispersion parameter estimators is close to zero, which suggests that these parameters are orthogonal. We illustrate the application of the proposed model through the analysis of three data sets with over-, under- and equidispersed count data. The study of distribution properties through a consideration of dispersion, zero-inflated and heavy tail indexes, together with the results of data analysis show the flexibility over standard approaches.
Submitted 29 January, 2018;
originally announced January 2018.
-
Extended Poisson-Tweedie: properties and regression models for count data
Authors:
Wagner H. Bonat,
Bent Jørgensen,
Célestin C. Kokonendji,
John Hinde,
Clarice G. B. Demétrio
Abstract:
We propose a new class of discrete generalized linear models based on the class of Poisson-Tweedie factorial dispersion models with variance of the form $μ+ φμ^p$, where $μ$ is the mean, $φ$ and $p$ are the dispersion and Tweedie power parameters, respectively. The models are fitted by using an estimating function approach obtained by combining the quasi-score and Pearson estimating functions for estimation of the regression and dispersion parameters, respectively. This provides a flexible and efficient regression methodology for a comprehensive family of count models including Hermite, Neyman Type A, Pólya-Aeppli, negative binomial and Poisson-inverse Gaussian. The estimating function approach allows us to extend the Poisson-Tweedie distributions to deal with underdispersed count data by allowing negative values for the dispersion parameter $φ$. Furthermore, the Poisson-Tweedie family can automatically adapt to highly skewed count data with excessive zeros, without the need to introduce zero-inflated or hurdle components, by the simple estimation of the power parameter. Thus, the proposed models offer a unified framework to deal with under, equi, overdispersed, zero-inflated and heavy-tailed count data. The computational implementation of the proposed models is fast, relying only on a simple Newton scoring algorithm. Simulation studies showed that the estimating function approach provides unbiased and consistent estimators for both regression and dispersion parameters. We highlight the ability of the Poisson-Tweedie distributions to deal with count data through a consideration of dispersion, zero-inflated and heavy tail indices, and illustrate its application with four data analyses. We provide an \texttt{R} implementation and the data sets as supplementary materials.
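As a rough illustration of the variance form $μ+ φμ^p$ quoted in this abstract, the sketch below evaluates the variance and the resulting dispersion index (variance divided by mean). The function names are hypothetical and are not taken from the paper or its supplementary \texttt{R} code; the sketch only does the arithmetic of the stated formula and does not fit any model.

```python
def poisson_tweedie_variance(mu, phi, p):
    """Variance mu + phi * mu**p, as stated in the abstract.

    mu  : mean of the count response (must be positive)
    phi : dispersion parameter (negative values give underdispersion)
    p   : Tweedie power parameter
    Hypothetical helper for illustration only.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    return mu + phi * mu ** p


def dispersion_index(mu, phi, p):
    """Variance-to-mean ratio: 1 is equidispersed, >1 over-, <1 underdispersed."""
    return poisson_tweedie_variance(mu, phi, p) / mu


if __name__ == "__main__":
    # phi = 0 recovers the Poisson variance (equidispersion).
    print(dispersion_index(2.0, 0.0, 2))
    # Positive phi inflates the variance above the mean.
    print(dispersion_index(2.0, 0.5, 2))
    # Negative phi pushes the variance below the mean.
    print(dispersion_index(2.0, -0.1, 2))
```

With $φ = 0$ the index is 1, positive $φ$ gives values above 1, and negative $φ$ gives values below 1, matching the abstract's description of under-, equi- and overdispersed counts.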
Submitted 11 September, 2016; v1 submitted 24 August, 2016;
originally announced August 2016.