The consequences of checking for zero‐inflation and overdispersion in the analysis of count data

Count data are ubiquitous in ecology and the Poisson generalized linear model (GLM) is commonly used to model the association between counts and explanatory variables of interest. When fitting this model to the data, one typically proceeds by first confirming that the model assumptions are satisfied. If the residuals appear to be overdispersed or if there is zero‐inflation, key assumptions of the Poisson GLM may be violated and researchers will then typically consider alternatives to the Poisson GLM. An important question is whether the potential model selection bias introduced by this data‐driven multi‐stage procedure merits concern. Here we conduct a large‐scale simulation study to investigate the potential consequences of model selection bias that can arise in the simple scenario of analysing a sample of potentially overdispersed, potentially zero‐inflated, count data. Specifically, we investigate model selection procedures recently recommended by Blasco‐Moreno et al. (2019) using either a series of score tests or information theoretic criteria to select the best model. We find that, when sample sizes are small, model selection based on preliminary score tests (or information theoretic criteria, e.g. AIC, BIC) can lead to potentially substantial inflation of false positive rates (i.e. type 1 error inflation). When sample sizes are sufficiently large, model selection based on preliminary score tests is not problematic. Ignoring the possibility of overdispersion and zero‐inflation during data analyses can lead to invalid inference. However, if one does not have sufficient power to test for overdispersion and zero‐inflation, post hoc model selection may also lead to substantial bias. This ‘catch‐22’ suggests that, if sample sizes are small, a healthy skepticism is warranted whenever one rejects the null hypothesis of no association between a given outcome and covariate.
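
As a rough illustration of the kind of preliminary check the abstract refers to, the sketch below computes two descriptive diagnostics for a sample of counts under an intercept‐only Poisson model: the Pearson dispersion statistic and the ratio of observed zeros to the zeros expected under a Poisson distribution with the sample mean. This is not the score‐test procedure of Blasco‐Moreno et al. (2019), and it is not the simulation design of the study; the function names and the example data are hypothetical and chosen only for illustration.

```python
import math


def dispersion_index(counts):
    """Pearson dispersion statistic for an intercept-only Poisson model.

    Computes sum((y - mean)^2 / mean) / (n - 1). Values near 1 are
    consistent with a Poisson distribution; values well above 1
    suggest overdispersion.
    """
    n = len(counts)
    mean = sum(counts) / n
    pearson_chi2 = sum((y - mean) ** 2 / mean for y in counts)
    return pearson_chi2 / (n - 1)


def zero_ratio(counts):
    """Observed zeros divided by the zeros expected under Poisson(mean).

    A ratio well above 1 suggests more zeros than a Poisson model
    with the same mean would produce.
    """
    n = len(counts)
    mean = sum(counts) / n
    expected_zeros = n * math.exp(-mean)
    observed_zeros = sum(1 for y in counts if y == 0)
    return observed_zeros / expected_zeros


# Hypothetical sample of counts (made up for illustration only).
counts = [0, 0, 0, 0, 0, 1, 2, 0, 7, 0, 3, 0, 0, 9, 0, 1]

print("dispersion index:", round(dispersion_index(counts), 2))
print("zero ratio:", round(zero_ratio(counts), 2))
```

Choosing the final model based on diagnostics like these, computed from the same data, is the data‐driven multi‐stage procedure whose effect on type 1 error the study examines, so with small samples such checks are best read as descriptive rather than as a decision rule.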

Keywords: model selection; zero inflation; inflation; model; count data

Journal Title: Methods in Ecology and Evolution
Year Published: 2021
