7.5.1 Horseshoe Crab satellites count

Crabs <- read.table("Crabs.dat", header = T)
Crabs

Variables:

hist(Crabs$y, breaks = c(0:16) - 0.5)

Two modes, possibly we need a zero-inflated GLM model

1. Possible issues using the Poisson model

In Data example 1, we used a Poisson model:

fit.pois <- glm(y ~ weight + factor(color) + factor(spine), data = Crabs, family = poisson())
summary(fit.pois)
## 
## Call:
## glm(formula = y ~ weight + factor(color) + factor(spine), family = poisson(), 
##     data = Crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.0433  -1.8609  -0.5944   0.9193   4.9483  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -0.04263    0.25354  -0.168   0.8665    
## weight          0.54756    0.07318   7.482 7.31e-14 ***
## factor(color)2 -0.26768    0.16781  -1.595   0.1107    
## factor(color)3 -0.52087    0.19414  -2.683   0.0073 ** 
## factor(color)4 -0.53966    0.22525  -2.396   0.0166 *  
## factor(spine)2 -0.16066    0.21146  -0.760   0.4474    
## factor(spine)3  0.09086    0.11948   0.760   0.4470    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 549.70  on 166  degrees of freedom
## AIC: 919
## 
## Number of Fisher Scoring iterations: 6
plot(fit.pois, which = 3)

We see some sign of over-dispersion and the fit is quite poor.

2. Using the ZINB model

We can try a ZINB model

library(pscl)
## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis
fit.zinb <- zeroinfl(y ~ weight + factor(color) + factor(spine) | weight + factor(color) + factor(spine), dist = "negbin", data = Crabs)
summary(fit.zinb)
## 
## Call:
## zeroinfl(formula = y ~ weight + factor(color) + factor(spine) | weight + 
##     factor(color) + factor(spine), data = Crabs, dist = "negbin")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4662 -0.7573 -0.2847  0.5384  3.9855 
## 
## Count model coefficients (negbin with log link):
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     1.06668    0.37345   2.856  0.00429 ** 
## weight          0.23167    0.11191   2.070  0.03844 *  
## factor(color)2 -0.12300    0.24963  -0.493  0.62221    
## factor(color)3 -0.28019    0.27637  -1.014  0.31068    
## factor(color)4  0.38110    0.34082   1.118  0.26349    
## factor(spine)2 -0.07365    0.28227  -0.261  0.79415    
## factor(spine)3 -0.13142    0.16620  -0.791  0.42908    
## Log(theta)      1.80490    0.38961   4.633 3.61e-06 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      3.0997     1.3893   2.231 0.025669 *  
## weight          -1.6968     0.4399  -3.858 0.000115 ***
## factor(color)2   0.1501     0.8945   0.168 0.866738    
## factor(color)3   0.6331     0.9722   0.651 0.514934    
## factor(color)4   1.9139     1.0441   1.833 0.066795 .  
## factor(spine)2   0.2357     0.7340   0.321 0.748105    
## factor(spine)3  -0.5560     0.5619  -0.989 0.322464    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Theta = 6.0794 
## Number of iterations in BFGS optimization: 25 
## Log-likelihood: -344.6 on 15 Df
  • The dispersion parameter \(\gamma = 1/\theta = 0.14\)
  • We have inflated estimates of the standard errors of the estimated coefficients
  • The deviance residuals are slightly better
mu.pois <- predict(fit.pois, type = "response")
mu.zinb <- predict(fit.zinb, type = "response")
plot(data.frame(mu.pois, Crabs$y, mu.zinb), xlim = c(0, 13), ylim = c(0, 13))