7.5.1 Horseshoe Crab satellites count

Crabs <- read.table("Crabs.dat", header = T)
Crabs

Variables:

\(y\): Number of male satellites
spine: spine condition (1, both good; 2, one worn or broken; 3, both worn or broken)
weight: in kg
width: carapace width (cm)
color: (1, medium light; 2, medium; 3, medium dark; 4, dark)

hist(Crabs$y, breaks = c(0:16) - 0.5)

Two modes, possibly we need a zero-inflated GLM model

1. Possible issues using the Poisson model

In Data example 1, we used a Poisson model:

fit.pois <- glm(y ~ weight + factor(color) + factor(spine), data = Crabs, family = poisson())
summary(fit.pois)

## 
## Call:
## glm(formula = y ~ weight + factor(color) + factor(spine), family = poisson(), 
##     data = Crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.0433  -1.8609  -0.5944   0.9193   4.9483  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -0.04263    0.25354  -0.168   0.8665    
## weight          0.54756    0.07318   7.482 7.31e-14 ***
## factor(color)2 -0.26768    0.16781  -1.595   0.1107    
## factor(color)3 -0.52087    0.19414  -2.683   0.0073 ** 
## factor(color)4 -0.53966    0.22525  -2.396   0.0166 *  
## factor(spine)2 -0.16066    0.21146  -0.760   0.4474    
## factor(spine)3  0.09086    0.11948   0.760   0.4470    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 549.70  on 166  degrees of freedom
## AIC: 919
## 
## Number of Fisher Scoring iterations: 6

plot(fit.pois, which = 3)

We see some sign of over-dispersion and the fit is quite poor.

2. Using the ZINB model

We can try a ZINB model

library(pscl)

## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis

fit.zinb <- zeroinfl(y ~ weight + factor(color) + factor(spine) | weight + factor(color) + factor(spine), dist = "negbin", data = Crabs)
summary(fit.zinb)

## 
## Call:
## zeroinfl(formula = y ~ weight + factor(color) + factor(spine) | weight + 
##     factor(color) + factor(spine), data = Crabs, dist = "negbin")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4662 -0.7573 -0.2847  0.5384  3.9855 
## 
## Count model coefficients (negbin with log link):
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     1.06668    0.37345   2.856  0.00429 ** 
## weight          0.23167    0.11191   2.070  0.03844 *  
## factor(color)2 -0.12300    0.24963  -0.493  0.62221    
## factor(color)3 -0.28019    0.27637  -1.014  0.31068    
## factor(color)4  0.38110    0.34082   1.118  0.26349    
## factor(spine)2 -0.07365    0.28227  -0.261  0.79415    
## factor(spine)3 -0.13142    0.16620  -0.791  0.42908    
## Log(theta)      1.80490    0.38961   4.633 3.61e-06 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      3.0997     1.3893   2.231 0.025669 *  
## weight          -1.6968     0.4399  -3.858 0.000115 ***
## factor(color)2   0.1501     0.8945   0.168 0.866738    
## factor(color)3   0.6331     0.9722   0.651 0.514934    
## factor(color)4   1.9139     1.0441   1.833 0.066795 .  
## factor(spine)2   0.2357     0.7340   0.321 0.748105    
## factor(spine)3  -0.5560     0.5619  -0.989 0.322464    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Theta = 6.0794 
## Number of iterations in BFGS optimization: 25 
## Log-likelihood: -344.6 on 15 Df

The dispersion parameter \(\gamma = 1/\theta = 0.14\)
We have inflated estimates of the standard errors of the estimated coefficients
The deviance residuals are slightly better

mu.pois <- predict(fit.pois, type = "response")
mu.zinb <- predict(fit.zinb, type = "response")
plot(data.frame(mu.pois, Crabs$y, mu.zinb), xlim = c(0, 13), ylim = c(0, 13))

3. Compare with the NB model

If we ignore the zero-inflation

library(MASS)
fit.nb <- glm.nb(y ~ weight + factor(color) + factor(spine), data = Crabs)
summary(fit.nb)

## 
## Call:
## glm.nb(formula = y ~ weight + factor(color) + factor(spine), 
##     data = Crabs, init.theta = 0.9650308392, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8782  -1.3684  -0.3245   0.4242   2.2309  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -0.32230    0.56387  -0.572    0.568    
## weight          0.69294    0.16565   4.183 2.88e-05 ***
## factor(color)2 -0.32061    0.37253  -0.861    0.389    
## factor(color)3 -0.59550    0.41590  -1.432    0.152    
## factor(color)4 -0.57850    0.46433  -1.246    0.213    
## factor(spine)2 -0.24107    0.39339  -0.613    0.540    
## factor(spine)3  0.04246    0.24792   0.171    0.864    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.965) family taken to be 1)
## 
##     Null deviance: 220.67  on 172  degrees of freedom
## Residual deviance: 196.52  on 166  degrees of freedom
## AIC: 761.32
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.965 
##           Std. Err.:  0.176 
## 
##  2 x log-likelihood:  -745.321

We can compare the two models and check whether the zero-inflation part is useful or not. We can not use the anova function but we can still calculate a likelihood-ratio test p-value

1 - pchisq(2 * as.numeric((logLik(fit.zinb) - logLik(fit.nb))), df = df.residual(fit.nb) - df.residual(fit.zinb))

## [1] 2.785625e-09

which significantly shows that the NB model is not sufficient without considering the zero inflation.

Revisit the horseshoe crab data

7.5.1 Horseshoe Crab satellites count

1. Possible issues using the Poisson model

2. Using the ZINB model

3. Compare with the NB model