7.5.1 Horseshoe Crab satellites count

Crabs <- read.table("Crabs.dat", header = TRUE)
Crabs

crab <int>	y <int>	weight <dbl>	width <dbl>	color <int>	spine <int>
1	8	3.050	28.3	2	3
2	0	1.550	22.5	3	3
3	9	2.300	26.0	1	1
4	0	2.100	24.8	3	3
5	4	2.600	26.0	3	3
6	0	2.100	23.8	2	3
7	0	2.350	26.5	1	1
8	0	1.900	24.7	3	2
9	0	1.950	23.7	2	1
10	0	2.150	25.6	3	3

Variables:

y: number of male satellites
spine: spine condition (1, both good; 2, one worn or broken; 3, both worn or broken)
weight: weight (kg)
width: carapace width (cm)
color: (1, medium light; 2, medium; 3, medium dark; 4, dark)
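Because color and spine are stored as integer codes, they are wrapped in factor() in the models that follow. A small sketch of what that conversion does, using the color values of the ten rows shown above; the names `color_codes` and `color_f` are illustrative only:

```r
# Color codes of the ten rows printed above
color_codes <- c(2, 3, 1, 3, 3, 2, 1, 3, 2, 3)

# Attach the labels from the variable list; level 1 becomes the baseline
color_f <- factor(color_codes,
                  levels = 1:4,
                  labels = c("medium light", "medium", "medium dark", "dark"))

table(color_f)    # counts per category, including empty ones
levels(color_f)   # the first level is the reference in a model formula
```

With the default treatment contrasts, a model with factor(color) estimates one coefficient for each non-baseline level, which is why the summaries report factor(color)2 to factor(color)4.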

hist(Crabs$y, breaks = c(0:16) - 0.5)

The histogram has two modes, one of them at zero, so we may need a zero-inflated GLM.
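One quick check for excess zeros is to compare the observed proportion of zeros with the proportion a Poisson distribution with the same mean would produce. A minimal sketch on simulated data; the mixture settings and the name `y_sim` are hypothetical and are not the crab data:

```r
set.seed(1)

# Hypothetical zero-inflated counts: 35% structural zeros, otherwise Poisson(4)
n <- 500
structural_zero <- rbinom(n, size = 1, prob = 0.35) == 1
y_sim <- ifelse(structural_zero, 0L, rpois(n, lambda = 4))

# Observed share of zeros versus the share implied by a Poisson with the same mean
obs_zero <- mean(y_sim == 0)
exp_zero <- dpois(0, lambda = mean(y_sim))

c(observed = obs_zero, poisson_expected = exp_zero)
```

The same comparison could be run on the real data by replacing `y_sim` with Crabs$y. An observed share well above the Poisson share points towards zero inflation, over-dispersion, or both.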

1. Possible issues using the Poisson model

In Data example 1, we used a Poisson model:

fit.pois <- glm(y ~ weight + factor(color) + factor(spine), data = Crabs, family = poisson())
summary(fit.pois)

## 
## Call:
## glm(formula = y ~ weight + factor(color) + factor(spine), family = poisson(), 
##     data = Crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.0433  -1.8609  -0.5944   0.9193   4.9483  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -0.04263    0.25354  -0.168   0.8665    
## weight          0.54756    0.07318   7.482 7.31e-14 ***
## factor(color)2 -0.26768    0.16781  -1.595   0.1107    
## factor(color)3 -0.52087    0.19414  -2.683   0.0073 ** 
## factor(color)4 -0.53966    0.22525  -2.396   0.0166 *  
## factor(spine)2 -0.16066    0.21146  -0.760   0.4474    
## factor(spine)3  0.09086    0.11948   0.760   0.4470    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 549.70  on 166  degrees of freedom
## AIC: 919
## 
## Number of Fisher Scoring iterations: 6

plot(fit.pois, which = 3)

We see some signs of over-dispersion, and the fit is quite poor: the residual deviance (549.70) is more than three times its degrees of freedom (166).
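Over-dispersion can also be summarised numerically by dividing the Pearson statistic (or the residual deviance) by the residual degrees of freedom. A hedged sketch on simulated over-dispersed counts; the data-generating settings and the names `x`, `y_sim` and `fit_sim` are assumptions made for illustration:

```r
set.seed(42)

# Hypothetical over-dispersed counts: negative binomial with mean exp(0.5 + 0.6 * x)
n <- 300
x <- runif(n, min = 0, max = 3)
y_sim <- rnbinom(n, mu = exp(0.5 + 0.6 * x), size = 1)

# Fit a Poisson GLM even though the data are not Poisson
fit_sim <- glm(y_sim ~ x, family = poisson())

# Ratios near 1 are consistent with a Poisson model; larger values suggest over-dispersion
pearson_disp  <- sum(residuals(fit_sim, type = "pearson")^2) / df.residual(fit_sim)
deviance_disp <- deviance(fit_sim) / df.residual(fit_sim)

c(pearson = pearson_disp, deviance = deviance_disp)
```

Applied to fit.pois, the deviance version of this ratio is 549.70 / 166 from the summary above.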

2. Using the ZINB model

We can try a zero-inflated negative binomial (ZINB) model:

library(pscl)

## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis

fit.zinb <- zeroinfl(y ~ weight + factor(color) + factor(spine) | weight + factor(color) + factor(spine), dist = "negbin", data = Crabs)
summary(fit.zinb)

## 
## Call:
## zeroinfl(formula = y ~ weight + factor(color) + factor(spine) | weight + 
##     factor(color) + factor(spine), data = Crabs, dist = "negbin")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4662 -0.7573 -0.2847  0.5384  3.9855 
## 
## Count model coefficients (negbin with log link):
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     1.06668    0.37345   2.856  0.00429 ** 
## weight          0.23167    0.11191   2.070  0.03844 *  
## factor(color)2 -0.12300    0.24963  -0.493  0.62221    
## factor(color)3 -0.28019    0.27637  -1.014  0.31068    
## factor(color)4  0.38110    0.34082   1.118  0.26349    
## factor(spine)2 -0.07365    0.28227  -0.261  0.79415    
## factor(spine)3 -0.13142    0.16620  -0.791  0.42908    
## Log(theta)      1.80490    0.38961   4.633 3.61e-06 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      3.0997     1.3893   2.231 0.025669 *  
## weight          -1.6968     0.4399  -3.858 0.000115 ***
## factor(color)2   0.1501     0.8945   0.168 0.866738    
## factor(color)3   0.6331     0.9722   0.651 0.514934    
## factor(color)4   1.9139     1.0441   1.833 0.066795 .  
## factor(spine)2   0.2357     0.7340   0.321 0.748105    
## factor(spine)3  -0.5560     0.5619  -0.989 0.322464    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Theta = 6.0794 
## Number of iterations in BFGS optimization: 25 
## Log-likelihood: -344.6 on 15 Df

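The estimated dispersion parameter is theta = 6.08 (Log(theta) = 1.80).
The standard errors of the estimated coefficients are larger than under the Poisson model (for example, 0.112 versus 0.073 for weight in the count model).
The residuals are slightly better: the Pearson residuals reported here range from -1.47 to 3.99, compared with deviance residuals from -3.04 to 4.95 for the Poisson fit.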
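To see how the two parts of the ZINB fit combine, the coefficients in the summary can be turned into a structural-zero probability and a count mean by hand. A minimal sketch for a crab in the baseline categories (color = 1, spine = 1); the weights are illustrative values, and the variable names are not part of the original analysis:

```r
# Coefficients copied from summary(fit.zinb) above (baseline: color = 1, spine = 1)
b0_count <- 1.06668   # count model intercept (log link)
bw_count <- 0.23167   # count model weight effect
b0_zero  <- 3.0997    # zero-inflation intercept (logit link)
bw_zero  <- -1.6968   # zero-inflation weight effect

weight <- c(1.5, 2.5, 3.5)   # illustrative weights in kg

pi_zero  <- plogis(b0_zero + bw_zero * weight)   # probability of a structural zero
mu_count <- exp(b0_count + bw_count * weight)    # mean of the negative binomial part
mean_y   <- (1 - pi_zero) * mu_count             # overall expected number of satellites

theta <- exp(1.80490)   # back-transform Log(theta)

data.frame(weight, pi_zero, mu_count, mean_y)
```

The negative weight coefficient in the zero-inflation part means heavier crabs are less likely to be structural zeros, so the overall mean rises with weight through both components. The product (1 - pi_zero) * mu_count is the mean of a zero-inflated count distribution, which is the quantity expected from predict() with type = "response".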

mu.pois <- predict(fit.pois, type = "response")
mu.zinb <- predict(fit.zinb, type = "response")
plot(data.frame(mu.pois, Crabs$y, mu.zinb), xlim = c(0, 13), ylim = c(0, 13))

3. Compare with the NB model

If we ignore the zero inflation, we can fit an ordinary negative binomial (NB) model:

library(MASS)
fit.nb <- glm.nb(y ~ weight + factor(color) + factor(spine), data = Crabs)
summary(fit.nb)

## 
## Call:
## glm.nb(formula = y ~ weight + factor(color) + factor(spine), 
##     data = Crabs, init.theta = 0.9650308392, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8782  -1.3684  -0.3245   0.4242   2.2309  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -0.32230    0.56387  -0.572    0.568    
## weight          0.69294    0.16565   4.183 2.88e-05 ***
## factor(color)2 -0.32061    0.37253  -0.861    0.389    
## factor(color)3 -0.59550    0.41590  -1.432    0.152    
## factor(color)4 -0.57850    0.46433  -1.246    0.213    
## factor(spine)2 -0.24107    0.39339  -0.613    0.540    
## factor(spine)3  0.04246    0.24792   0.171    0.864    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.965) family taken to be 1)
## 
##     Null deviance: 220.67  on 172  degrees of freedom
## Residual deviance: 196.52  on 166  degrees of freedom
## AIC: 761.32
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.965 
##           Std. Err.:  0.176 
## 
##  2 x log-likelihood:  -745.321

We can compare the two models to check whether the zero-inflation part is useful. We cannot use the anova function, but we can still calculate a likelihood-ratio test p-value:

1 - pchisq(2 * as.numeric((logLik(fit.zinb) - logLik(fit.nb))), df = df.residual(fit.nb) - df.residual(fit.zinb))

## [1] 2.785625e-09

The p-value is far below any conventional significance level, so the NB model is not sufficient without accounting for zero inflation.
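The degrees of freedom of this test can also be taken as the difference in the number of estimated parameters. The sketch below redoes the calculation from the log-likelihoods printed in the two summaries. It assumes that df.residual() counts theta for the zeroinfl fit but not for the glm.nb fit, which would make the difference used above 8 rather than 7; that assumption is consistent with the printed p-value but should be checked against the fitted objects.

```r
# Log-likelihoods as printed in the summaries above
ll_nb   <- -745.321 / 2   # from "2 x log-likelihood" in summary(fit.nb)
ll_zinb <- -344.6         # from summary(fit.zinb), rounded to one decimal

# Number of estimated parameters in each model
k_nb   <- 7 + 1           # 7 regression coefficients plus theta
k_zinb <- 15              # "on 15 Df" in summary(fit.zinb)

lr_stat <- 2 * (ll_zinb - ll_nb)
lr_df   <- k_zinb - k_nb  # the 7 zero-inflation coefficients

# lower.tail = FALSE avoids the rounding error of 1 - pchisq() for tiny p-values
p_value     <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
p_value_df8 <- pchisq(lr_stat, df = 8, lower.tail = FALSE)

c(lr_stat = lr_stat, df = lr_df, p_value = p_value, p_value_df8 = p_value_df8)
```

With the fitted objects available, the same parameter counts can be read from attr(logLik(fit.nb), "df") and attr(logLik(fit.zinb), "df"). Either choice of degrees of freedom leads to the same conclusion here.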
