Crabs <- read.table("Crabs.dat", header = T)
Crabs
Variables:
hist(Crabs$y, breaks = c(0:16) - 0.5)
Two modes, possibly we need a zero-inflated GLM model
In Data example 1, we used a Poisson model:
fit.pois <- glm(y ~ weight + factor(color) + factor(spine), data = Crabs, family = poisson())
summary(fit.pois)
##
## Call:
## glm(formula = y ~ weight + factor(color) + factor(spine), family = poisson(),
## data = Crabs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.0433 -1.8609 -0.5944 0.9193 4.9483
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.04263 0.25354 -0.168 0.8665
## weight 0.54756 0.07318 7.482 7.31e-14 ***
## factor(color)2 -0.26768 0.16781 -1.595 0.1107
## factor(color)3 -0.52087 0.19414 -2.683 0.0073 **
## factor(color)4 -0.53966 0.22525 -2.396 0.0166 *
## factor(spine)2 -0.16066 0.21146 -0.760 0.4474
## factor(spine)3 0.09086 0.11948 0.760 0.4470
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 632.79 on 172 degrees of freedom
## Residual deviance: 549.70 on 166 degrees of freedom
## AIC: 919
##
## Number of Fisher Scoring iterations: 6
plot(fit.pois, which = 3)
We see some sign of over-dispersion and the fit is quite poor.
We can try a ZINB model
library(pscl)
## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis
fit.zinb <- zeroinfl(y ~ weight + factor(color) + factor(spine) | weight + factor(color) + factor(spine), dist = "negbin", data = Crabs)
summary(fit.zinb)
##
## Call:
## zeroinfl(formula = y ~ weight + factor(color) + factor(spine) | weight +
## factor(color) + factor(spine), data = Crabs, dist = "negbin")
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -1.4662 -0.7573 -0.2847 0.5384 3.9855
##
## Count model coefficients (negbin with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.06668 0.37345 2.856 0.00429 **
## weight 0.23167 0.11191 2.070 0.03844 *
## factor(color)2 -0.12300 0.24963 -0.493 0.62221
## factor(color)3 -0.28019 0.27637 -1.014 0.31068
## factor(color)4 0.38110 0.34082 1.118 0.26349
## factor(spine)2 -0.07365 0.28227 -0.261 0.79415
## factor(spine)3 -0.13142 0.16620 -0.791 0.42908
## Log(theta) 1.80490 0.38961 4.633 3.61e-06 ***
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.0997 1.3893 2.231 0.025669 *
## weight -1.6968 0.4399 -3.858 0.000115 ***
## factor(color)2 0.1501 0.8945 0.168 0.866738
## factor(color)3 0.6331 0.9722 0.651 0.514934
## factor(color)4 1.9139 1.0441 1.833 0.066795 .
## factor(spine)2 0.2357 0.7340 0.321 0.748105
## factor(spine)3 -0.5560 0.5619 -0.989 0.322464
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Theta = 6.0794
## Number of iterations in BFGS optimization: 25
## Log-likelihood: -344.6 on 15 Df
mu.pois <- predict(fit.pois, type = "response")
mu.zinb <- predict(fit.zinb, type = "response")
plot(data.frame(mu.pois, Crabs$y, mu.zinb), xlim = c(0, 13), ylim = c(0, 13))