Crabs <- read.table("Crabs.dat", header = T)
Crabs
crab <int> | y <int> | weight <dbl> | width <dbl> | color <int> | spine <int> |
---|---|---|---|---|---|
1 | 8 | 3.050 | 28.3 | 2 | 3 |
2 | 0 | 1.550 | 22.5 | 3 | 3 |
3 | 9 | 2.300 | 26.0 | 1 | 1 |
4 | 0 | 2.100 | 24.8 | 3 | 3 |
5 | 4 | 2.600 | 26.0 | 3 | 3 |
6 | 0 | 2.100 | 23.8 | 2 | 3 |
7 | 0 | 2.350 | 26.5 | 1 | 1 |
8 | 0 | 1.900 | 24.7 | 3 | 2 |
9 | 0 | 1.950 | 23.7 | 2 | 1 |
10 | 0 | 2.150 | 25.6 | 3 | 3 |
First, we fit with a Poisson model. The deviance residuals seems to be higher than 1, indicating over-dispersion. Not very clear whether the mean-variance relationship is linear or quardratic.
fit.pois <- glm(y ~ weight, family = poisson, data = Crabs)
summary(fit.pois)
##
## Call:
## glm(formula = y ~ weight, family = poisson, data = Crabs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.9307 -1.9981 -0.5627 0.9298 4.9992
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.42841 0.17893 -2.394 0.0167 *
## weight 0.58930 0.06502 9.064 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 632.79 on 172 degrees of freedom
## Residual deviance: 560.87 on 171 degrees of freedom
## AIC: 920.16
##
## Number of Fisher Scoring iterations: 5
plot(predict(fit.pois), residuals(fit.pois)^2, xlab = "fitted", ylab = "residual^2")
abline(a = 0, b= 1, col = "red")
First, we fit with a Poisson model. The deviance residuals seems to be higher than 1, indicating over-dispersion. Under the proportional variance model: Var(yi)=ϕv⋆(μi)=ϕμi we can estimate the level of overdispersion as ˆϕ=∑i(yi−ˆμi)2v⋆(ˆμi)/(n−p)
## empirical estimate of dispersion parameter for the proportional model Var(y_i) = \phi v*(\mu_i)
phi <- sum(residuals(fit.pois, type = "pearson")^2)/(nrow(Crabs) - 2)
phi
## [1] 3.133893
We can compare this estimate with the quasi-likelihood esitmate from R. The estimates of β stay the same. The inflation of the standard error of ˆβ should be roughly √ˆϕ≈1.77.
summary(glm(y ~ weight, family = quasi(link = "log", variance = "mu"), data = Crabs))
##
## Call:
## glm(formula = y ~ weight, family = quasi(link = "log", variance = "mu"),
## data = Crabs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.9307 -1.9981 -0.5627 0.9298 4.9992
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.4284 0.3168 -1.352 0.178
## weight 0.5893 0.1151 5.120 8.17e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasi family taken to be 3.134159)
##
## Null deviance: 632.79 on 172 degrees of freedom
## Residual deviance: 560.87 on 171 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 5
We may also assume the mean variance relationship is quardratic as Var(yi)= ϕμ2i. The estimates of β will be different from the Poisson GLM estimates as estimating equations of β change (though we can still seperate the estimate of β from ϕ because of the proportional mean-variance relationship assumption).
summary(glm(y ~ weight, family = quasi(link = "log", variance = "mu^2"), data = Crabs))
##
## Call:
## glm(formula = y ~ weight, family = quasi(link = "log", variance = "mu^2"),
## data = Crabs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2186 -0.2832 0.0000 0.5603 2.4883
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.0122 0.3863 -2.621 0.00957 **
## weight 0.8184 0.1542 5.306 3.44e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasi family taken to be 1.362496)
##
## Null deviance: 80.907 on 172 degrees of freedom
## Residual deviance: 89.585 on 171 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 19
library(MASS)
# Compare with Negative binomial model
summary(glm.nb(y ~ weight, data = Crabs))
##
## Call:
## glm.nb(formula = y ~ weight, data = Crabs, init.theta = 0.9310592338,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8394 -1.4122 -0.3247 0.4744 2.1279
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.8647 0.4048 -2.136 0.0327 *
## weight 0.7603 0.1578 4.817 1.45e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(0.9311) family taken to be 1)
##
## Null deviance: 216.43 on 172 degrees of freedom
## Residual deviance: 196.16 on 171 degrees of freedom
## AIC: 754.64
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 0.931
## Std. Err.: 0.168
##
## 2 x log-likelihood: -748.644
Such a quasi-likelihood model would not be similar to a Negative-Binomial model as the mean-variance relationship assumption is similar (NB: Var(yi)=μi+ϕμ2i).
Sandwich estimator for the Possion GLM model
library(gee)
obs <- c(1:173) # labeling of observations needed for GEE method
summary(gee(y ~ weight, data = Crabs, id=obs, family=poisson, scale.fix=T))
## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
## running glm to get initial regression estimate
## (Intercept) weight
## -0.4284053 0.5893041
##
## GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
## gee S-function, version 4.13 modified 98/01/27 (1998)
##
## Model:
## Link: Logarithm
## Variance to Mean Relation: Poisson
## Correlation Structure: Independent
##
## Call:
## gee(formula = y ~ weight, id = obs, data = Crabs, family = poisson,
## scale.fix = T)
##
## Summary of Residuals:
## Min 1Q Median 3Q Max
## -6.956930 -2.245961 -1.048869 1.801392 11.473098
##
##
## Coefficients:
## Estimate Naive S.E. Naive z Robust S.E. Robust z
## (Intercept) -0.4284053 0.17893528 -2.394192 0.3082884 -1.389625
## weight 0.5893041 0.06501714 9.063827 0.1103203 5.341755
##
## Estimated Scale Parameter: 1
## Number of Iterations: 1
##
## Working Correlation
## [,1]
## [1,] 1
Sandwich estimator for the Quasi-likelihood model with quadratic mean-variance relationship
library(gee)
obs <- c(1:173) # labeling of observations needed for GEE method
summary(gee(y ~ weight, data = Crabs, id=obs, family=quasi(link = "log", variance = "mu^2"), scale.fix=F))
## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
## running glm to get initial regression estimate
## (Intercept) weight
## -1.0121863 0.8183823
##
## GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
## gee S-function, version 4.13 modified 98/01/27 (1998)
##
## Model:
## Link: Logarithm
## Variance to Mean Relation: Gamma
## Correlation Structure: Independent
##
## Call:
## gee(formula = y ~ weight, id = obs, data = Crabs, family = quasi(link = "log",
## variance = "mu^2"), scale.fix = F)
##
## Summary of Residuals:
## Min 1Q Median 3Q Max
## -18.6207207 -2.1942696 -0.9860146 1.7085753 11.6128674
##
##
## Coefficients:
## Estimate Naive S.E. Naive z Robust S.E. Robust z
## (Intercept) -1.0121863 0.3862555 -2.620510 0.3721267 -2.720004
## weight 0.8183823 0.1542441 5.305759 0.1332920 6.139772
##
## Estimated Scale Parameter: 1.362496
## Number of Iterations: 1
##
## Working Correlation
## [,1]
## [1,] 1