Crabs <- read.table("Crabs.dat", header = T)
Crabs
First, we fit with a Poisson model. The deviance residuals seems to be higher than 1, indicating over-dispersion. Not very clear whether the mean-variance relationship is linear or quardratic.
fit.pois <- glm(y ~ weight, family = poisson, data = Crabs)
summary(fit.pois)
##
## Call:
## glm(formula = y ~ weight, family = poisson, data = Crabs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.9307 -1.9981 -0.5627 0.9298 4.9992
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.42841 0.17893 -2.394 0.0167 *
## weight 0.58930 0.06502 9.064 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 632.79 on 172 degrees of freedom
## Residual deviance: 560.87 on 171 degrees of freedom
## AIC: 920.16
##
## Number of Fisher Scoring iterations: 5
plot(predict(fit.pois), residuals(fit.pois)^2, xlab = "fitted", ylab = "residual^2")
abline(a = 0, b= 1, col = "red")
First, we fit with a Poisson model. The deviance residuals seems to be higher than 1, indicating over-dispersion. Under the proportional variance model: \[Var(y_i) = \phi v^\star(\mu_i) = \phi \mu_i\] we can estimate the level of overdispersion as \[\hat \phi = \frac{\sum_i (y_i - \hat \mu_i)^2}{v^\star(\hat \mu_i)}/ (n-p)\]
## empirical estimate of dispersion parameter for the proportional model Var(y_i) = \phi v*(\mu_i)
phi <- sum(residuals(fit.pois, type = "pearson")^2)/(nrow(Crabs) - 2)
phi
## [1] 3.133893
We can compare this estimate with the quasi-likelihood esitmate from R. The estimates of \(\beta\) stay the same. The inflation of the standard error of \(\hat \beta\) should be roughly \(\sqrt {\hat \phi}\approx 1.77\).
summary(glm(y ~ weight, family = quasi(link = "log", variance = "mu"), data = Crabs))
##
## Call:
## glm(formula = y ~ weight, family = quasi(link = "log", variance = "mu"),
## data = Crabs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.9307 -1.9981 -0.5627 0.9298 4.9992
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.4284 0.3168 -1.352 0.178
## weight 0.5893 0.1151 5.120 8.17e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasi family taken to be 3.134159)
##
## Null deviance: 632.79 on 172 degrees of freedom
## Residual deviance: 560.87 on 171 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 5
We may also assume the mean variance relationship is quardratic as \[Var(y_i) = \ \phi \mu_i^2.\] The estimates of \(\beta\) will be different from the Poisson GLM estimates as estimating equations of \(\beta\) change (though we can still seperate the estimate of \(\beta\) from \(\phi\) because of the proportional mean-variance relationship assumption).
summary(glm(y ~ weight, family = quasi(link = "log", variance = "mu^2"), data = Crabs))
##
## Call:
## glm(formula = y ~ weight, family = quasi(link = "log", variance = "mu^2"),
## data = Crabs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2186 -0.2832 0.0000 0.5603 2.4883
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.0122 0.3863 -2.621 0.00957 **
## weight 0.8184 0.1542 5.306 3.44e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasi family taken to be 1.362496)
##
## Null deviance: 80.907 on 172 degrees of freedom
## Residual deviance: 89.585 on 171 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 19
library(MASS)
# Compare with Negative binomial model
summary(glm.nb(y ~ weight, data = Crabs))
##
## Call:
## glm.nb(formula = y ~ weight, data = Crabs, init.theta = 0.9310592338,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8394 -1.4122 -0.3247 0.4744 2.1279
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.8647 0.4048 -2.136 0.0327 *
## weight 0.7603 0.1578 4.817 1.45e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(0.9311) family taken to be 1)
##
## Null deviance: 216.43 on 172 degrees of freedom
## Residual deviance: 196.16 on 171 degrees of freedom
## AIC: 754.64
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 0.931
## Std. Err.: 0.168
##
## 2 x log-likelihood: -748.644
Such a quasi-likelihood model would not be similar to a Negative-Binomial model as the mean-variance relationship assumption is similar (NB: \(Var(y_i) = \mu_i + \phi \mu_i^2\)).
Sandwich estimator for the Possion GLM model
library(gee)
obs <- c(1:173) # labeling of observations needed for GEE method
summary(gee(y ~ weight, data = Crabs, id=obs, family=poisson, scale.fix=T))
## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
## running glm to get initial regression estimate
## (Intercept) weight
## -0.4284053 0.5893041
##
## GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
## gee S-function, version 4.13 modified 98/01/27 (1998)
##
## Model:
## Link: Logarithm
## Variance to Mean Relation: Poisson
## Correlation Structure: Independent
##
## Call:
## gee(formula = y ~ weight, id = obs, data = Crabs, family = poisson,
## scale.fix = T)
##
## Summary of Residuals:
## Min 1Q Median 3Q Max
## -6.956930 -2.245961 -1.048869 1.801392 11.473098
##
##
## Coefficients:
## Estimate Naive S.E. Naive z Robust S.E. Robust z
## (Intercept) -0.4284053 0.17893528 -2.394192 0.3082884 -1.389625
## weight 0.5893041 0.06501714 9.063827 0.1103203 5.341755
##
## Estimated Scale Parameter: 1
## Number of Iterations: 1
##
## Working Correlation
## [,1]
## [1,] 1
Sandwich estimator for the Quasi-likelihood model with quadratic mean-variance relationship
library(gee)
obs <- c(1:173) # labeling of observations needed for GEE method
summary(gee(y ~ weight, data = Crabs, id=obs, family=quasi(link = "log", variance = "mu^2"), scale.fix=F))
## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
## running glm to get initial regression estimate
## (Intercept) weight
## -1.0121863 0.8183823
##
## GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
## gee S-function, version 4.13 modified 98/01/27 (1998)
##
## Model:
## Link: Logarithm
## Variance to Mean Relation: Gamma
## Correlation Structure: Independent
##
## Call:
## gee(formula = y ~ weight, id = obs, data = Crabs, family = quasi(link = "log",
## variance = "mu^2"), scale.fix = F)
##
## Summary of Residuals:
## Min 1Q Median 3Q Max
## -18.6207207 -2.1942696 -0.9860146 1.7085753 11.6128674
##
##
## Coefficients:
## Estimate Naive S.E. Naive z Robust S.E. Robust z
## (Intercept) -1.0121863 0.3862555 -2.620510 0.3721267 -2.720004
## weight 0.8183823 0.1542441 5.305759 0.1332920 6.139772
##
## Estimated Scale Parameter: 1.362496
## Number of Iterations: 1
##
## Working Correlation
## [,1]
## [1,] 1