# 1. Horseshoe crab data revisited

Crabs <- read.table("Crabs.dat", header = T)
Crabs

First, we fit with a Poisson model. The deviance residuals seems to be higher than 1, indicating over-dispersion. Not very clear whether the mean-variance relationship is linear or quardratic.

fit.pois <- glm(y ~ weight, family = poisson, data = Crabs)
summary(fit.pois)
##
## Call:
## glm(formula = y ~ weight, family = poisson, data = Crabs)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.9307  -1.9981  -0.5627   0.9298   4.9992
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.42841    0.17893  -2.394   0.0167 *
## weight       0.58930    0.06502   9.064   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 560.87  on 171  degrees of freedom
## AIC: 920.16
##
## Number of Fisher Scoring iterations: 5
plot(predict(fit.pois), residuals(fit.pois)^2, xlab = "fitted", ylab = "residual^2")
abline(a = 0, b= 1, col = "red")

## 1.1 Quasi-likelihood models

First, we fit with a Poisson model. The deviance residuals seems to be higher than 1, indicating over-dispersion. Under the proportional variance model: $Var(y_i) = \phi v^\star(\mu_i) = \phi \mu_i$ we can estimate the level of overdispersion as $\hat \phi = \frac{\sum_i (y_i - \hat \mu_i)^2}{v^\star(\hat \mu_i)}/ (n-p)$

## empirical estimate of dispersion parameter for the proportional model Var(y_i) = \phi v*(\mu_i)
phi <- sum(residuals(fit.pois, type = "pearson")^2)/(nrow(Crabs) - 2)
phi
## [1] 3.133893

We can compare this estimate with the quasi-likelihood esitmate from R. The estimates of $$\beta$$ stay the same. The inflation of the standard error of $$\hat \beta$$ should be roughly $$\sqrt {\hat \phi}\approx 1.77$$.

summary(glm(y ~ weight, family = quasi(link = "log", variance = "mu"), data = Crabs))
##
## Call:
## glm(formula = y ~ weight, family = quasi(link = "log", variance = "mu"),
##     data = Crabs)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.9307  -1.9981  -0.5627   0.9298   4.9992
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -0.4284     0.3168  -1.352    0.178
## weight        0.5893     0.1151   5.120 8.17e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasi family taken to be 3.134159)
##
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 560.87  on 171  degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 5

We may also assume the mean variance relationship is quardratic as $Var(y_i) = \ \phi \mu_i^2.$ The estimates of $$\beta$$ will be different from the Poisson GLM estimates as estimating equations of $$\beta$$ change (though we can still seperate the estimate of $$\beta$$ from $$\phi$$ because of the proportional mean-variance relationship assumption).

summary(glm(y ~ weight, family = quasi(link = "log", variance = "mu^2"), data = Crabs))
##
## Call:
## glm(formula = y ~ weight, family = quasi(link = "log", variance = "mu^2"),
##     data = Crabs)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.2186  -0.2832   0.0000   0.5603   2.4883
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -1.0122     0.3863  -2.621  0.00957 **
## weight        0.8184     0.1542   5.306 3.44e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasi family taken to be 1.362496)
##
##     Null deviance: 80.907  on 172  degrees of freedom
## Residual deviance: 89.585  on 171  degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 19
library(MASS)

# Compare with Negative binomial model
summary(glm.nb(y ~ weight, data = Crabs))
##
## Call:
## glm.nb(formula = y ~ weight, data = Crabs, init.theta = 0.9310592338,
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.8394  -1.4122  -0.3247   0.4744   2.1279
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -0.8647     0.4048  -2.136   0.0327 *
## weight        0.7603     0.1578   4.817 1.45e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(0.9311) family taken to be 1)
##
##     Null deviance: 216.43  on 172  degrees of freedom
## Residual deviance: 196.16  on 171  degrees of freedom
## AIC: 754.64
##
## Number of Fisher Scoring iterations: 1
##
##
##               Theta:  0.931
##           Std. Err.:  0.168
##
##  2 x log-likelihood:  -748.644

Such a quasi-likelihood model would not be similar to a Negative-Binomial model as the mean-variance relationship assumption is similar (NB: $$Var(y_i) = \mu_i + \phi \mu_i^2$$).

## 1.2 Sandwich estimator of the variance of $$\hat \beta$$

Sandwich estimator for the Possion GLM model

library(gee)
obs <- c(1:173) # labeling of observations needed for GEE method
summary(gee(y ~ weight, data = Crabs, id=obs, family=poisson, scale.fix=T))
## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
## running glm to get initial regression estimate
## (Intercept)      weight
##  -0.4284053   0.5893041
##
##  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
##  gee S-function, version 4.13 modified 98/01/27 (1998)
##
## Model:
##  Variance to Mean Relation: Poisson
##  Correlation Structure:     Independent
##
## Call:
## gee(formula = y ~ weight, id = obs, data = Crabs, family = poisson,
##     scale.fix = T)
##
## Summary of Residuals:
##       Min        1Q    Median        3Q       Max
## -6.956930 -2.245961 -1.048869  1.801392 11.473098
##
##
## Coefficients:
##               Estimate Naive S.E.   Naive z Robust S.E.  Robust z
## (Intercept) -0.4284053 0.17893528 -2.394192   0.3082884 -1.389625
## weight       0.5893041 0.06501714  9.063827   0.1103203  5.341755
##
## Estimated Scale Parameter:  1
## Number of Iterations:  1
##
## Working Correlation
##      [,1]
## [1,]    1

Sandwich estimator for the Quasi-likelihood model with quadratic mean-variance relationship

library(gee)
obs <- c(1:173) # labeling of observations needed for GEE method
summary(gee(y ~ weight, data = Crabs, id=obs, family=quasi(link = "log", variance = "mu^2"), scale.fix=F))
## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
## running glm to get initial regression estimate
## (Intercept)      weight
##  -1.0121863   0.8183823
##
##  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
##  gee S-function, version 4.13 modified 98/01/27 (1998)
##
## Model:
##  Variance to Mean Relation: Gamma
##  Correlation Structure:     Independent
##
## Call:
## gee(formula = y ~ weight, id = obs, data = Crabs, family = quasi(link = "log",
##     variance = "mu^2"), scale.fix = F)
##
## Summary of Residuals:
##         Min          1Q      Median          3Q         Max
## -18.6207207  -2.1942696  -0.9860146   1.7085753  11.6128674
##
##
## Coefficients:
##               Estimate Naive S.E.   Naive z Robust S.E.  Robust z
## (Intercept) -1.0121863  0.3862555 -2.620510   0.3721267 -2.720004
## weight       0.8183823  0.1542441  5.305759   0.1332920  6.139772
##
## Estimated Scale Parameter:  1.362496
## Number of Iterations:  1
##
## Working Correlation
##      [,1]
## [1,]    1