1. Horseshoe crab data revisited

Crabs <- read.table("Crabs.dat", header = T)
Crabs

First, we fit with a Poisson model. The deviance residuals seems to be higher than 1, indicating over-dispersion. Not very clear whether the mean-variance relationship is linear or quardratic.

fit.pois <- glm(y ~ weight, family = poisson, data = Crabs)
summary(fit.pois)
## 
## Call:
## glm(formula = y ~ weight, family = poisson, data = Crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9307  -1.9981  -0.5627   0.9298   4.9992  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.42841    0.17893  -2.394   0.0167 *  
## weight       0.58930    0.06502   9.064   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 560.87  on 171  degrees of freedom
## AIC: 920.16
## 
## Number of Fisher Scoring iterations: 5
plot(predict(fit.pois), residuals(fit.pois)^2, xlab = "fitted", ylab = "residual^2")
abline(a = 0, b= 1, col = "red")

1.1 Quasi-likelihood models

First, we fit with a Poisson model. The deviance residuals seems to be higher than 1, indicating over-dispersion. Under the proportional variance model: \[Var(y_i) = \phi v^\star(\mu_i) = \phi \mu_i\] we can estimate the level of overdispersion as \[\hat \phi = \frac{\sum_i (y_i - \hat \mu_i)^2}{v^\star(\hat \mu_i)}/ (n-p)\]

## empirical estimate of dispersion parameter for the proportional model Var(y_i) = \phi v*(\mu_i) 
phi <- sum(residuals(fit.pois, type = "pearson")^2)/(nrow(Crabs) - 2)
phi
## [1] 3.133893

We can compare this estimate with the quasi-likelihood esitmate from R. The estimates of \(\beta\) stay the same. The inflation of the standard error of \(\hat \beta\) should be roughly \(\sqrt {\hat \phi}\approx 1.77\).

summary(glm(y ~ weight, family = quasi(link = "log", variance = "mu"), data = Crabs))
## 
## Call:
## glm(formula = y ~ weight, family = quasi(link = "log", variance = "mu"), 
##     data = Crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9307  -1.9981  -0.5627   0.9298   4.9992  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.4284     0.3168  -1.352    0.178    
## weight        0.5893     0.1151   5.120 8.17e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasi family taken to be 3.134159)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 560.87  on 171  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5

We may also assume the mean variance relationship is quardratic as \[Var(y_i) = \ \phi \mu_i^2.\] The estimates of \(\beta\) will be different from the Poisson GLM estimates as estimating equations of \(\beta\) change (though we can still seperate the estimate of \(\beta\) from \(\phi\) because of the proportional mean-variance relationship assumption).

summary(glm(y ~ weight, family = quasi(link = "log", variance = "mu^2"), data = Crabs))
## 
## Call:
## glm(formula = y ~ weight, family = quasi(link = "log", variance = "mu^2"), 
##     data = Crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2186  -0.2832   0.0000   0.5603   2.4883  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.0122     0.3863  -2.621  0.00957 ** 
## weight        0.8184     0.1542   5.306 3.44e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasi family taken to be 1.362496)
## 
##     Null deviance: 80.907  on 172  degrees of freedom
## Residual deviance: 89.585  on 171  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 19
library(MASS)

# Compare with Negative binomial model
summary(glm.nb(y ~ weight, data = Crabs))
## 
## Call:
## glm.nb(formula = y ~ weight, data = Crabs, init.theta = 0.9310592338, 
##     link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8394  -1.4122  -0.3247   0.4744   2.1279  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.8647     0.4048  -2.136   0.0327 *  
## weight        0.7603     0.1578   4.817 1.45e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.9311) family taken to be 1)
## 
##     Null deviance: 216.43  on 172  degrees of freedom
## Residual deviance: 196.16  on 171  degrees of freedom
## AIC: 754.64
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.931 
##           Std. Err.:  0.168 
## 
##  2 x log-likelihood:  -748.644

Such a quasi-likelihood model would not be similar to a Negative-Binomial model as the mean-variance relationship assumption is similar (NB: \(Var(y_i) = \mu_i + \phi \mu_i^2\)).

1.2 Sandwich estimator of the variance of \(\hat \beta\)

Sandwich estimator for the Possion GLM model

library(gee) 
obs <- c(1:173) # labeling of observations needed for GEE method
summary(gee(y ~ weight, data = Crabs, id=obs, family=poisson, scale.fix=T))
## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
## running glm to get initial regression estimate
## (Intercept)      weight 
##  -0.4284053   0.5893041
## 
##  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
##  gee S-function, version 4.13 modified 98/01/27 (1998) 
## 
## Model:
##  Link:                      Logarithm 
##  Variance to Mean Relation: Poisson 
##  Correlation Structure:     Independent 
## 
## Call:
## gee(formula = y ~ weight, id = obs, data = Crabs, family = poisson, 
##     scale.fix = T)
## 
## Summary of Residuals:
##       Min        1Q    Median        3Q       Max 
## -6.956930 -2.245961 -1.048869  1.801392 11.473098 
## 
## 
## Coefficients:
##               Estimate Naive S.E.   Naive z Robust S.E.  Robust z
## (Intercept) -0.4284053 0.17893528 -2.394192   0.3082884 -1.389625
## weight       0.5893041 0.06501714  9.063827   0.1103203  5.341755
## 
## Estimated Scale Parameter:  1
## Number of Iterations:  1
## 
## Working Correlation
##      [,1]
## [1,]    1

Sandwich estimator for the Quasi-likelihood model with quadratic mean-variance relationship

library(gee) 
obs <- c(1:173) # labeling of observations needed for GEE method
summary(gee(y ~ weight, data = Crabs, id=obs, family=quasi(link = "log", variance = "mu^2"), scale.fix=F))
## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
## running glm to get initial regression estimate
## (Intercept)      weight 
##  -1.0121863   0.8183823
## 
##  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
##  gee S-function, version 4.13 modified 98/01/27 (1998) 
## 
## Model:
##  Link:                      Logarithm 
##  Variance to Mean Relation: Gamma 
##  Correlation Structure:     Independent 
## 
## Call:
## gee(formula = y ~ weight, id = obs, data = Crabs, family = quasi(link = "log", 
##     variance = "mu^2"), scale.fix = F)
## 
## Summary of Residuals:
##         Min          1Q      Median          3Q         Max 
## -18.6207207  -2.1942696  -0.9860146   1.7085753  11.6128674 
## 
## 
## Coefficients:
##               Estimate Naive S.E.   Naive z Robust S.E.  Robust z
## (Intercept) -1.0121863  0.3862555 -2.620510   0.3721267 -2.720004
## weight       0.8183823  0.1542441  5.305759   0.1332920  6.139772
## 
## Estimated Scale Parameter:  1.362496
## Number of Iterations:  1
## 
## Working Correlation
##      [,1]
## [1,]    1