Loading [MathJax]/jax/output/HTML-CSS/jax.js

1. Horseshoe crab data revisited

Crabs <- read.table("Crabs.dat", header = T)
Crabs
ABCDEFGHIJ0123456789
crab
<int>
y
<int>
weight
<dbl>
width
<dbl>
color
<int>
spine
<int>
183.05028.323
201.55022.533
392.30026.011
402.10024.833
542.60026.033
602.10023.823
702.35026.511
801.90024.732
901.95023.721
1002.15025.633

First, we fit with a Poisson model. The deviance residuals seems to be higher than 1, indicating over-dispersion. Not very clear whether the mean-variance relationship is linear or quardratic.

fit.pois <- glm(y ~ weight, family = poisson, data = Crabs)
summary(fit.pois)
## 
## Call:
## glm(formula = y ~ weight, family = poisson, data = Crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9307  -1.9981  -0.5627   0.9298   4.9992  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.42841    0.17893  -2.394   0.0167 *  
## weight       0.58930    0.06502   9.064   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 560.87  on 171  degrees of freedom
## AIC: 920.16
## 
## Number of Fisher Scoring iterations: 5
plot(predict(fit.pois), residuals(fit.pois)^2, xlab = "fitted", ylab = "residual^2")
abline(a = 0, b= 1, col = "red")

1.1 Quasi-likelihood models

First, we fit with a Poisson model. The deviance residuals seems to be higher than 1, indicating over-dispersion. Under the proportional variance model: Var(yi)=ϕv(μi)=ϕμi we can estimate the level of overdispersion as ˆϕ=i(yiˆμi)2v(ˆμi)/(np)

## empirical estimate of dispersion parameter for the proportional model Var(y_i) = \phi v*(\mu_i) 
phi <- sum(residuals(fit.pois, type = "pearson")^2)/(nrow(Crabs) - 2)
phi
## [1] 3.133893

We can compare this estimate with the quasi-likelihood esitmate from R. The estimates of β stay the same. The inflation of the standard error of ˆβ should be roughly ˆϕ1.77.

summary(glm(y ~ weight, family = quasi(link = "log", variance = "mu"), data = Crabs))
## 
## Call:
## glm(formula = y ~ weight, family = quasi(link = "log", variance = "mu"), 
##     data = Crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9307  -1.9981  -0.5627   0.9298   4.9992  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.4284     0.3168  -1.352    0.178    
## weight        0.5893     0.1151   5.120 8.17e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasi family taken to be 3.134159)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 560.87  on 171  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5

We may also assume the mean variance relationship is quardratic as Var(yi)= ϕμ2i. The estimates of β will be different from the Poisson GLM estimates as estimating equations of β change (though we can still seperate the estimate of β from ϕ because of the proportional mean-variance relationship assumption).

summary(glm(y ~ weight, family = quasi(link = "log", variance = "mu^2"), data = Crabs))
## 
## Call:
## glm(formula = y ~ weight, family = quasi(link = "log", variance = "mu^2"), 
##     data = Crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2186  -0.2832   0.0000   0.5603   2.4883  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.0122     0.3863  -2.621  0.00957 ** 
## weight        0.8184     0.1542   5.306 3.44e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasi family taken to be 1.362496)
## 
##     Null deviance: 80.907  on 172  degrees of freedom
## Residual deviance: 89.585  on 171  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 19
library(MASS)

# Compare with Negative binomial model
summary(glm.nb(y ~ weight, data = Crabs))
## 
## Call:
## glm.nb(formula = y ~ weight, data = Crabs, init.theta = 0.9310592338, 
##     link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8394  -1.4122  -0.3247   0.4744   2.1279  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.8647     0.4048  -2.136   0.0327 *  
## weight        0.7603     0.1578   4.817 1.45e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.9311) family taken to be 1)
## 
##     Null deviance: 216.43  on 172  degrees of freedom
## Residual deviance: 196.16  on 171  degrees of freedom
## AIC: 754.64
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.931 
##           Std. Err.:  0.168 
## 
##  2 x log-likelihood:  -748.644

Such a quasi-likelihood model would not be similar to a Negative-Binomial model as the mean-variance relationship assumption is similar (NB: Var(yi)=μi+ϕμ2i).

1.2 Sandwich estimator of the variance of ˆβ

Sandwich estimator for the Possion GLM model

library(gee) 
obs <- c(1:173) # labeling of observations needed for GEE method
summary(gee(y ~ weight, data = Crabs, id=obs, family=poisson, scale.fix=T))
## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
## running glm to get initial regression estimate
## (Intercept)      weight 
##  -0.4284053   0.5893041
## 
##  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
##  gee S-function, version 4.13 modified 98/01/27 (1998) 
## 
## Model:
##  Link:                      Logarithm 
##  Variance to Mean Relation: Poisson 
##  Correlation Structure:     Independent 
## 
## Call:
## gee(formula = y ~ weight, id = obs, data = Crabs, family = poisson, 
##     scale.fix = T)
## 
## Summary of Residuals:
##       Min        1Q    Median        3Q       Max 
## -6.956930 -2.245961 -1.048869  1.801392 11.473098 
## 
## 
## Coefficients:
##               Estimate Naive S.E.   Naive z Robust S.E.  Robust z
## (Intercept) -0.4284053 0.17893528 -2.394192   0.3082884 -1.389625
## weight       0.5893041 0.06501714  9.063827   0.1103203  5.341755
## 
## Estimated Scale Parameter:  1
## Number of Iterations:  1
## 
## Working Correlation
##      [,1]
## [1,]    1

Sandwich estimator for the Quasi-likelihood model with quadratic mean-variance relationship

library(gee) 
obs <- c(1:173) # labeling of observations needed for GEE method
summary(gee(y ~ weight, data = Crabs, id=obs, family=quasi(link = "log", variance = "mu^2"), scale.fix=F))
## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
## running glm to get initial regression estimate
## (Intercept)      weight 
##  -1.0121863   0.8183823
## 
##  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
##  gee S-function, version 4.13 modified 98/01/27 (1998) 
## 
## Model:
##  Link:                      Logarithm 
##  Variance to Mean Relation: Gamma 
##  Correlation Structure:     Independent 
## 
## Call:
## gee(formula = y ~ weight, id = obs, data = Crabs, family = quasi(link = "log", 
##     variance = "mu^2"), scale.fix = F)
## 
## Summary of Residuals:
##         Min          1Q      Median          3Q         Max 
## -18.6207207  -2.1942696  -0.9860146   1.7085753  11.6128674 
## 
## 
## Coefficients:
##               Estimate Naive S.E.   Naive z Robust S.E.  Robust z
## (Intercept) -1.0121863  0.3862555 -2.620510   0.3721267 -2.720004
## weight       0.8183823  0.1542441  5.305759   0.1332920  6.139772
## 
## Estimated Scale Parameter:  1.362496
## Number of Iterations:  1
## 
## Working Correlation
##      [,1]
## [1,]    1