1. Horseshoe crab data revisited

Crabs <- read.table("Crabs.dat", header = T)
Crabs

ABCDEFGHIJ0123456789

crab <int>	y <int>	weight <dbl>	width <dbl>	color <int>	spine <int>
1	8	3.050	28.3	2	3
2	0	1.550	22.5	3	3
3	9	2.300	26.0	1	1
4	0	2.100	24.8	3	3
5	4	2.600	26.0	3	3
6	0	2.100	23.8	2	3
7	0	2.350	26.5	1	1
8	0	1.900	24.7	3	2
9	0	1.950	23.7	2	1
10	0	2.150	25.6	3	3

First, we fit with a Poisson model. The deviance residuals seems to be higher than 1, indicating over-dispersion. Not very clear whether the mean-variance relationship is linear or quardratic.

fit.pois <- glm(y ~ weight, family = poisson, data = Crabs)
summary(fit.pois)

## 
## Call:
## glm(formula = y ~ weight, family = poisson, data = Crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9307  -1.9981  -0.5627   0.9298   4.9992  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.42841    0.17893  -2.394   0.0167 *  
## weight       0.58930    0.06502   9.064   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 560.87  on 171  degrees of freedom
## AIC: 920.16
## 
## Number of Fisher Scoring iterations: 5

plot(predict(fit.pois), residuals(fit.pois)^2, xlab = "fitted", ylab = "residual^2")
abline(a = 0, b= 1, col = "red")

1.1 Quasi-likelihood models

First, we fit with a Poisson model. The deviance residuals seems to be higher than 1, indicating over-dispersion. Under the proportional variance model:

we can estimate the level of overdispersion as

## empirical estimate of dispersion parameter for the proportional model Var(y_i) = \phi v*(\mu_i) 
phi <- sum(residuals(fit.pois, type = "pearson")^2)/(nrow(Crabs) - 2)
phi

## [1] 3.133893

We can compare this estimate with the quasi-likelihood esitmate from R. The estimates of stay the same. The inflation of the standard error of should be roughly .

summary(glm(y ~ weight, family = quasi(link = "log", variance = "mu"), data = Crabs))

## 
## Call:
## glm(formula = y ~ weight, family = quasi(link = "log", variance = "mu"), 
##     data = Crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9307  -1.9981  -0.5627   0.9298   4.9992  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.4284     0.3168  -1.352    0.178    
## weight        0.5893     0.1151   5.120 8.17e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasi family taken to be 3.134159)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 560.87  on 171  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5

We may also assume the mean variance relationship is quardratic as

The estimates of

will be different from the Poisson GLM estimates as estimating equations of

change (though we can still seperate the estimate of

from

because of the proportional mean-variance relationship assumption).

summary(glm(y ~ weight, family = quasi(link = "log", variance = "mu^2"), data = Crabs))

## 
## Call:
## glm(formula = y ~ weight, family = quasi(link = "log", variance = "mu^2"), 
##     data = Crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2186  -0.2832   0.0000   0.5603   2.4883  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.0122     0.3863  -2.621  0.00957 ** 
## weight        0.8184     0.1542   5.306 3.44e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasi family taken to be 1.362496)
## 
##     Null deviance: 80.907  on 172  degrees of freedom
## Residual deviance: 89.585  on 171  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 19

library(MASS)

# Compare with Negative binomial model
summary(glm.nb(y ~ weight, data = Crabs))

## 
## Call:
## glm.nb(formula = y ~ weight, data = Crabs, init.theta = 0.9310592338, 
##     link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8394  -1.4122  -0.3247   0.4744   2.1279  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.8647     0.4048  -2.136   0.0327 *  
## weight        0.7603     0.1578   4.817 1.45e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.9311) family taken to be 1)
## 
##     Null deviance: 216.43  on 172  degrees of freedom
## Residual deviance: 196.16  on 171  degrees of freedom
## AIC: 754.64
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.931 
##           Std. Err.:  0.168 
## 
##  2 x log-likelihood:  -748.644

Such a quasi-likelihood model would not be similar to a Negative-Binomial model as the mean-variance relationship assumption is similar (NB: ).

1.2 Sandwich estimator of the variance of

Sandwich estimator for the Possion GLM model

library(gee) 
obs <- c(1:173) # labeling of observations needed for GEE method
summary(gee(y ~ weight, data = Crabs, id=obs, family=poisson, scale.fix=T))

## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27

## running glm to get initial regression estimate

## (Intercept)      weight 
##  -0.4284053   0.5893041

## 
##  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
##  gee S-function, version 4.13 modified 98/01/27 (1998) 
## 
## Model:
##  Link:                      Logarithm 
##  Variance to Mean Relation: Poisson 
##  Correlation Structure:     Independent 
## 
## Call:
## gee(formula = y ~ weight, id = obs, data = Crabs, family = poisson, 
##     scale.fix = T)
## 
## Summary of Residuals:
##       Min        1Q    Median        3Q       Max 
## -6.956930 -2.245961 -1.048869  1.801392 11.473098 
## 
## 
## Coefficients:
##               Estimate Naive S.E.   Naive z Robust S.E.  Robust z
## (Intercept) -0.4284053 0.17893528 -2.394192   0.3082884 -1.389625
## weight       0.5893041 0.06501714  9.063827   0.1103203  5.341755
## 
## Estimated Scale Parameter:  1
## Number of Iterations:  1
## 
## Working Correlation
##      [,1]
## [1,]    1

Sandwich estimator for the Quasi-likelihood model with quadratic mean-variance relationship

library(gee) 
obs <- c(1:173) # labeling of observations needed for GEE method
summary(gee(y ~ weight, data = Crabs, id=obs, family=quasi(link = "log", variance = "mu^2"), scale.fix=F))

## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27

## running glm to get initial regression estimate

## (Intercept)      weight 
##  -1.0121863   0.8183823

## 
##  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
##  gee S-function, version 4.13 modified 98/01/27 (1998) 
## 
## Model:
##  Link:                      Logarithm 
##  Variance to Mean Relation: Gamma 
##  Correlation Structure:     Independent 
## 
## Call:
## gee(formula = y ~ weight, id = obs, data = Crabs, family = quasi(link = "log", 
##     variance = "mu^2"), scale.fix = F)
## 
## Summary of Residuals:
##         Min          1Q      Median          3Q         Max 
## -18.6207207  -2.1942696  -0.9860146   1.7085753  11.6128674 
## 
## 
## Coefficients:
##               Estimate Naive S.E.   Naive z Robust S.E.  Robust z
## (Intercept) -1.0121863  0.3862555 -2.620510   0.3721267 -2.720004
## weight       0.8183823  0.1542441  5.305759   0.1332920  6.139772
## 
## Estimated Scale Parameter:  1.362496
## Number of Iterations:  1
## 
## Working Correlation
##      [,1]
## [1,]    1

Data Example 7: Examples for quasi-likelihood models and Sandwich estimator

1. Horseshoe crab data revisited

1.1 Quasi-likelihood models

1.2 Sandwich estimator of the variance of ˆβ

1.2 Sandwich estimator of the variance of