Example: Dose-response study

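The data report the number of adult flour beetles killed after exposure to gaseous carbon disulfide at various dosages. The data are in grouped form: each row gives the log dose (logdose), the number of beetles exposed (n), and the number that died (dead).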

beetles2 <- read.table("beetles2.dat", header = TRUE)
beetles2

logdose <dbl>	n <int>	dead <int>
1.691	59	6
1.724	60	13
1.755	62	18
1.784	56	28
1.811	63	52
1.837	59	53
1.861	62	61
1.884	60	60

2.1 Group-level data vs. ungrouped data

Let’s use the probit link.

# Grouped binomial response: a two-column matrix of (successes, failures) = (dead, alive)
alive <- beetles2$n - beetles2$dead
data <- cbind(beetles2$dead, alive)
logdose <- beetles2$logdose
dead <- beetles2$dead
n <- beetles2$n
fit.probit <- glm(data ~ logdose, family = binomial(link = probit))
summary(fit.probit)

## 
## Call:
## glm(formula = data ~ logdose, family = binomial(link = probit))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5627  -0.4848   0.7647   1.0530   1.3149  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -34.956      2.649  -13.20   <2e-16 ***
## logdose       19.741      1.488   13.27   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 284.202  on 7  degrees of freedom
## Residual deviance:   9.987  on 6  degrees of freedom
## AIC: 40.185
## 
## Number of Fisher Scoring iterations: 4

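Residual deviance is 9.987 on 6 degrees of freedom (with p-value of about 0.125 from the likelihood ratio test, after comparing with the group-level saturated model), so the test does not reject the probit model at the 5% level.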
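The p-value of this goodness-of-fit test can be reproduced from the fitted model. The sketch below re-enters the eight rows of the table by hand so that it runs without beetles2.dat; the object names beetles, fit, dev, df and p.value are introduced here for illustration only.

```r
# Grouped beetle data, typed in from the table above (illustrative object names)
beetles <- data.frame(
  logdose = c(1.691, 1.724, 1.755, 1.784, 1.811, 1.837, 1.861, 1.884),
  n       = c(59, 60, 62, 56, 63, 59, 62, 60),
  dead    = c(6, 13, 18, 28, 52, 53, 61, 60)
)

# Same probit model as above, with the response written as cbind(dead, alive)
fit <- glm(cbind(dead, n - dead) ~ logdose,
           family = binomial(link = "probit"), data = beetles)

dev <- deviance(fit)      # residual deviance: fitted model vs. grouped saturated model
df  <- df.residual(fit)   # 8 groups - 2 parameters = 6

# Upper-tail chi-squared probability of the residual deviance
p.value <- pchisq(dev, df, lower.tail = FALSE)
c(deviance = dev, df = df, p.value = p.value)
```

The chi-squared reference distribution is an approximation that relies on the group sizes being reasonably large, which is plausible here with about 60 beetles per dose.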

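Now let’s check the ungrouped data, where each row is one beetle: x is the log dose and y = 1 if the beetle died. The printout shows the first 10 rows.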

Beetles <- read.table("Beetles.dat", header = TRUE)
Beetles

x <dbl>	y <int>
1.691	1
1.691	1
1.691	1
1.691	1
1.691	1
1.691	1
1.691	0
1.691	0
1.691	0
1.691	0

fit.probit2 <- glm(y ~ x, family = binomial(link = probit), data = Beetles)
summary(fit.probit2)

## 
## Call:
## glm(formula = y ~ x, family = binomial(link = probit), data = Beetles)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.5638  -0.6263   0.1597   0.4478   2.3883  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -34.956      2.649  -13.20   <2e-16 ***
## x             19.741      1.488   13.27   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 645.44  on 480  degrees of freedom
## Residual deviance: 371.23  on 479  degrees of freedom
## AIC: 375.23
## 
## Number of Fisher Scoring iterations: 6

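Residual deviance is 371.23 on 479 degrees of freedom. The estimated coefficients and standard errors are identical to those of the grouped fit, but the likelihood ratio test based on the residual deviance is invalid here: for ungrouped binary data the saturated model has one parameter per observation, so the chi-squared approximation for the residual deviance does not hold.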
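The relation between the two fits can be checked without the data files. This sketch rebuilds the grouped table by hand and expands it to one row per beetle; the names beetles, ungrouped, fit.g and fit.u are illustrative and do not appear in the original code.

```r
# Grouped beetle data, typed in from the table above (illustrative object names)
beetles <- data.frame(
  logdose = c(1.691, 1.724, 1.755, 1.784, 1.811, 1.837, 1.861, 1.884),
  n       = c(59, 60, 62, 56, 63, 59, 62, 60),
  dead    = c(6, 13, 18, 28, 52, 53, 61, 60)
)

# Expand to one row per beetle: y = 1 for the beetles that died, 0 otherwise
ungrouped <- data.frame(
  x = rep(beetles$logdose, times = beetles$n),
  y = unlist(Map(function(d, n) rep(c(1, 0), times = c(d, n - d)),
                 beetles$dead, beetles$n))
)

fit.g <- glm(cbind(dead, n - dead) ~ logdose,
             family = binomial(link = "probit"), data = beetles)
fit.u <- glm(y ~ x, family = binomial(link = "probit"), data = ungrouped)

# Same maximum likelihood estimates ...
rbind(grouped = coef(fit.g), ungrouped = coef(fit.u))
# ... but different residual deviances, because the saturated models differ
c(grouped = deviance(fit.g), ungrouped = deviance(fit.u))
# The drop from null to residual deviance is the same in both forms
c(grouped   = fit.g$null.deviance - deviance(fit.g),
  ungrouped = fit.u$null.deviance - deviance(fit.u))
```

Because the two likelihoods differ only by a constant, comparisons between nested models give the same answer in either form; only the comparison with the saturated model depends on the grouping.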

The log-log link

This is the fit using the probit link. The data do not support the symmetry of the response curve about a death probability of 0.5 that the probit link imposes.

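# Observed death proportions with the fitted probit curve (dashed)
plot(logdose, dead/n, pch = 20, ylim = c(0, 1))
curve(predict(fit.probit, newdata = list(logdose = x), type = "response"), add = TRUE, lty = 2)
# Reference line at probability 0.5
abline(h = 0.5, col = "red", lty = 2)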

The fit using the logit link

fit.logit <- glm(data ~ logdose, family = binomial(link = logit))
summary(fit.logit)

## 
## Call:
## glm(formula = data ~ logdose, family = binomial(link = logit))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5878  -0.4085   0.8442   1.2455   1.5860  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -60.740      5.182  -11.72   <2e-16 ***
## logdose       34.286      2.913   11.77   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 284.202  on 7  degrees of freedom
## Residual deviance:  11.116  on 6  degrees of freedom
## AIC: 41.314
## 
## Number of Fisher Scoring iterations: 4

# Probit fit (dashed) against logit fit (solid)
plot(logdose, dead/n, pch = 20, ylim = c(0, 1))
curve(predict(fit.probit, newdata = list(logdose = x), type = "response"), add = TRUE, lty = 2)
curve(predict(fit.logit, newdata = list(logdose = x), type = "response"), add = TRUE, lty = 1)

The fitted curve is very similar to the probit curve, and the residual deviance is slightly larger (11.116 vs. 9.987).

The fit using the complementary log-log link

fit.cloglog <- glm(data ~ logdose, family = binomial(link = cloglog))
summary(fit.cloglog)

## 
## Call:
## glm(formula = data ~ logdose, family = binomial(link = cloglog))
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.80002  -0.56588   0.01475   0.38096   1.31591  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -39.522      3.236  -12.21   <2e-16 ***
## logdose       22.015      1.797   12.25   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 284.2024  on 7  degrees of freedom
## Residual deviance:   3.5143  on 6  degrees of freedom
## AIC: 33.712
## 
## Number of Fisher Scoring iterations: 4

# Probit fit (dashed) against complementary log-log fit (solid blue)
plot(logdose, dead/n, pch = 20, ylim = c(0, 1))
curve(predict(fit.probit, newdata = list(logdose = x), type = "response"), add = TRUE, lty = 2)
curve(predict(fit.cloglog, newdata = list(logdose = x), type = "response"), add = TRUE, lty = 1, col = "blue")

The fitted curve follows the observed proportions more closely, and the residual deviance is much smaller (3.514 vs. 9.987 for the probit link).

The fit using the log-log link. In R, we cannot directly use a log-log link. We can fit a log-log link binomial GLM using the cloglog link with the roles of success and failure swapped, that is, by modeling survival instead of death (see Agresti Chapter 5.6.3).

# Swap the columns: survival is now the "success", so the response is (alive, dead)
data2 <- cbind(alive, dead)
fit.loglog <- glm(data2 ~ logdose, family = binomial(link = cloglog))
summary(fit.loglog)

## 
## Call:
## glm(formula = data2 ~ logdose, family = binomial(link = cloglog))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4425  -2.0554  -0.7002   0.4494   2.5362  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   37.661      2.949   12.77   <2e-16 ***
## logdose      -21.583      1.680  -12.85   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 284.202  on 7  degrees of freedom
## Residual deviance:  27.573  on 6  degrees of freedom
## AIC: 57.771
## 
## Number of Fisher Scoring iterations: 6
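Why the swapped cloglog fit is a log-log fit for death can be verified numerically. The sketch below assumes the log-log model is written as log(-log(P(dead))) = a + b * logdose; it re-enters the table by hand, and the names beetles, fit.ll, eta and p.dead are illustrative.

```r
# Grouped beetle data, typed in from the table above (illustrative object names)
beetles <- data.frame(
  logdose = c(1.691, 1.724, 1.755, 1.784, 1.811, 1.837, 1.861, 1.884),
  n       = c(59, 60, 62, 56, 63, 59, 62, 60),
  dead    = c(6, 13, 18, 28, 52, 53, 61, 60)
)

# cloglog model for survival: the response is (alive, dead)
fit.ll <- glm(cbind(n - dead, dead) ~ logdose,
              family = binomial(link = "cloglog"), data = beetles)

eta    <- predict(fit.ll, type = "link")  # linear predictor a + b * logdose
p.dead <- 1 - fitted(fit.ll)              # fitted death probability

# cloglog for survival means log(-log(1 - P(alive))) = eta.
# Since 1 - P(alive) = P(dead), this is log(-log(P(dead))) = eta,
# which is the log-log link applied to the death probability.
all.equal(unname(log(-log(p.dead))), unname(eta))
```

This is also why the plotted curve for this model is one minus the predicted response: the model's own fitted values are survival probabilities.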

# Probit fit (dashed) against log-log fit (solid red); the model predicts survival, so plot 1 - prediction
plot(logdose, dead/n, pch = 20, ylim = c(0, 1))
curve(predict(fit.probit, newdata = list(logdose = x), type = "response"), add = TRUE, lty = 2)
curve(1 - predict(fit.loglog, newdata = list(logdose = x), type = "response"), add = TRUE, lty = 1, col = "red")

The fitted curve is much worse, and the residual deviance is much larger (27.573 vs. 9.987 for the probit link).
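The four links can be lined up in one table. This is a small sketch that refits all four models on the hand-entered table; the names beetles, fits and comparison are illustrative and not part of the original code.

```r
# Grouped beetle data, typed in from the table above (illustrative object names)
beetles <- data.frame(
  logdose = c(1.691, 1.724, 1.755, 1.784, 1.811, 1.837, 1.861, 1.884),
  n       = c(59, 60, 62, 56, 63, 59, 62, 60),
  dead    = c(6, 13, 18, 28, 52, 53, 61, 60)
)

# The log-log fit is the cloglog fit with the response columns swapped
fits <- list(
  probit  = glm(cbind(dead, n - dead) ~ logdose,
                family = binomial(link = "probit"), data = beetles),
  logit   = glm(cbind(dead, n - dead) ~ logdose,
                family = binomial(link = "logit"), data = beetles),
  cloglog = glm(cbind(dead, n - dead) ~ logdose,
                family = binomial(link = "cloglog"), data = beetles),
  loglog  = glm(cbind(n - dead, dead) ~ logdose,
                family = binomial(link = "cloglog"), data = beetles)
)

# Residual deviance, residual degrees of freedom and AIC for each link
comparison <- data.frame(
  deviance = sapply(fits, deviance),
  df       = sapply(fits, df.residual),
  AIC      = sapply(fits, AIC),
  row.names = names(fits)
)
comparison[order(comparison$deviance), ]
```

All four models have two parameters and the same residual degrees of freedom, so ranking by residual deviance and ranking by AIC give the same order.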
