The data report the deaths of adult flour beetles after exposure to gaseous carbon disulfide at various dosages. The data are in grouped (group-level) form: one row per dose, with the number of beetles exposed and the number that died.
beetles2 <- read.table("beetles2.dat", header = TRUE)
beetles2
Let’s use the probit link.
# Grouped binomial response: column 1 = deaths (successes), column 2 = survivors (failures)
alive <- beetles2$n - beetles2$dead
data <- cbind(beetles2$dead, alive)
logdose <- beetles2$logdose
dead <- beetles2$dead
n <- beetles2$n
fit.probit <- glm(data ~ logdose, family = binomial(link = probit))
summary(fit.probit)
##
## Call:
## glm(formula = data ~ logdose, family = binomial(link = probit))
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.5627  -0.4848   0.7647   1.0530   1.3149
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -34.956      2.649  -13.20   <2e-16 ***
## logdose       19.741      1.488   13.27   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 284.202 on 7 degrees of freedom
## Residual deviance: 9.987 on 6 degrees of freedom
## AIC: 40.185
##
## Number of Fisher Scoring iterations: 4
The residual deviance is \(9.99\) on 6 degrees of freedom. Comparing the fitted model with the group-level saturated model by a likelihood ratio test gives a p-value of \(0.125\), so there is no strong evidence of lack of fit.
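The quoted p-value can be reproduced from the chi-squared distribution. This is a minimal sketch that uses the deviance and degrees of freedom copied from the summary output; the names `dev`, `df`, and `p_value` are illustrative and do not appear in the original code.

```r
# Residual deviance and degrees of freedom, copied from summary(fit.probit)
dev <- 9.987
df <- 6

# Upper-tail chi-squared probability: the likelihood ratio goodness-of-fit p-value
p_value <- pchisq(dev, df = df, lower.tail = FALSE)
round(p_value, 3)
```

With the fitted object in the workspace, the same two inputs can be read off with `deviance(fit.probit)` and `df.residual(fit.probit)`.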
Now let’s fit the same model to the ungrouped data, with one row per beetle.
Beetles <- read.table("Beetles.dat", header = TRUE)
Beetles
fit.probit2 <- glm(y ~ x, family = binomial(link = probit), data = Beetles)
summary(fit.probit2)
##
## Call:
## glm(formula = y ~ x, family = binomial(link = probit), data = Beetles)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.5638  -0.6263   0.1597   0.4478   2.3883
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -34.956      2.649  -13.20   <2e-16 ***
## x             19.741      1.488   13.27   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 645.44 on 480 degrees of freedom
## Residual deviance: 371.23 on 479 degrees of freedom
## AIC: 375.23
##
## Number of Fisher Scoring iterations: 6
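The residual deviance is \(371.23\) on 479 degrees of freedom. The parameter estimates and standard errors are identical to those from the grouped fit, but the likelihood ratio goodness-of-fit test based on the residual deviance is invalid here: with ungrouped binary data the saturated model has one parameter per observation, so the residual deviance does not have an approximate chi-squared distribution.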
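The relationship between the two data layouts can be checked on a small example. The sketch below uses hypothetical toy counts (not the beetle data), and the names `toy`, `ungrouped`, `fit_grouped`, and `fit_ungrouped` are illustrative. It fits the same probit model to grouped counts and to the equivalent one-row-per-unit data.

```r
# Hypothetical toy data (not the beetle data): 3 dose levels, 20 units each
toy <- data.frame(logdose = c(1.7, 1.8, 1.9), n = 20, dead = c(2, 11, 18))

# Grouped fit: two-column response (deaths, survivors)
fit_grouped <- glm(cbind(dead, n - dead) ~ logdose,
                   family = binomial(link = probit), data = toy)

# Expand to one row per unit: y = 1 for a death, 0 for a survivor
ungrouped <- data.frame(
  x = rep(toy$logdose, times = toy$n),
  y = unlist(Map(function(d, m) rep(c(1, 0), times = c(d, m - d)),
                 toy$dead, toy$n))
)
fit_ungrouped <- glm(y ~ x, family = binomial(link = probit), data = ungrouped)

# Same estimates, but different residual deviances and degrees of freedom
rbind(grouped = coef(fit_grouped), ungrouped = coef(fit_ungrouped))
c(grouped = deviance(fit_grouped), ungrouped = deviance(fit_ungrouped))
c(grouped = df.residual(fit_grouped), ungrouped = df.residual(fit_ungrouped))
```

The two layouts carry the same likelihood for the regression parameters. Only the saturated model that the deviance is measured against changes.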
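The plot below shows the fit using the probit link. The probit response curve is symmetric about \(0.5\), and the data do not support that symmetry.
# Observed death proportions with the fitted probit curve (dashed); the red line marks probability 0.5
plot(logdose, dead/n, pch = 20, ylim = c(0, 1))
curve(predict(fit.probit, newdata = data.frame(logdose = x), type = "response"), add = TRUE, lty = 2)
abline(h = 0.5, col = "red", lty = 2)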
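To make the symmetry concrete: under the probit link the fitted probability equals 0.5 where the linear predictor is zero, and fitted probabilities at equal distances on either side of that point sum to 1. A small sketch using the coefficients reported in the summary output; the names `a`, `b`, `x50`, and `d` are illustrative.

```r
# Probit coefficients copied from summary(fit.probit)
a <- -34.956
b <- 19.741

# Log dose at which the fitted death probability is 0.5 (linear predictor = 0)
x50 <- -a / b
pnorm(a + b * x50)

# Symmetry about x50: probabilities d below and d above always sum to 1
d <- 0.05
pnorm(a + b * (x50 - d)) + pnorm(a + b * (x50 + d))
```

Any link whose inverse is a symmetric distribution function (probit, logit) has this property, whatever the fitted coefficients are.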
Next, the fit using the logit link:
fit.logit <- glm(data ~ logdose, family = binomial(link = logit))
summary(fit.logit)
##
## Call:
## glm(formula = data ~ logdose, family = binomial(link = logit))
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.5878  -0.4085   0.8442   1.2455   1.5860
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -60.740      5.182  -11.72   <2e-16 ***
## logdose       34.286      2.913   11.77   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 284.202 on 7 degrees of freedom
## Residual deviance: 11.116 on 6 degrees of freedom
## AIC: 41.314
##
## Number of Fisher Scoring iterations: 4
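# Probit fit (dashed) against logit fit (solid)
plot(logdose, dead/n, pch = 20, ylim = c(0, 1))
curve(predict(fit.probit, newdata = data.frame(logdose = x), type = "response"), add = TRUE, lty = 2)
curve(predict(fit.logit, newdata = data.frame(logdose = x), type = "response"), add = TRUE, lty = 1)
The fitted logit curve is very similar to the probit curve, and its residual deviance is slightly larger (\(11.12\) versus \(9.99\), both on 6 degrees of freedom).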
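The logit coefficients are larger than the probit ones even though the curves nearly coincide, because the two links measure the linear predictor on different scales: the standard logistic distribution has standard deviation \(\pi/\sqrt{3}\), the standard normal has standard deviation 1. A quick check with the slopes copied from the two summaries; the variable names are illustrative.

```r
# Slope estimates copied from summary(fit.probit) and summary(fit.logit)
probit_slope <- 19.741
logit_slope <- 34.286

# Observed ratio of the two slopes
slope_ratio <- logit_slope / probit_slope
slope_ratio

# Ratio of the standard deviations of the two underlying distributions
pi / sqrt(3)
```

The observed ratio is only roughly comparable to the scale factor, since the two distributions differ in shape as well as in spread.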
Next, the fit using the complementary log-log link:
fit.cloglog <- glm(data ~ logdose, family = binomial(link = cloglog))
summary(fit.cloglog)
##
## Call:
## glm(formula = data ~ logdose, family = binomial(link = cloglog))
##
## Deviance Residuals:
##      Min        1Q    Median        3Q       Max
## -0.80002  -0.56588   0.01475   0.38096   1.31591
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -39.522      3.236  -12.21   <2e-16 ***
## logdose       22.015      1.797   12.25   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 284.2024 on 7 degrees of freedom
## Residual deviance: 3.5143 on 6 degrees of freedom
## AIC: 33.712
##
## Number of Fisher Scoring iterations: 4
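# Probit fit (dashed) against complementary log-log fit (solid blue)
plot(logdose, dead/n, pch = 20, ylim = c(0, 1))
curve(predict(fit.probit, newdata = data.frame(logdose = x), type = "response"), add = TRUE, lty = 2)
curve(predict(fit.cloglog, newdata = data.frame(logdose = x), type = "response"), add = TRUE, lty = 1, col = "blue")
The complementary log-log curve follows the observed proportions more closely than the probit curve, and its residual deviance is much smaller (\(3.51\) versus \(9.99\), both on 6 degrees of freedom).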
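Unlike the probit curve, the complementary log-log response curve is not symmetric about 0.5: it approaches 1 more sharply than it departs from 0, which may be why it suits these data better. A sketch using the coefficients copied from the summary output; `inv_cloglog`, `a`, `b`, `x50`, and `d` are illustrative names.

```r
# cloglog coefficients copied from summary(fit.cloglog)
a <- -39.522
b <- 22.015

# Inverse of the complementary log-log link
inv_cloglog <- function(eta) 1 - exp(-exp(eta))

# Log dose where the fitted death probability is 0.5
x50 <- (log(log(2)) - a) / b
inv_cloglog(a + b * x50)

# Asymmetry: probabilities d below and d above x50 do not sum to 1
d <- 0.05
inv_cloglog(a + b * (x50 - d)) + inv_cloglog(a + b * (x50 + d))
```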
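Finally, the fit using the log-log link. R's binomial family does not offer a log-log link directly. However, a log-log model for the probability of death is equivalent to a complementary log-log model for the probability of survival, so we can fit it with the cloglog link after swapping the two response columns (see Agresti, Section 5.6.3).
# Swap the response columns so that survival is the modeled "success"
data2 <- cbind(alive, dead)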
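The equivalence can be written out in one line. This sketch assumes the convention that the log-log model is \(\log[-\log \pi(x)] = \alpha + \beta x\), where \(\pi(x)\) is the probability of death. The symbols \(\alpha\), \(\beta\), and \(\pi\) are notation introduced here, not names from the code. Applying the complementary log-log link to the survival probability \(1 - \pi(x)\) gives

\[
\log\{-\log[1 - (1 - \pi(x))]\} = \log[-\log \pi(x)] = \alpha + \beta x ,
\]

so that

\[
\pi(x) = \exp\{-\exp(\alpha + \beta x)\}, \qquad 1 - \pi(x) = 1 - \exp\{-\exp(\alpha + \beta x)\} .
\]

A cloglog fit to the swapped response therefore predicts the survival probability, and the death probability is recovered as one minus that prediction.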
fit.loglog <- glm(data2 ~ logdose, family = binomial(link = cloglog))
summary(fit.loglog)
##
## Call:
## glm(formula = data2 ~ logdose, family = binomial(link = cloglog))
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.4425  -2.0554  -0.7002   0.4494   2.5362
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   37.661      2.949   12.77   <2e-16 ***
## logdose      -21.583      1.680  -12.85   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 284.202 on 7 degrees of freedom
## Residual deviance: 27.573 on 6 degrees of freedom
## AIC: 57.771
##
## Number of Fisher Scoring iterations: 6
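# Probit fit (dashed) against log-log fit (solid red); the model predicts survival, so subtract from 1
plot(logdose, dead/n, pch = 20, ylim = c(0, 1))
curve(predict(fit.probit, newdata = data.frame(logdose = x), type = "response"), add = TRUE, lty = 2)
curve(1 - predict(fit.loglog, newdata = data.frame(logdose = x), type = "response"), add = TRUE, lty = 1, col = "red")
The log-log curve fits much worse than the probit curve, and its residual deviance is much larger (\(27.57\) versus \(9.99\), both on 6 degrees of freedom).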
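The four grouped-data fits can be set side by side. The sketch below copies the residual deviances and AICs from the summaries above and adds the goodness-of-fit p-value for each; the data frame `fits` and its column names are illustrative.

```r
# Residual deviances and AICs copied from the four grouped-data summaries
fits <- data.frame(
  link = c("probit", "logit", "cloglog", "loglog"),
  deviance = c(9.987, 11.116, 3.5143, 27.573),
  aic = c(40.185, 41.314, 33.712, 57.771)
)

# Each grouped model has 6 residual degrees of freedom
fits$p_value <- pchisq(fits$deviance, df = 6, lower.tail = FALSE)

# Order from best to worst fit by residual deviance
fits[order(fits$deviance), ]
```

On these numbers the complementary log-log link has the smallest deviance and AIC, and the log-log link is the only one whose goodness-of-fit p-value falls below 0.05.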