This example is an experiemnt studying the effect of gamma radiation on the number of chromosomal abnormalities (ca). The number of cells differ at each run of experiment. So it makes sense to normalize ca by the number of cells. The example comes from Faraway book chapter 5.3
data(dicentric, package="faraway")
dicentric
Visualization of data
dicentric$rate <- dicentric$ca / dicentric$cells
plot(rate ~ jitter(doseamt), dicentric)
plot(rate ~ doserate, dicentric, col = dicentric$doseamt)
Rate itself does not seem to linearly depend on predictors. Effect of predictors seems to be multiplicative
library(GGally)
## Loading required package: ggplot2
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
dicentric$rate <- dicentric$ca / dicentric$cells
plot(log(rate) ~ jitter(doseamt), dicentric)
plot(log(rate) ~ log(doserate), dicentric, col = dicentric$doseamt)
fit.linear <- lm(log(rate) ~ factor(doseamt) * log(doserate), dicentric)
summary(fit.linear)
##
## Call:
## lm(formula = log(rate) ~ factor(doseamt) * log(doserate), data = dicentric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.16950 -0.05413 0.00585 0.06717 0.16343
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.76243 0.03352 -82.402 < 2e-16 ***
## factor(doseamt)2.5 1.64378 0.04741 34.672 < 2e-16 ***
## factor(doseamt)5 2.77866 0.04741 58.610 < 2e-16 ***
## log(doserate) 0.07561 0.02866 2.638 0.015364 *
## factor(doseamt)2.5:log(doserate) 0.16483 0.04053 4.067 0.000553 ***
## factor(doseamt)5:log(doserate) 0.19480 0.04053 4.807 9.47e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1006 on 21 degrees of freedom
## Multiple R-squared: 0.9943, Adjusted R-squared: 0.9929
## F-statistic: 729.4 on 5 and 21 DF, p-value: < 2.2e-16
plot(residuals(fit.linear) ~ exp(fitted(fit.linear)), xlab = "fitted rate", ylab = "residuals")
In this dataset, the linear model actually seems fine if we transform the response (in the book the residual plot looks different as the response is not log-transformed)
A more principled way is to use Poisson regression on the ca counts. We need to put log(cells) as a covariate as the effect of # of cells on ca count should be multiplicative
fit.poisson <- glm(ca ~ log(cells) + log(doserate)*factor(doseamt), family = poisson, data = dicentric)
summary(fit.poisson)
##
## Call:
## glm(formula = ca ~ log(cells) + log(doserate) * factor(doseamt),
## family = poisson, data = dicentric)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.49901 -0.62229 -0.05021 0.76919 1.59525
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.76534 0.38116 -7.255 4.02e-13 ***
## log(cells) 1.00252 0.05137 19.517 < 2e-16 ***
## log(doserate) 0.07200 0.03547 2.030 0.042403 *
## factor(doseamt)2.5 1.62984 0.10273 15.866 < 2e-16 ***
## factor(doseamt)5 2.76673 0.12287 22.517 < 2e-16 ***
## log(doserate):factor(doseamt)2.5 0.16111 0.04837 3.331 0.000866 ***
## log(doserate):factor(doseamt)5 0.19316 0.04299 4.493 7.03e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 916.127 on 26 degrees of freedom
## Residual deviance: 21.748 on 20 degrees of freedom
## AIC: 211.15
##
## Number of Fisher Scoring iterations: 4
The coefficient of log(cells) is very close to 1
To improve intepretation, we can force of coefficient of log(cells) to be 1
fit.poisson.offset <- glm(ca ~ offset(log(cells)) + log(doserate)*factor(doseamt), family = poisson, data = dicentric)
summary(fit.poisson.offset)
##
## Call:
## glm(formula = ca ~ offset(log(cells)) + log(doserate) * factor(doseamt),
## family = poisson, data = dicentric)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.49101 -0.62473 -0.05078 0.76786 1.59115
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.74671 0.03426 -80.165 < 2e-16 ***
## log(doserate) 0.07178 0.03518 2.041 0.041299 *
## factor(doseamt)2.5 1.62542 0.04946 32.863 < 2e-16 ***
## factor(doseamt)5 2.76109 0.04349 63.491 < 2e-16 ***
## log(doserate):factor(doseamt)2.5 0.16122 0.04830 3.338 0.000844 ***
## log(doserate):factor(doseamt)5 0.19350 0.04243 4.561 5.1e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 4753.00 on 26 degrees of freedom
## Residual deviance: 21.75 on 21 degrees of freedom
## AIC: 209.16
##
## Number of Fisher Scoring iterations: 4
The deviance residuals look good. Also, the residual deviance is very small compared to the null deviance. We keep seeing a strong interaction effect.