The data comes from an experiment to test the paper brightness depending on a shift operator describbed in Sheldon (1960). 5 Samples are tested on 4 different operators.
data(pulp, package = "faraway")
pulp
Visualize the data:
library(ggplot2)
ggplot(pulp, aes(x = operator, y = bright)) + geom_point(position = position_jitter(width = 0.1, height = 0.0))
\[y_{ij} = \mu + \alpha_i + \epsilon_{ij}\] with \(\sum \alpha_i = 0\).
op <- options(contrasts = c("contr.sum", "contr.poly")) ## make sure that \sum \alpha_i = 0
model.matrix(~operator, data = pulp)
## (Intercept) operator1 operator2 operator3
## 1 1 1 0 0
## 2 1 1 0 0
## 3 1 1 0 0
## 4 1 1 0 0
## 5 1 1 0 0
## 6 1 0 1 0
## 7 1 0 1 0
## 8 1 0 1 0
## 9 1 0 1 0
## 10 1 0 1 0
## 11 1 0 0 1
## 12 1 0 0 1
## 13 1 0 0 1
## 14 1 0 0 1
## 15 1 0 0 1
## 16 1 -1 -1 -1
## 17 1 -1 -1 -1
## 18 1 -1 -1 -1
## 19 1 -1 -1 -1
## 20 1 -1 -1 -1
## attr(,"assign")
## [1] 0 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$operator
## [1] "contr.sum"
lmod <- lm(bright~operator, data = pulp)
summary(lmod)
##
## Call:
## lm(formula = bright ~ operator, data = pulp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.440 -0.195 -0.070 0.175 0.560
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 60.40000 0.07289 828.681 <2e-16 ***
## operator1 -0.16000 0.12624 -1.267 0.223
## operator2 -0.34000 0.12624 -2.693 0.016 *
## operator3 0.22000 0.12624 1.743 0.101
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.326 on 16 degrees of freedom
## Multiple R-squared: 0.4408, Adjusted R-squared: 0.3359
## F-statistic: 4.204 on 3 and 16 DF, p-value: 0.02261
\(\alpha_4 = 0.16 + 0.34 - 0.22 = 0.28\). We calculate sample variance of \(\hat\alpha_i\)
((-0.16)^2 + (-0.34)^2 + 0.22^2 + 0.28^2)/(4-1)
## [1] 0.08933333
library(lme4)
## Loading required package: Matrix
mmod <- lmer(bright ~ 1 + (1|operator), pulp)
summary(mmod)
## Linear mixed model fit by REML ['lmerMod']
## Formula: bright ~ 1 + (1 | operator)
## Data: pulp
##
## REML criterion at convergence: 18.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4666 -0.7595 -0.1244 0.6281 1.6012
##
## Random effects:
## Groups Name Variance Std.Dev.
## operator (Intercept) 0.06808 0.2609
## Residual 0.10625 0.3260
## Number of obs: 20, groups: operator, 4
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 60.4000 0.1494 404.2
ranef(mmod)$operator
alpha <- c(-0.16, -0.34, 0.22, 0.28)
The predicted \(\alpha_i\) from LMM can be viewed as a type of shrinkage estimator (towards \(0\)).
We test for the null \(H_0: \sigma^2 = 0\). As mentioned earlier, the LRT provides a conservative p-value
nullmod <- lm(bright ~ 1, pulp)
smod <- lmer(bright ~ 1 + (1|operator), pulp, REML = F) ## solve the full MLE
lrtstat <- as.numeric(2*(logLik(smod) - logLik(nullmod)))
pvalue <- pchisq(lrtstat, 1, lower = F)
data.frame(lrtstat, pvalue)
Should we believe that there is no different across operators? Is the p-value too conservative? Or sample size too small?
We can use parametric bootstrap to approximate the null distribution of the LRT statistics.
lrtstat <- sapply(1:1000, function(i) {
set.seed(i)
y <- unlist(simulate(nullmod))
bnull <- lm(y ~ 1)
balt <- suppressMessages(lmer(y ~ 1 + (1|operator), pulp, REML = F))
return(as.numeric(2*(logLik(balt) - logLik(bnull))))
})
mean(lrtstat > 2.5684)
## [1] 0.018
sqrt(0.018 * 0.982/1000)
## [1] 0.004204284
So we can be fairly sure that the p-value is under 0.05. We can use the RLRsim package to test for random effect terms.
library(RLRsim)
exactLRT(mmod, nullmod)
## No restrictions on fixed effects. REML-based inference preferable.
## Using likelihood evaluated at REML estimators.
## Please refit model with method="ML" for exact results.
##
## simulated finite sample distribution of LRT. (p-value based on 10000
## simulated values)
##
## data:
## LRT = 2.4435, p-value = 0.0254
qqnorm(residuals(mmod), main = "")
plot(fitted(mmod), residuals(mmod), xlab = "Fitted", ylab = "Residuals")
abline(h = 0)
Random effect models are particularly sensitive to outliers, as they depend on variance components that can be substantially inflated by unusual points.
A large jar of dried egg power was divided up into a number of samples, and we test for consistency across lab tests. Four samples were sent to 6 labs, with two labeled as G and two labeled as H, and are tested by two different technicians in each lab.
data(eggs, package = "faraway")
eggs
We fit a model: \[y_{ijkl} = \mu + L_i + T_{ij} + S_{ijk} + \epsilon_{ijkl}\]
cmod <- lmer(Fat ~ 1 + (1|Lab) + (1|Lab:Technician) + (1|Lab:Technician:Sample), data = eggs)
summary(cmod)
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## Fat ~ 1 + (1 | Lab) + (1 | Lab:Technician) + (1 | Lab:Technician:Sample)
## Data: eggs
##
## REML criterion at convergence: -64.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.04098 -0.46576 0.00927 0.59713 1.54276
##
## Random effects:
## Groups Name Variance Std.Dev.
## Lab:Technician:Sample (Intercept) 0.003065 0.05536
## Lab:Technician (Intercept) 0.006980 0.08355
## Lab (Intercept) 0.005920 0.07694
## Residual 0.007196 0.08483
## Number of obs: 48, groups:
## Lab:Technician:Sample, 24; Lab:Technician, 12; Lab, 6
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.38750 0.04296 9.019
We test for the null \(H_0: \sigma_S^2 = 0\).
library(RLRsim)
## null LMM model
cmodr <- lmer(Fat ~ 1 + (1|Lab) + (1|Lab:Technician), data = eggs)
## the reduced model containing only the random effect to be tested
cmods <- lmer(Fat ~ 1 + (1|Lab:Technician:Sample), data = eggs)
exactRLRT(cmods, cmod, cmodr)
##
## simulated finite sample distribution of RLRT.
##
## (p-value based on 10000 simulated values)
##
## data:
## RLRT = 1.6034, p-value = 0.1011
Frou materials (A-D) are fed into a wear-testing machine. The machine can process four samples at a time and the four samples are put at 4 different positions. There might be some differences due to the positions.
data(abrasion, package = "faraway")
abrasion
Visulize the groups: run and position and two unnested groupings.
library(ggplot2)
ggplot(abrasion, aes(x = material, y =wear, shape = run, color = position)) + geom_point(position = position_jitter(width = 0.1, height = 0)) + scale_color_grey()
We fit the model \[y_{ijk} = \beta_i + \gamma_j + \theta_k + \epsilon_{ijk}\] for material \(i\), run \(j\) and position \(k\). We treat run and position as random effect.
mmod <- lmer(wear ~ material + (1|run) + (1|position), data = abrasion)
summary(mmod)
## Linear mixed model fit by REML ['lmerMod']
## Formula: wear ~ material + (1 | run) + (1 | position)
## Data: abrasion
##
## REML criterion at convergence: 103
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.08973 -0.30231 0.02697 0.42254 1.21052
##
## Random effects:
## Groups Name Variance Std.Dev.
## run (Intercept) 66.90 8.179
## position (Intercept) 107.06 10.347
## Residual 61.25 7.826
## Number of obs: 16, groups: run, 4; position, 4
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 239.500 6.879 34.817
## material1 26.250 3.389 7.746
## material2 -19.500 3.389 -5.754
## material3 2.250 3.389 0.664
##
## Correlation of Fixed Effects:
## (Intr) matrl1 matrl2
## material1 0.000
## material2 0.000 -0.333
## material3 0.000 -0.333 -0.333
We want to test for \(H_0: \sigma_{pos}^2 = 0\) and \(H_0: \sigma_{run}^2 = 0\)
## Null model
mmodr <- lmer(wear ~ material + (1|run), data = abrasion)
## the reduced model containing only the random effect to be tested
mmodp <- lmer(wear ~ material + (1|position), data = abrasion)
exactRLRT(mmodp, mmod, mmodr)
##
## simulated finite sample distribution of RLRT.
##
## (p-value based on 10000 simulated values)
##
## data:
## RLRT = 4.5931, p-value = 0.0136
exactRLRT(mmodr, mmod, mmodp)
##
## simulated finite sample distribution of RLRT.
##
## (p-value based on 10000 simulated values)
##
## data:
## RLRT = 3.0459, p-value = 0.033
Both are significant.