1. Example 1 (Faraway Chapter 10.1-10.5)

The data comes from an experiment to test the paper brightness depending on a shift operator describbed in Sheldon (1960). 5 Samples are tested on 4 different operators.

data(pulp, package = "faraway")
pulp

Visualize the data:

library(ggplot2)
ggplot(pulp, aes(x = operator, y = bright)) + geom_point(position = position_jitter(width = 0.1, height = 0.0))

Linear regression

\[y_{ij} = \mu + \alpha_i + \epsilon_{ij}\] with \(\sum \alpha_i = 0\).

op <- options(contrasts = c("contr.sum", "contr.poly")) ## make sure that \sum \alpha_i = 0
model.matrix(~operator, data = pulp)

##    (Intercept) operator1 operator2 operator3
## 1            1         1         0         0
## 2            1         1         0         0
## 3            1         1         0         0
## 4            1         1         0         0
## 5            1         1         0         0
## 6            1         0         1         0
## 7            1         0         1         0
## 8            1         0         1         0
## 9            1         0         1         0
## 10           1         0         1         0
## 11           1         0         0         1
## 12           1         0         0         1
## 13           1         0         0         1
## 14           1         0         0         1
## 15           1         0         0         1
## 16           1        -1        -1        -1
## 17           1        -1        -1        -1
## 18           1        -1        -1        -1
## 19           1        -1        -1        -1
## 20           1        -1        -1        -1
## attr(,"assign")
## [1] 0 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$operator
## [1] "contr.sum"

lmod <- lm(bright~operator, data = pulp)
summary(lmod)

## 
## Call:
## lm(formula = bright ~ operator, data = pulp)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -0.440 -0.195 -0.070  0.175  0.560 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 60.40000    0.07289 828.681   <2e-16 ***
## operator1   -0.16000    0.12624  -1.267    0.223    
## operator2   -0.34000    0.12624  -2.693    0.016 *  
## operator3    0.22000    0.12624   1.743    0.101    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.326 on 16 degrees of freedom
## Multiple R-squared:  0.4408, Adjusted R-squared:  0.3359 
## F-statistic: 4.204 on 3 and 16 DF,  p-value: 0.02261

\(\alpha_4 = 0.16 + 0.34 - 0.22 = 0.28\). We calculate sample variance of \(\hat\alpha_i\)

((-0.16)^2 + (-0.34)^2 + 0.22^2 + 0.28^2)/(4-1)

## [1] 0.08933333

Using LMM assuming that \(\alpha_i \sim N(0, \sigma^2)\).

library(lme4)

## Loading required package: Matrix

mmod <- lmer(bright ~ 1 + (1|operator), pulp)
summary(mmod)

## Linear mixed model fit by REML ['lmerMod']
## Formula: bright ~ 1 + (1 | operator)
##    Data: pulp
## 
## REML criterion at convergence: 18.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.4666 -0.7595 -0.1244  0.6281  1.6012 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  operator (Intercept) 0.06808  0.2609  
##  Residual             0.10625  0.3260  
## Number of obs: 20, groups:  operator, 4
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  60.4000     0.1494   404.2

Estimate of \(\sigma^2\) is slightly smaller than the sample variance of \(\hat\alpha_i\), why?
Standard error of the intercept increase in LMM

Prediction of each \(\alpha_i\) from LMM

ranef(mmod)$operator

alpha <- c(-0.16, -0.34, 0.22, 0.28)

The predicted \(\alpha_i\) from LMM can be viewed as a type of shrinkage estimator (towards \(0\)).

Test for group difference

We test for the null \(H_0: \sigma^2 = 0\). As mentioned earlier, the LRT provides a conservative p-value

nullmod <- lm(bright ~ 1, pulp)
smod <- lmer(bright ~ 1 + (1|operator), pulp, REML = F) ## solve the full MLE
lrtstat <- as.numeric(2*(logLik(smod) - logLik(nullmod)))
pvalue <- pchisq(lrtstat, 1, lower = F)
data.frame(lrtstat, pvalue)

Should we believe that there is no different across operators? Is the p-value too conservative? Or sample size too small?

We can use parametric bootstrap to approximate the null distribution of the LRT statistics.

lrtstat <- sapply(1:1000, function(i) {
  set.seed(i)
  y <- unlist(simulate(nullmod))
  bnull <- lm(y ~ 1)
  balt <- suppressMessages(lmer(y ~ 1 + (1|operator), pulp, REML = F))
  return(as.numeric(2*(logLik(balt) - logLik(bnull))))
})

mean(lrtstat > 2.5684)

## [1] 0.018

sqrt(0.018 * 0.982/1000)

## [1] 0.004204284

So we can be fairly sure that the p-value is under 0.05. We can use the RLRsim package to test for random effect terms.

library(RLRsim)
exactLRT(mmod, nullmod)

## No restrictions on fixed effects. REML-based inference preferable.

## Using likelihood evaluated at REML estimators.

## Please refit model with method="ML" for exact results.

## 
##  simulated finite sample distribution of LRT. (p-value based on 10000
##  simulated values)
## 
## data:  
## LRT = 2.4435, p-value = 0.0254

Model diagnosis

qqnorm(residuals(mmod), main = "")

plot(fitted(mmod), residuals(mmod), xlab = "Fitted", ylab = "Residuals")
abline(h = 0)

Random effect models are particularly sensitive to outliers, as they depend on variance components that can be substantially inflated by unusual points.

Example 2: nested groups (Faraway Chapter 10.8)

A large jar of dried egg power was divided up into a number of samples, and we test for consistency across lab tests. Four samples were sent to 6 labs, with two labeled as G and two labeled as H, and are tested by two different technicians in each lab.

data(eggs, package = "faraway")
eggs

The Technician “One” in different labs are different individuals.
Samples are also different in different labs.
6 different labs, each lab two technician.

We fit a model: \[y_{ijkl} = \mu + L_i + T_{ij} + S_{ijk} + \epsilon_{ijkl}\]

cmod <- lmer(Fat ~ 1 + (1|Lab) + (1|Lab:Technician) + (1|Lab:Technician:Sample), data = eggs)
summary(cmod)

## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## Fat ~ 1 + (1 | Lab) + (1 | Lab:Technician) + (1 | Lab:Technician:Sample)
##    Data: eggs
## 
## REML criterion at convergence: -64.2
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.04098 -0.46576  0.00927  0.59713  1.54276 
## 
## Random effects:
##  Groups                Name        Variance Std.Dev.
##  Lab:Technician:Sample (Intercept) 0.003065 0.05536 
##  Lab:Technician        (Intercept) 0.006980 0.08355 
##  Lab                   (Intercept) 0.005920 0.07694 
##  Residual                          0.007196 0.08483 
## Number of obs: 48, groups:  
## Lab:Technician:Sample, 24; Lab:Technician, 12; Lab, 6
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  0.38750    0.04296   9.019

We test for the null \(H_0: \sigma_S^2 = 0\).

library(RLRsim)
## null LMM model
cmodr <- lmer(Fat ~ 1 + (1|Lab) + (1|Lab:Technician), data = eggs)
## the reduced model containing only the random effect to be tested
cmods <- lmer(Fat ~ 1 + (1|Lab:Technician:Sample), data = eggs)
exactRLRT(cmods, cmod, cmodr)

## 
##  simulated finite sample distribution of RLRT.
##  
##  (p-value based on 10000 simulated values)
## 
## data:  
## RLRT = 1.6034, p-value = 0.1011

Example 3: Not nested groups (Faraway Chapter 10.9)

Frou materials (A-D) are fed into a wear-testing machine. The machine can process four samples at a time and the four samples are put at 4 different positions. There might be some differences due to the positions.

data(abrasion, package = "faraway")
abrasion

Visulize the groups: run and position and two unnested groupings.

library(ggplot2)
ggplot(abrasion, aes(x = material, y =wear, shape = run, color = position)) + geom_point(position = position_jitter(width = 0.1, height = 0)) + scale_color_grey()

We fit the model \[y_{ijk} = \beta_i + \gamma_j + \theta_k + \epsilon_{ijk}\] for material \(i\), run \(j\) and position \(k\). We treat run and position as random effect.

mmod <- lmer(wear ~ material + (1|run) + (1|position), data = abrasion)
summary(mmod)

## Linear mixed model fit by REML ['lmerMod']
## Formula: wear ~ material + (1 | run) + (1 | position)
##    Data: abrasion
## 
## REML criterion at convergence: 103
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.08973 -0.30231  0.02697  0.42254  1.21052 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  run      (Intercept)  66.90    8.179  
##  position (Intercept) 107.06   10.347  
##  Residual              61.25    7.826  
## Number of obs: 16, groups:  run, 4; position, 4
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  239.500      6.879  34.817
## material1     26.250      3.389   7.746
## material2    -19.500      3.389  -5.754
## material3      2.250      3.389   0.664
## 
## Correlation of Fixed Effects:
##           (Intr) matrl1 matrl2
## material1  0.000              
## material2  0.000 -0.333       
## material3  0.000 -0.333 -0.333

We want to test for \(H_0: \sigma_{pos}^2 = 0\) and \(H_0: \sigma_{run}^2 = 0\)

## Null model
mmodr <- lmer(wear ~ material + (1|run), data = abrasion)
## the reduced model containing only the random effect to be tested
mmodp <- lmer(wear ~ material + (1|position), data = abrasion)
exactRLRT(mmodp, mmod, mmodr)

## 
##  simulated finite sample distribution of RLRT.
##  
##  (p-value based on 10000 simulated values)
## 
## data:  
## RLRT = 4.5931, p-value = 0.0136

exactRLRT(mmodr, mmod, mmodp)

## 
##  simulated finite sample distribution of RLRT.
##  
##  (p-value based on 10000 simulated values)
## 
## data:  
## RLRT = 3.0459, p-value = 0.033

Both are significant.

Data Example 8: More LMM examples