1. Example 1 for LMM: Smoking prevention and cessation study

1600 students are collected from 135 classrooms in 28 schools. We want to understand the effect of SC (exposure to a school-based curriculum or not), TV (exposure to a television-based prevention program or not) and previous THK scale on the current THK scale. We have 1600 samples, but some share the same school and some share the same classroom.

The multilevel model: yics=β0+β1PTHKics+β2SCics+β3TVics+us+vcs+ϵics

Smoking <- read.table("Smoking.dat", header = T)
Smoking
ABCDEFGHIJ0123456789
school
<int>
class
<int>
SC
<int>
TV
<int>
PTHK
<int>
y
<int>
4034031011023
4034031011044
4034031011043
4034031011034
4034031011034
4034031011043
4034031011022
4034031011044
4034031011055
4034031011034

The LMM can be solved using R package lme4

library(lme4)
## Loading required package: Matrix
fit <- lmer(y ~ PTHK + SC + TV + (1|school) + (1|class), data = Smoking) # Fitting the variances is by REML
summary(fit)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ PTHK + SC + TV + (1 | school) + (1 | class)
##    Data: Smoking
## 
## REML criterion at convergence: 5374.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.5202 -0.6975 -0.0177  0.6875  3.1630 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  class    (Intercept) 0.06853  0.2618  
##  school   (Intercept) 0.03925  0.1981  
##  Residual             1.60108  1.2653  
## Number of obs: 1600, groups:  class, 135; school, 28
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  1.78493    0.11295  15.803
## PTHK         0.30524    0.02590  11.786
## SC           0.47147    0.11330   4.161
## TV           0.01956    0.11330   0.173
## 
## Correlation of Fixed Effects:
##      (Intr) PTHK   SC    
## PTHK -0.493              
## SC   -0.503  0.025       
## TV   -0.521  0.015 -0.002

We can check the “predicted” value for each random effect and response

rand.eff <- ranef(fit) ## predict random effects for each school / class ## method: BLUP
pred <- predict(fit) ## fit each observation
rand.eff$school
ABCDEFGHIJ0123456789
 
 
(Intercept)
<dbl>
1930.013358240
1940.012207173
1960.065380029
1970.029031383
198-0.152589166
199-0.070955251
4010.066125878
402-0.019693415
4030.100165008
4040.009204788

Estimated intra-class correlation corr(yics,yics)=ˆσ2u+ˆσ2vˆσ2u+ˆσ2v+ˆσ2e=0.039+0.0690.039+0.069+1.60=0.063

Estimated intra-school correlation corr(yics,yics)=ˆσ2uˆσ2u+ˆσ2v+ˆσ2e=0.0390.039+0.069+1.60=0.023

Both correlations are very small.

If we ignore the random effects

summary(lm(y ~ PTHK + SC + TV, data = Smoking))
## 
## Call:
## lm(formula = y ~ PTHK + SC + TV, data = Smoking)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5635 -0.9130 -0.0626  0.8981  4.2416 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.73734    0.07866  22.088  < 2e-16 ***
## PTHK         0.32525    0.02589  12.561  < 2e-16 ***
## SC           0.47987    0.06529   7.350 3.15e-13 ***
## TV           0.04534    0.06518   0.696    0.487    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.303 on 1596 degrees of freedom
## Multiple R-squared:  0.1136, Adjusted R-squared:  0.112 
## F-statistic: 68.21 on 3 and 1596 DF,  p-value: < 2.2e-16

The standard errors of SC and TV become much smaller, though the pairwise correlations among classes and schools introduced by the random effect terms are small. The explanation is that, though each pairwise correlation is small, as all pairs of classes / schools have such correlations if we have many classes / schools, the aggregation of the variance inflation would be still be large enough to affect the variance of ˆβ.

2 Example for GLMM: modeling correlated survey response

The data is a survey result for whether the respondents support legalizing abortion under three situations: 1) if the family has a very low income and cannot afford anymore children, 2) when the women is not married and does not want to marry the man, 3) when the women wants it for any reason. We aim to understand how one person’s opinion change under different situations and whether gender has an effect.

Abortion <- read.table("Abortion.dat",header=T)
Abortion
ABCDEFGHIJ0123456789
gender
<int>
response
<int>
situation
<int>
case
<int>
1111
1121
1131
1112
1122
1132
1113
1123
1133
1114

2.1 Applying the GLMM model

We want to use a GLMM model to take into account the correlation of the choices from the same individual.

library(glmmML)
## one can also use the glmer function in the lme4 package
fit.glmm <- glmmML(response ~ gender + factor(situation),
                   cluster = case, 
                   data = Abortion,
                   family = binomial, 
                   method = "ghq", n.points = 70)
summary(fit.glmm)
## 
## Call:  glmmML(formula = response ~ gender + factor(situation), family = binomial,      data = Abortion, cluster = case, method = "ghq", n.points = 70) 
## 
## 
##                        coef se(coef)        z Pr(>|z|)
## (Intercept)         0.21596   0.3766  0.57347 5.66e-01
## gender              0.01257   0.4888  0.02572 9.79e-01
## factor(situation)2 -0.54230   0.1572 -3.45022 5.60e-04
## factor(situation)3 -0.83470   0.1601 -5.21346 1.85e-07
## 
## Scale parameter in mixing distribution:  8.736 gaussian 
## Std. Error:                              0.5421 
## 
##         LR p-value for H_0: sigma = 0:  0 
## 
## Residual deviance: 4579 on 5545 degrees of freedom   AIC: 4589

Intepretations of the results: - Intepretation of coefficients - individual heteogeneity: our estimate random effect is ˆσu=8.736, indicating strong correlation among the choices within one individual. - results change with n.points

Using the glmer function from lme4 will give you the same solution

library(lme4)
fit.glmm2 <- glmer(response ~ gender + factor(situation) + (1|case), data = Abortion,
                   family = binomial, nAGQ = 70)
summary(fit.glmm2)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 70) [glmerMod]
##  Family: binomial  ( logit )
## Formula: response ~ gender + factor(situation) + (1 | case)
##    Data: Abortion
## 
##      AIC      BIC   logLik deviance df.resid 
##   4588.5   4621.7  -2289.3   4578.5     5545 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.7809 -0.1224 -0.1056  0.1397  1.7149 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  case   (Intercept) 76.32    8.736   
## Number of obs: 5550, groups:  case, 1850
## 
## Fixed effects:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         0.21596    0.37680   0.573 0.566544    
## gender              0.01258    0.48904   0.026 0.979478    
## factor(situation)2 -0.54230    0.15719  -3.450 0.000561 ***
## factor(situation)3 -0.83470    0.16010  -5.214 1.85e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) gender fct()2
## gender      -0.727              
## fctr(sttn)2 -0.206  0.000       
## fctr(sttn)3 -0.206  0.000  0.512

2.2 Compare with the results from the standard logistic regression

Let’s look at the marginal model results. Here we just use standard GLM ignoring correlation structure. The book uses a more sophisticated marginal model which provides valid inference. Our standard glm here would not provide valid inference for the marginal model, but the estimates should still be good.

(Why? Brief explanation: though the correlation structure of the samples is not considered, the score equations from the standard GLM is typically still consistent. Specifically, the expectation of score equation at the true parameter value is 0, and when correlation across samples are weak, the consistency of score equation can be obtained from the LLN)

fit.glm <- glm(response ~ gender + factor(situation), data = Abortion, family = binomial)
summary(fit.glm)
## 
## Call:
## glm(formula = response ~ gender + factor(situation), family = binomial, 
##     data = Abortion)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.189  -1.148  -1.125   1.207   1.231  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)  
## (Intercept)         0.023940   0.055528   0.431   0.6664  
## gender              0.003582   0.054138   0.066   0.9472  
## factor(situation)2 -0.097329   0.065783  -1.480   0.1390  
## factor(situation)3 -0.149347   0.065825  -2.269   0.0233 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 7689.5  on 5549  degrees of freedom
## Residual deviance: 7684.2  on 5546  degrees of freedom
## AIC: 7692.2
## 
## Number of Fisher Scoring iterations: 3

The marginal model coefficients are very scaled down towards 0 compared with GLMM. We can check that the ratio is roughly 1/1+σ2u/1.72.

plot(fit.glm$coefficients, fit.glmm$coefficients, xlim = c(-0.5, 0.25), ylim = c(-1, 0.25))
abline(v = 0, lty = 2)
abline(h = 0, lty = 2)
abline(a = 0, b = sqrt(1 + (fit.glmm$sigma/1.7)^2), col = "red")