R

This section discusses how to center variables and estimate multilevel models using R. Install and load the package lme4, which fits linear and generalized linear mixed-effects models. From the drop-down menu, select Install Packages and then Load Package under Packages, or, alternatively, use the following syntax:

> install.packages(lme4)
> library(lme4)

Next, load the High School and Beyond dataset in R and attach the dataset to the R search path:

> HSBdata <- read.table("C:/user/temp/hsbALL.txt", header=T, sep=",")
> attach(HSBdata)

To center a level-1 variable around the mean of all cases within the same level-2 group, use the ave() function, which estimates group averages. The new variable meanses is the average of ses for each high school group id. The new variable centses is centered around the mean of all cases within each level-2 group by subtracting meanses from ses. The text HSBdata$, which is attached to the new variable name, indicates that the variables will be created within the existing dataset HSBdata. Remember to attach the dataset to the R search path after making any changes to the variables.

> HSBdata$meanses <- ave(ses, list(id))
> HSBdata$centses <- ses - meanses}
> attach(HSBdata)

Within the lme4 package, the lme() function estimates linear mixed effects models. To use lme(), specify the dependent variable, the fixed components after the tilde sign and the random components in parentheses. Indicate which dataset R should use. To fit the empty model described above (5), use the following sintax:

> results1 <- lmer(mathach ~ 1 + (1 | id), data = HSBdata)
> summary(results1)

R saves the results of the model in an object called results1, which is stored in memory and may be retrieved with the function summary(). The function lmer() estimates a model, in which mathach is the dependent variable. The intercept, denoted by 1 immediately following the tilde sign, is the intercept for the fixed effects. Within the parentheses, 1 denotes the random effects intercept, and the variable id is specified as the level-2 grouping variable. R uses the HSBdata for this analysis.

The results are displayed in Table 4. The average test score across schools, reflected in the fixed effects intercept term, is 12.6370. The variance component corresponding to the random intercept is 8.614. The two variance components can be used to partition the variance across levels. The intraclass correlation coefficient is equal to 8.614/(39.148+8.614)=18.04, meaning that roughly 18% of the variance is attributable to the school level.

To explain some of the school-level variance in student math achievement scores, incorporate school-level predictors in the empty the model. The socioeconomic status of the typical student and the school's status as public or private may influence test performance. The following R syntax indicates how to incorporate these two variables as fixed effects:

> results2 <- lmer(mathach ~ 1 + sesmeans + sector + (1 | id), data = HSBdata)
> summary(results2)

The intercept, which now corresponds to the expected math achievement score in a public school with average SES scores, is 12.1282. Moving to a private school bumps the expected score by 1.2254 points. In addition, a one-unit increase in the average SES score is associated with an expected increased in math achievement of 5.3328. These estimates are all significant.

The variance component corresponding to the random intercept has decreased to 2.3140, indicating that the inclusion of the level-2 variables has accounted for some of the unexplained variance in the math achievement. Nonetheless, the estimate is still more than twice the size of its standard error, suggesting that there remains unexplained variance.

The final model introduces the student socioeconomic status (SES) variable and cross-level interaction terms. The centered SES slope is treated as random because individual SES status may vary across schools. In addition, a school's average SES score and its sector (public or private) may interact with student-level SES, thus accounting for some of the variance in the math achievement slope. To include cross-level interaction terms in your model, place an asterisk between the two variables composing the interaction.

> results3 <- lmer(mathach ~ sesmeans + sector + centses + sesmeans*centses + sector*centses + (1 + centses|id), data = HSBdata)
> summary(results3)

The results are displayed in the final column of Table 4. Because there are interactions in the model, the marginal fixed effects of each variable now depend on the value of the other variable(s) involved in the interaction. The marginal effect of a one-unit change in student's SES on math achievement will depend on whether a school is public or private as well as on the average SES score for the school. When cross-level interactions are present, graphical means may be appropriate for exploring the contingent nature of marginal effects in greater detail. Here the simplest interpretation of the interaction coefficients is that the effect of student-level SES is significantly higher in wealthier schools and significantly lower in private schools.

The variance component for the random intercept is 2.37956, which is still large relative to its standard deviation of 1.54258. Thus some school-level variance remains unexmplained in the final model. The variance component corresponding to the slope, however, is quite small relative to its standard deviation. This suggests that the researcher may be justified in constraining the effect to be fixed.

R displays the deviance and AIC and BIC. Comparing both the AIC and BIC statistics in Table 4 it is clear that the final model is preferable to the first two models.


Up: SAS
Next: References