Stata

This section discusses how to center variables and estimate multilevel models using Stata. A fuller treatment is available in Rabe-Hesketh and Skrondal (2005) and in the Stata documentation. Since release 9, Stata includes the command .xtmixed to estimate multilevel models. The .xt prefix signifies that the command belongs to the larger class of commands used to estimate models for longitudinal data. This reflects the fact that panel data can be thought of as multilevel data in which observations at multiple time points are nested within an individual. However, the command is appropriate for mixed model estimation in general, including cross-sectional applications.

In the HSB data file, the student-level SES variable is in its original metric (a standardized scale with a mean of zero). Researchers working with hierarchically structured data often wish to center a level-1 variable around the mean of all cases within the same level-2 group. Group-mean centering can be accomplished with the .egen and .gen commands in Stata.

.egen sesmeans = mean(ses), by(id)
.gen centses = ses - sesmeans

Here .egen generates a new variable sesmeans, which is the mean of ses for all cases within the same level-2 group (the school, identified by id). The second line generates a new variable centses, which expresses each student's ses as a deviation from the mean of his or her school.
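One way to confirm that the centering worked is to check that the centered variable averages to zero within every school. The following is a minimal sketch, assuming centses and id already exist in memory as created above. The variable name chk is introduced here only for illustration, and the commands are written as they would appear in a do-file, without the leading dot.

```stata
* chk is a hypothetical helper variable: the school mean of centses
egen chk = mean(centses), by(id)

* If centering succeeded, the minimum and maximum of chk
* should both be zero up to rounding error
summarize chk

* Remove the helper once the check is done
drop chk
```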

The syntax for estimating multilevel models in Stata begins with the .xtmixed command, followed by the dependent variable and a list of independent variables. The last independent variable is followed by double vertical lines ||, after which the grouping variable and the random effects are specified. The grouping variable is followed by a colon, and any variables whose slopes are to be treated as random are listed after the colon. .xtmixed automatically treats the intercept as random, so nothing needs to be listed for a random-intercept model. Note that, by default, Stata reports variance components as standard deviations (the square roots of the variances). To have Stata report variances instead, add the var option. The syntax for the empty model is the following:

.xtmixed mathach || id: , var

The results are displayed in Table 2 at the bottom of the page. The average test score across schools, reflected in the intercept term, is 12.63697. The variance component corresponding to the random intercept is 8.61403. Because this estimate is substantially larger than its standard error, there appears to be significant variation in school means.

The two variance components can be used to partition the variance across levels. The intraclass correlation coefficient is equal to 8.61403/(39.14832+8.61403) = 0.1804, where 39.14832 is the residual (student-level) variance, meaning that roughly 18% of the variance is attributable to the school level.
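Written out as a formula, the calculation is as follows. The symbols $\tau_{00}$ (variance of the random intercept) and $\sigma^2$ (residual variance) are notation assumed here for illustration; they do not appear in the Stata output.

```latex
\rho = \frac{\tau_{00}}{\tau_{00} + \sigma^{2}}
     = \frac{8.61403}{8.61403 + 39.14832}
     = \frac{8.61403}{47.76235}
     \approx 0.180
```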

In order to explain some of the school-level variance in math achievement scores it is possible to incorporate school-level predictors into the model. For example, the socioeconomic status of the typical student, or the school's status as public or private, may influence test performance. The Stata syntax for adding these variables to the model is:

.xtmixed mathach meanses sector || id: , var

The intercept, which now corresponds to the expected math achievement score in a public school with average SES scores, is 12.12824. Moving to a private school raises the expected score by 1.2254 points. In addition, a one-unit increase in the average SES score is associated with an expected increase in math achievement of 5.3328 points. These estimates are all statistically significant.

The variance component corresponding to the random intercept has decreased to 2.313986, reflecting the fact that the inclusion of the level-2 variables has accounted for some of the variance in the dependent variable. Nonetheless, the estimate is still more than twice the size of its standard error, suggesting that there remains variance unaccounted for.
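One common way to summarize how much school-level variance the level-2 predictors account for is the proportional reduction in the random-intercept variance relative to the empty model. The sketch below simply plugs in the two estimates quoted in the text. It is an illustration rather than output from a dedicated Stata command.

```stata
* Random-intercept variance: 8.61403 in the empty model,
* 2.313986 after adding meanses and sector
display (8.61403 - 2.313986)/8.61403
```

The result is roughly 0.73, which would suggest that about 73% of the between-school variance in the empty model is accounted for by the two school-level predictors.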

A final model introduces the student-level socioeconomic status variable. Because it is possible that the effect of individual SES varies across schools, this slope is treated as random. In addition, a school's average SES score and its sector (public or private) may interact with student-level SES, accounting for some of the variance in the slope. To include these cross-level interactions in the model, however, it is necessary to first create the interaction variables explicitly in Stata:

.gen ses_mses=meanses*centses
.gen ses_sect=sector*centses

When estimating more than one random effect, the researcher must also be concerned with the covariances among the level-2 variance components. As with SPSS, in Stata it is necessary to add an option specifying that the covariance matrix for the random effects is unstructured (the default is to assume all covariances are zero). The syntax for estimating the random-slope model is thus:

.xtmixed mathach meanses sector centses ses_mses ses_sect || id: centses, var cov(un)
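In two-level notation, the model fitted by this command can be sketched as follows. The subscripts $i$ (student) and $j$ (school), and the symbols $\beta$, $u$, and $r$, are assumptions introduced for illustration; the $\gamma$ coefficients correspond to the fixed effects in the order they are interpreted in the text.

```latex
\begin{aligned}
\text{MATHACH}_{ij} &= \beta_{0j} + \beta_{1j}\,\text{CENTSES}_{ij} + r_{ij} \\
\beta_{0j} &= \gamma_{00} + \gamma_{01}\,\text{MEANSES}_{j} + \gamma_{02}\,\text{SECTOR}_{j} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11}\,\text{MEANSES}_{j} + \gamma_{12}\,\text{SECTOR}_{j} + u_{1j}
\end{aligned}
```

Substituting the level-2 equations into the level-1 equation produces the products MEANSES × CENTSES and SECTOR × CENTSES, which is why the interaction variables ses_mses and ses_sect have to be created before estimation. The cov(un) option allows $u_{0j}$ and $u_{1j}$ to be correlated.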

The results are displayed in the final column of Table 2. The intercept is 12.12793, which here is the expected math achievement score in a public school with average SES scores for a student at his or her school's average SES level. Because there are interactions in the model, the marginal fixed effect of each variable now depends on the value of the other variable(s) involved in the interaction. The marginal effect of a one-unit change in a student's SES on math achievement depends on whether the school is public or private as well as on the average SES score for the school. For a public school (where sector=0), the marginal effect of a one-unit change in the group-mean centered SES variable is equal to γ10 + γ11(MEANSES) = 2.94504 + 1.039237(MEANSES). For a private school (where sector=1), the marginal effect of a one-unit change in student SES is equal to γ10 + γ11(MEANSES) + γ12 = 2.94504 + 1.039237(MEANSES) - 1.642675. When cross-level interactions are present, graphical means may be appropriate for exploring the contingent nature of marginal effects in greater detail. Here the simplest interpretation of the interaction coefficients is that the effect of student-level SES is significantly higher in schools with higher average SES and significantly lower in private schools.
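The two marginal-effect expressions can be evaluated directly in Stata. The sketch below uses the coefficients quoted in the text and, as an illustrative assumption, evaluates them at MEANSES = 0 (a school with average SES). The lincom line assumes the random-slope model is the most recent estimation result and that the variable names match those used above.

```stata
* Public school (sector = 0) at MEANSES = 0
display 2.94504 + 1.039237*0

* Private school (sector = 1) at MEANSES = 0
display 2.94504 + 1.039237*0 - 1.642675

* The same private-school combination computed from the fitted
* model, which also reports a standard error and confidence interval
lincom centses + ses_sect
```

At MEANSES = 0 the effect is 2.94504 in a public school and about 1.30 in a private school. Other values of MEANSES can be examined by changing the multiplier, for example lincom centses + 0.5*ses_mses for a public school half a unit above average.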

The variance component for the random intercept is 2.379597, which is still large relative to its standard error of 0.3714584. Thus there remains some school-level variance unaccounted for in the model. The variance component corresponding to the slope, however, is quite small relative to its standard error. This suggests that the researcher may be justified in constraining the effect to be fixed.
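A more formal check on whether the slope can be constrained to be fixed is a likelihood-ratio test comparing the random-slope model with a random-intercept model that has the same fixed effects. The following is a minimal sketch; the names rslope and rint under which the estimates are stored are hypothetical.

```stata
* Random-slope model (unstructured covariance)
xtmixed mathach meanses sector centses ses_mses ses_sect || id: centses, var cov(un)
estimates store rslope

* Same fixed effects, random intercept only
xtmixed mathach meanses sector centses ses_mses ses_sect || id: , var
estimates store rint

* Likelihood-ratio test of the two random-effects specifications
lrtest rslope rint
```

Because the null hypothesis places a variance on the boundary of its parameter space, the p-value from this test is conservative.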

By default, Stata does not report model fit statistics such as the AIC or BIC. These can be requested, however, by using the postestimation command .estat ic. This displays the log likelihood, which can be converted to the deviance according to the formula -2 * log likelihood. It also displays the AIC and BIC statistics in smaller-is-better form. Comparing the AIC and BIC statistics in Table 2, it is clear that the final model is preferable to the first two models.
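As a brief sketch, the fit statistics and the deviance can be obtained as follows, assuming the commands are run immediately after the .xtmixed command of interest (written here without the leading dot, as in a do-file).

```stata
* Log likelihood, AIC, and BIC for the most recent model
estat ic

* Deviance computed from the stored log likelihood
display -2*e(ll)
```

Note that xtmixed uses restricted maximum likelihood unless the mle option is specified, so the log likelihood shown is the restricted one in that case.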


