This section discusses how to center variables and estimate
multilevel models using Stata. A fuller treatment is available in
Rabe-Hesketh and Skrondal (2005) and in the Stata documentation.
Since release 9, Stata has included the command xtmixed
to estimate multilevel models. The xt
prefix signifies that the command belongs to the larger class of
commands used to estimate models for longitudinal data. This
reflects the fact that panel data can be thought of as multilevel
data in which observations at multiple time points are nested within
an individual. However, the command is appropriate for mixed model
estimation in general, including cross-sectional applications.
In the HSB data file, the student-level SES variable is in its
original metric (a standardized scale with a mean of zero).
Oftentimes researchers dealing with hierarchically structured data
wish to center a level-1 variable around the mean of all cases
within the same level-2 group. Group-mean centering can be
accomplished by using the egen and gen commands in Stata. The egen
command generates a new variable, sesmeans, which is the mean of ses
for all cases within the same level-2 group. The gen command then
generates a new variable, centses, which is ses centered around the
mean of all cases within each level-2 group.
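The centering step described above can be sketched as follows. This is a minimal sketch rather than the original listing: it assumes the HSB data are already in memory and that id, the grouping variable used in the model commands in this section, identifies the level-2 unit (the school). The variable chk is a hypothetical helper used only to verify the result.

```stata
* Sketch: group-mean centering of ses within schools.
* Assumes the HSB data are in memory and id identifies the school.
egen sesmeans = mean(ses), by(id)     // school mean of ses
gen centses = ses - sesmeans          // deviation from the school mean

* Optional check (chk is a hypothetical helper variable):
* the centered variable should average roughly zero within every school.
bysort id: egen chk = mean(centses)
summarize chk
drop chk
```

If the centering worked, summarize should report a mean, minimum, and maximum for chk that are all zero up to rounding error.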
The syntax for estimating multilevel models in Stata begins with the
xtmixed command followed by the dependent
variable and a list of independent variables. The last independent
variable is followed by double vertical lines ||, after which the
grouping variable and random effects are specified.
The xtmixed command will automatically specify the intercept to be
random. A list of variables whose slopes are to be treated as
random follows the colon. Note that, by default, Stata reports
variance components as standard deviations (equal to the square root
of the variance components). To get Stata to report variances
instead, add the var option. The syntax for the
empty model is the following:
xtmixed mathach || id: , var
The results are displayed in Table 2 at the bottom of the page. The average test score across schools, reflected in
the intercept term, is 12.63697. The variance component
corresponding to the random intercept is 8.61403. Because this
estimate is substantially larger than its standard error, there
appears to be significant variation in school means.
The two variance components can be used to partition the variance
across levels. The intraclass correlation coefficient is equal to
8.61403/(39.14832 + 8.61403) = 0.1804, meaning that roughly 18%
of the variance is attributable to the school level.
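Written out, the calculation takes the following form. The symbols are assumptions borrowed from conventional multilevel notation and are not defined in this section: $\tau_{00}$ stands for the random-intercept (school-level) variance and $\sigma^2$ for the residual (student-level) variance.

```latex
\rho = \frac{\tau_{00}}{\tau_{00} + \sigma^{2}}
     = \frac{8.61403}{8.61403 + 39.14832}
     = \frac{8.61403}{47.76235}
     \approx 0.1804
```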
In order to explain some of the school-level variance in math
achievement scores it is possible to incorporate school-level
predictors into the model. For example, the socioeconomic status of
the typical student, or the school's status as public or private,
may influence test performance. The Stata syntax for adding these
variables to the model is:
xtmixed mathach meanses sector || id: , var
The intercept, which now corresponds to the expected math
achievement score in a public school with average SES scores, is
12.12824. Moving to a private school bumps the expected score by
1.2254 points. In addition, a one-unit increase in the average SES
score is associated with an expected increase in math achievement
of 5.3328. These estimates are all significant.
The variance component corresponding to the random intercept has
decreased to 2.313986, reflecting the fact that the inclusion of the
level-2 variables has accounted for some of the variance in the
dependent variable. Nonetheless, the estimate is still more than
twice the size of its standard error, suggesting that there remains
variance unaccounted for.
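One way to put a number on "some of the variance" is the proportional reduction in the random-intercept variance relative to the empty model. This summary is not reported in the original text; it is offered only as an illustrative calculation from the two variance components quoted above, with $\tau_{00}$ again used as an assumed symbol for the intercept variance.

```latex
\frac{\tau_{00}^{\text{empty}} - \tau_{00}^{\text{level-2}}}{\tau_{00}^{\text{empty}}}
  = \frac{8.61403 - 2.313986}{8.61403}
  = \frac{6.300044}{8.61403}
  \approx 0.73
```

By this measure, the two school-level predictors account for roughly 73% of the between-school variance in mean math achievement.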
A final model introduces the student socioeconomic status variable.
Because it is possible that the effect of individual-level SES
varies across schools, this slope is treated as random. In
addition, a school's average SES score and its sector (public or
private) may interact with student-level SES, accounting for some of
the variance in the slope. In order to include these cross-level
interactions in the model, however, it is necessary to first
explicitly create the interaction variables in Stata, here named
ses_mses and ses_sect.
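The interaction variables might be created as sketched here. The names ses_mses and ses_sect are taken from the model command in this section; building them from centses rather than from the uncentered ses is an assumption, based on the later discussion of the marginal effect of the group-mean centered SES variable.

```stata
* Sketch: cross-level interaction terms as simple products.
* Assumes centses, meanses, and sector already exist in the data.
gen ses_mses = centses * meanses   // student SES x school mean SES
gen ses_sect = centses * sector    // student SES x sector (0 = public, 1 = private)
```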
When estimating more than one random effect,
the researcher must also be concerned with the covariances among the
level-2 variance components. As with SPSS, in Stata it is necessary
to add an option specifying that the covariance matrix for the
random effects is unstructured (the default is to assume all
covariances are zero). The syntax for estimating the random-slope
model is thus:
xtmixed mathach meanses sector centses ses_mses ses_sect || id: centses, var cov(un)
The results are displayed in the final column
of Table 2. The intercept is 12.12793, which here is the
expected math achievement score in a public school with average SES
scores for a student at his or her school's average SES level.
Because there are interactions in the model, the marginal fixed
effects of each variable now depend on the value of the other
variable(s) involved in the interaction. The marginal effect of a
one-unit change in a student's SES on math achievement will depend on
whether a school is public or private as well as on the average SES
score for the school. For a public school (where sector = 0), the
marginal effect of a one-unit change in the group-mean centered SES
variable is equal to
$\gamma_{10} + \gamma_{11}(\text{MEANSES}_j) = 2.94504 + 1.039237(\text{MEANSES}_j)$.
For a private school (where sector = 1), the marginal effect of a
one-unit change in student SES is equal to
$\gamma_{10} + \gamma_{11}(\text{MEANSES}_j) + \gamma_{12} = 2.94504 + 1.039237(\text{MEANSES}_j) - 1.642675$.
When cross-level interactions are present, graphical
means may be appropriate for exploring the contingent nature of
marginal effects in greater detail. Here the simplest interpretation
of the interaction coefficients is that the effect of student-level
SES is significantly higher in wealthier schools and significantly
lower in private schools.
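Rather than computing these contingent effects by hand, one option is Stata's lincom command, which reports a linear combination of the fixed-effect estimates along with a standard error and confidence interval. The sketch below is an illustration, not part of the original analysis: it assumes the random-slope model is the most recently estimated model, and the school mean SES value of 0.5 is a hypothetical value chosen only to show the syntax.

```stata
* Sketch: marginal effect of student SES under different school conditions.
* Assumes the variables used in the random-slope model exist in memory.
xtmixed mathach meanses sector centses ses_mses ses_sect || id: centses, var cov(un)

* Public school (sector = 0) with school mean SES of zero
lincom centses

* Private school (sector = 1) with school mean SES of zero
lincom centses + ses_sect

* Private school with a hypothetical school mean SES of 0.5
lincom centses + 0.5*ses_mses + ses_sect
```

Repeating the last command over a range of school mean SES values gives the points needed for a plot of the contingent effect.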
The variance component for the random intercept is 2.379597, which
is still large relative to its standard error of 0.3714584. Thus
there remains some school-level variance unaccounted for in the
model. The variance component corresponding to the slope, however,
is quite small relative to its standard error. This suggests that
the researcher may be justified in constraining the effect to be
fixed across schools.
By default, Stata does not report model fit statistics such as the
AIC or BIC. These can be requested, however, by using the
postestimation command estat ic. This displays
the log-likelihood, which can be converted to the deviance according
to the formula -2 * log-likelihood. It also displays the AIC
and BIC statistics in smaller-is-better form. Comparing both the
AIC and BIC statistics in Table 2, it is clear that the
final model is preferable to the first two models.
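A sketch of how these statistics might be collected for all three models follows. The stored-estimate names empty, level2, and full are hypothetical labels, and the sketch relies on xtmixed's default estimation method because this section does not state which method produced the figures in Table 2.

```stata
* Sketch: fit statistics for the three models discussed in this section.
* The names empty, level2, and full are hypothetical labels.
xtmixed mathach || id: , var
estat ic                               // log-likelihood, AIC, BIC
display "Deviance = " -2*e(ll)         // deviance = -2 * log-likelihood
estimates store empty

xtmixed mathach meanses sector || id: , var
display "Deviance = " -2*e(ll)
estimates store level2

xtmixed mathach meanses sector centses ses_mses ses_sect || id: centses, var cov(un)
display "Deviance = " -2*e(ll)
estimates store full

* Side-by-side table of log-likelihood, AIC, and BIC for the stored models
estimates stats empty level2 full
```

Because the three models differ in their fixed effects, comparisons of this kind are usually made on maximum likelihood rather than restricted maximum likelihood fits, so it is worth confirming which method is in use before interpreting the differences.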