Notation for Mixed and Multilevel Models
Even if one is comfortable distinguishing between fixed and random
effects, additional confusion may emerge when trying to make sense
of the notation used to describe multilevel models. In non-experimental
disciplines, researchers tend to use the notation of Raudenbush and
Bryk (2002) that explicitly models the nested structure of the data.
Unfortunately, this approach can be rather messy, and software documentation
typically relies on matrix notation instead. Both approaches are detailed
in this section.
In the archetypical cross-sectional example, a researcher is interested
in predicting test performance as a function of student-level and
school-level characteristics. Using the model-building notation, an
empty (i.e. lacking predictors) student-level model is specified first:
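Written out, and assuming the school-specific mean is denoted β_0j (the
conventional Raudenbush and Bryk symbol, which the text does not name
explicitly), equation 3 would read:

```latex
Y_{ij} = \beta_{0j} + r_{ij} \tag{3}
```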
The outcome variable Y for individual i nested in school j is
equal to the average outcome in school j plus an individual-level
error r_ij. Because there may also be an effect that is common
to all students within the same school, it is necessary to add a
school-level error term. This is done by specifying a separate
equation for the intercept (equation 4), in which γ_00 is the
average outcome for the population and u_0j is a school-specific
effect. Combining equations 3 and 4 yields:
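Assuming β_0j denotes the school-specific intercept of equation 3 (a
symbol supplied here, following the usual convention), equations 4 and
5 can be sketched as follows. Equation 5 results from replacing β_0j in
equation 3 with the right-hand side of equation 4:

```latex
\beta_{0j} = \gamma_{00} + u_{0j} \tag{4}

Y_{ij} = \gamma_{00} + u_{0j} + r_{ij} \tag{5}
```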
Denoting the variance of r_ij as σ^2 and the variance of
u_0j as τ_00, the proportion of observed variation in the
dependent variable attributable to school-level characteristics is
found by dividing τ_00 by the total variance:
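In symbols, taking the total variance to be the sum of the two variance
components just defined, and numbering the result 6 on the assumption
that the equation numbers run sequentially:

```latex
\rho = \frac{\tau_{00}}{\tau_{00} + \sigma^{2}} \tag{6}
```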
Here ρ is referred to as the intraclass correlation
coefficient. The proportion of variance attributable to
student-level traits is then 1 - ρ.
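As a small illustration of this arithmetic, the sketch below computes
the intraclass correlation from two variance components. The function
name and the numeric values are hypothetical and chosen only for the
example; in practice the components would come from a fitted
empty model.

```python
def intraclass_correlation(tau00: float, sigma2: float) -> float:
    """Return tau00 / (tau00 + sigma2), the between-school share of variance."""
    if tau00 < 0 or sigma2 < 0:
        raise ValueError("variance components must be non-negative")
    total = tau00 + sigma2
    if total == 0:
        raise ValueError("total variance must be positive")
    return tau00 / total


# Hypothetical variance components, for illustration only.
rho = intraclass_correlation(tau00=2.0, sigma2=6.0)
print(rho)      # between-school share: 0.25
print(1 - rho)  # student-level share: 0.75
```

With these made-up values, one quarter of the variation in the outcome
lies between schools and three quarters lies between students within
schools.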
A researcher who has found a significant variance component
τ_00 may wish to incorporate macro-level variables in an
attempt to account for some of this variation. For example, the
average socioeconomic status of students in a district may affect
the expected test performance of a school, or average test
performance may differ between private and public institutions.
These possibilities can be modeled by adding the school-level
variables to the intercept equation, giving equation 7,
and substituting equation 7 into equation 3.
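A sketch of these two steps, assuming the school-level variables are
named MEANSES (mean SES) and SECTOR (public versus private), as they
are later in this section, and that their coefficients are labeled
γ_01 and γ_02 (labels supplied here for illustration):

```latex
\beta_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{MEANSES}_{j}
           + \gamma_{02}\,\mathrm{SECTOR}_{j} + u_{0j} \tag{7}

Y_{ij} = \gamma_{00} + \gamma_{01}\,\mathrm{MEANSES}_{j}
       + \gamma_{02}\,\mathrm{SECTOR}_{j} + u_{0j} + r_{ij}
```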
Additionally, the researcher may wish to include student-level covariates. A student's socioeconomic status may affect his or her test performance independent of the school's SES score. Thus equation 3 would become:
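Assuming the student-level covariate is written SES_ij and its
school-specific slope β_1j (symbols supplied here, consistent with the
slope equation discussed next), the extended level-1 model would be:

```latex
Y_{ij} = \beta_{0j} + \beta_{1j}\,\mathrm{SES}_{ij} + r_{ij} \tag{8}
```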
If the researcher wishes to treat student SES as a random effect (that
is, the researcher believes the effect of a student's SES varies
between schools), this can be done by specifying an equation for the
slope in the same manner as was previously done with the intercept
equation:
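By analogy with the intercept equation, and assuming the average slope
is labeled γ_10 and the school-specific deviation u_1j (labels supplied
here), the slope equation would take the form:

```latex
\beta_{1j} = \gamma_{10} + u_{1j} \tag{9}
```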
Finally, it is possible that the effect of a level-1 variable changes
across scores on a level-2 variable. A student's SES may matter less
in a private school than in a public school, or it may matter more in
schools with higher average SES scores. To test these possibilities,
one can add the school-level mean SES (MEANSES) and school sector
(SECTOR) variables to equation 9.
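With the two school-level variables added, and assuming their
coefficients in the slope equation are labeled γ_11 and γ_12 (labels
supplied here for illustration), the result would be:

```latex
\beta_{1j} = \gamma_{10} + \gamma_{11}\,\mathrm{MEANSES}_{j}
           + \gamma_{12}\,\mathrm{SECTOR}_{j} + u_{1j} \tag{10}
```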
A random-intercept and random-slope model including level-2 covariates and cross-level
interactions is obtained by substituting equations 7 and 10 into 8:
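A sketch of the substitution, assuming the coefficient labels γ_00
through γ_12 and the variable names SES, MEANSES and SECTOR (the
coefficient labels are supplied here and may differ from the original
presentation):

```latex
Y_{ij} = \gamma_{00} + \gamma_{01}\,\mathrm{MEANSES}_{j}
       + \gamma_{02}\,\mathrm{SECTOR}_{j} + u_{0j}
       + \left( \gamma_{10} + \gamma_{11}\,\mathrm{MEANSES}_{j}
       + \gamma_{12}\,\mathrm{SECTOR}_{j} + u_{1j} \right) \mathrm{SES}_{ij}
       + r_{ij} \tag{11}
```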
This approach of building a multilevel model through the specification
and combination of different level-1 and level-2 models makes clear
the nested structure of the data. However, it is long and messy, and
what is more, it is inconsistent with the notation used in much of
the documentation for general statistical packages. Instead of the
step-by-step approach taken above, the pithier, and more general,
matrix notation is often used:
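Using the symbols defined in the sentence that follows, the general
linear mixed model in matrix form is:

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u}
           + \boldsymbol{\varepsilon} \tag{12}
```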
Here y is an n x 1 vector of responses,
X is an n x p matrix containing the fixed effects
regressors, β is a p x 1 vector of
fixed-effects parameters, Z is an n x q matrix of
random effects regressors, u is a q x 1 vector of random
effects, and ε is an n x 1 vector of
errors. The relationship between equations 11 and 12 is clearest
when, in the step-by-step approach, the fixed effects are grouped
together in the first part of the right-hand side of the equation
and the random effects are grouped together in the second part,
as in equation 13.
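A sketch of this regrouping, assuming the coefficient labels γ_00
through γ_12 (supplied here for illustration). The first bracket
corresponds to Xβ, the second to Zu, and r_ij to the error vector:

```latex
Y_{ij} = \left[ \gamma_{00} + \gamma_{01}\,\mathrm{MEANSES}_{j}
       + \gamma_{02}\,\mathrm{SECTOR}_{j} + \gamma_{10}\,\mathrm{SES}_{ij}
       + \gamma_{11}\,\mathrm{MEANSES}_{j}\,\mathrm{SES}_{ij}
       + \gamma_{12}\,\mathrm{SECTOR}_{j}\,\mathrm{SES}_{ij} \right]
       + \left[ u_{0j} + u_{1j}\,\mathrm{SES}_{ij} \right]
       + r_{ij} \tag{13}
```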
Note that it is possible for a variable to appear as both a fixed
effect and a random effect (appearing in both X and Z in equation 12).
In this example, estimating equation 13 would yield both fixed-effect
and random-effect estimates for the student-level SES variable. The
fixed effect refers to the overall expected effect of a student's
socioeconomic status on test scores; the random effect gives
information on whether or not this effect differs between schools.
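To make the link between the two notations concrete, the sketch below
builds one row of X and one row of Z per student and checks that the
matrix form reproduces the step-by-step model. Everything in it is an
illustrative assumption: the toy data, the parameter values, the
coefficient ordering, and the layout of the stacked random-effects
vector (u_0j and u_1j for each school in turn). Note that SES enters
both the X row and the Z row.

```python
# Toy data: (school index j, student SES, school MEANSES, school SECTOR).
students = [
    (0, -1.0, 0.5, 1.0),
    (0, 2.0, 0.5, 1.0),
    (1, 0.0, -0.5, 0.0),
    (1, 1.0, -0.5, 0.0),
]
n_schools = 2

# Hypothetical parameter values, not estimates from any real data set.
gamma = [10.0, 2.0, 1.0, 3.0, 0.5, -1.0]  # g00, g01, g02, g10, g11, g12
u = [(1.0, 0.5), (-1.0, -0.5)]            # u[j] = (u_0j, u_1j)
r = [0.25, -0.25, 0.5, -0.5]              # student-level errors


def step_by_step(i):
    """Outcome from the level-1 model with level-2 equations substituted in."""
    j, ses, meanses, sector = students[i]
    g00, g01, g02, g10, g11, g12 = gamma
    beta0 = g00 + g01 * meanses + g02 * sector + u[j][0]
    beta1 = g10 + g11 * meanses + g12 * sector + u[j][1]
    return beta0 + beta1 * ses + r[i]


def design_rows(i):
    """Row of X (fixed-effects regressors) and row of Z (random-effects regressors)."""
    j, ses, meanses, sector = students[i]
    x_row = [1.0, meanses, sector, ses, meanses * ses, sector * ses]
    z_row = [0.0] * (2 * n_schools)
    z_row[2 * j] = 1.0        # multiplies u_0j
    z_row[2 * j + 1] = ses    # multiplies u_1j
    return x_row, z_row


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


u_stacked = [value for pair in u for value in pair]

y_steps = [step_by_step(i) for i in range(len(students))]
y_matrix = []
for i in range(len(students)):
    x_row, z_row = design_rows(i)
    y_matrix.append(dot(x_row, gamma) + dot(z_row, u_stacked) + r[i])

same = all(abs(a - b) < 1e-9 for a, b in zip(y_steps, y_matrix))
print(same)
```

Here X has p = 6 columns and Z has q = 4 columns (two random effects
for each of two schools). With many schools Z becomes large and sparse,
which is one reason software works with the matrix form directly.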



