Notation for Mixed and Multilevel Models
Even if one is comfortable distinguishing between fixed and random
effects, additional confusion may emerge when trying to make sense
of the notation used to describe multilevel models. In
non-experimental disciplines, researchers tend to use the notation
of Raudenbush and Bryk (2002) that explicitly models the nested
structure of the data. Unfortunately, this approach can be rather
messy, and software documentation typically relies instead on matrix
notation. Both approaches are detailed in this section.
In the archetypical cross-sectional example, a researcher is
interested in predicting test performance as a function of
student-level and school-level characteristics. Using the
model-building notation, an empty (i.e., lacking predictors)
student-level model is specified first:
$$ Y_{ij} = \beta_{0j} + r_{ij} \tag{3} $$
The outcome variable $Y_{ij}$ for individual $i$ nested in school $j$ is
equal to the average outcome in school $j$, $\beta_{0j}$, plus an individual-level
error $r_{ij}$. Because there may also be an effect that is common
to all students within the same school, it is necessary to add a
school-level error term. This is done by specifying a separate
equation for the intercept:
$$ \beta_{0j} = \gamma_{00} + u_{0j} \tag{4} $$
where $\gamma_{00}$ is the average outcome for the population and
$u_{0j}$ is a school-specific effect. Combining equations
3 and 4 yields:
$$ Y_{ij} = \gamma_{00} + u_{0j} + r_{ij} \tag{5} $$
Denoting the variance of $r_{ij}$ as $\sigma^2$ and the variance of
$u_{0j}$ as $\tau_{00}$, the proportion of observed variation in the
dependent variable attributable to school-level characteristics is
found by dividing $\tau_{00}$ by the total variance:
$$ \rho = \frac{\tau_{00}}{\tau_{00} + \sigma^2} \tag{6} $$
Here $\rho$ is referred to as the intraclass correlation
coefficient. The proportion of variance attributable to
student-level traits is then $1 - \rho$.
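
As a small illustration of the intraclass correlation, the following Python sketch computes it from two variance components. The function name and the numeric values are hypothetical, chosen only for illustration; in practice the variance components would come from a fitted empty model.

```python
def intraclass_correlation(tau00: float, sigma2: float) -> float:
    """Return tau00 / (tau00 + sigma2), the between-school share of variance."""
    if tau00 < 0 or sigma2 < 0:
        raise ValueError("variance components must be non-negative")
    total = tau00 + sigma2
    if total == 0:
        raise ValueError("total variance is zero; the ICC is undefined")
    return tau00 / total


# Hypothetical variance components (not estimates from any real data set).
rho = intraclass_correlation(tau00=2.0, sigma2=8.0)
print(f"school-level share of variance:  {rho:.2f}")
print(f"student-level share of variance: {1 - rho:.2f}")
```

With these made-up values, one fifth of the variance lies between schools and the remaining four fifths within schools.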
A researcher who has found a significant variance component
$\tau_{00}$ may wish to incorporate macro-level variables in an
attempt to account for some of this variation. For example, the
average socioeconomic status of students in a district may impact
the expected test performance of a school, or average test
performance may differ between private and public institutions.
These possibilities can be modeled by adding the school-level
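variables to the intercept equation,
$$ \beta_{0j} = \gamma_{00} + \gamma_{01}\,\text{MEANSES}_j + \gamma_{02}\,\text{SECTOR}_j + u_{0j} \tag{7} $$
and substituting equation 7 into equation 3.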
Additionally, the researcher may wish to include student-level covariates. A student's own socioeconomic status (SES) may affect his or her test performance independently of the school's average SES. Thus equation 3 would become:
$$ Y_{ij} = \beta_{0j} + \beta_{1j}\,\text{SES}_{ij} + r_{ij} \tag{8} $$
If the researcher wishes to treat student SES as a random effect
(that is, the researcher believes the effect of a student's SES
varies between schools), this can be done by specifying an equation for
the slope in the same manner as was previously done with the
intercept equation:
$$ \beta_{1j} = \gamma_{10} + u_{1j} \tag{9} $$
Finally, it is possible that the effect of a level-1 variable
varies with the value of a level-2 variable. A student's SES
may be less important in a private than in
a public school, or a student's individual SES may be more
important in schools with higher average SES. To test these
possibilities, one can add the MEANSES and SECTOR variables to
equation 9:
$$ \beta_{1j} = \gamma_{10} + \gamma_{11}\,\text{MEANSES}_j + \gamma_{12}\,\text{SECTOR}_j + u_{1j} \tag{10} $$
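A random-intercept and random-slope model including level-2 covariates and cross-level
interactions is obtained by substituting equations 7 and 10 into 8:
$$ Y_{ij} = \gamma_{00} + \gamma_{01}\,\text{MEANSES}_j + \gamma_{02}\,\text{SECTOR}_j + u_{0j} + \left(\gamma_{10} + \gamma_{11}\,\text{MEANSES}_j + \gamma_{12}\,\text{SECTOR}_j + u_{1j}\right)\text{SES}_{ij} + r_{ij} \tag{11} $$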
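
The substitution can be checked numerically. The Python sketch below computes one student's outcome twice: once step by step through the school-specific intercept and slope, and once from the fully expanded expression with the cross-level interactions written out. The coefficient names (`g00` through `g12`) mirror the gamma subscripts, and all numeric values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math

# Hypothetical fixed-effect coefficients (gamma_00 ... gamma_12); not estimates.
gamma = {"g00": 12.0, "g01": 5.0, "g02": 1.0, "g10": 3.0, "g11": 1.0, "g12": -1.5}


def outcome_two_step(g, meanses, sector, ses, u0, u1, r):
    """Build the school's intercept and slope first, then the student outcome."""
    beta0 = g["g00"] + g["g01"] * meanses + g["g02"] * sector + u0  # level-2 intercept
    beta1 = g["g10"] + g["g11"] * meanses + g["g12"] * sector + u1  # level-2 slope
    return beta0 + beta1 * ses + r                                  # level-1 model


def outcome_combined(g, meanses, sector, ses, u0, u1, r):
    """Evaluate the expanded single-equation form of the same model."""
    fixed = (g["g00"] + g["g01"] * meanses + g["g02"] * sector
             + g["g10"] * ses
             + g["g11"] * meanses * ses   # cross-level interaction MEANSES x SES
             + g["g12"] * sector * ses)   # cross-level interaction SECTOR x SES
    random = u0 + u1 * ses
    return fixed + random + r


# One hypothetical student in one hypothetical school.
args = dict(meanses=0.5, sector=1, ses=-0.25, u0=0.75, u1=-0.5, r=1.25)
y_step = outcome_two_step(gamma, **args)
y_comb = outcome_combined(gamma, **args)
print(y_step, y_comb, math.isclose(y_step, y_comb))
```

The two functions agree because multiplying the slope equation through by SES is exactly what produces the interaction terms and the `u1 * ses` component.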
This approach of building a multilevel model through the
specification and combination of different level-1 and level-2
models makes clear the nested structure of the data. However, it is
long and messy, and what is more, it is inconsistent with the
notation used in much of the documentation for general statistical
packages. Instead of the step-by-step approach taken above, the
pithier, and more general, matrix notation is often used:
$$ y = X\beta + Zu + \varepsilon \tag{12} $$
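Here $y$ is an $n \times 1$ vector of responses,
$X$ is an $n \times p$ matrix containing the fixed-effects
regressors, $\beta$ is a $p \times 1$ vector of
fixed-effects parameters, $Z$ is an $n \times q$ matrix of
random-effects regressors, $u$ is a $q \times 1$ vector of random
effects, and $\varepsilon$ is an $n \times 1$ vector of
errors. The relationship between equations 12 and 11 is clearest when, in the step-by-step approach, the fixed
effects are grouped together in the first part of the
right-hand side of the equation and the random effects are grouped
together in the second part:
$$ Y_{ij} = \left[\gamma_{00} + \gamma_{01}\,\text{MEANSES}_j + \gamma_{02}\,\text{SECTOR}_j + \gamma_{10}\,\text{SES}_{ij} + \gamma_{11}\,\text{MEANSES}_j\,\text{SES}_{ij} + \gamma_{12}\,\text{SECTOR}_j\,\text{SES}_{ij}\right] + \left[u_{0j} + u_{1j}\,\text{SES}_{ij}\right] + r_{ij} \tag{13} $$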
Note that it is possible for a variable to appear as both a fixed effect and a random effect
(that is, in both $X$ and $Z$ of equation 12). In this example, estimating equation 13 would yield both fixed-effect
and random-effect estimates for the student-level SES variable. The fixed effect
refers to the overall expected effect of a student's socioeconomic status on test scores; the random effect gives
information on whether this effect differs between schools.
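
To make the correspondence between the two notations concrete, the following Python sketch builds the fixed-effects and random-effects design matrices for a toy data set of four students in two schools. It uses only the standard library. The data, the coefficient values, and the helper names (`fixed_row`, `random_row`, `matvec`) are all hypothetical, and the block layout of the random-effects matrix is one common convention rather than the layout of any particular package.

```python
# Hypothetical toy data: (school index j, MEANSES_j, SECTOR_j, SES_ij).
students = [
    (0, 0.5, 1, -0.25),
    (0, 0.5, 1, 0.75),
    (1, -0.5, 0, 0.25),
    (1, -0.5, 0, -1.0),
]
n_schools = 2


def fixed_row(meanses, sector, ses):
    """One row of X: intercept, MEANSES, SECTOR, SES, and the two interactions."""
    return [1.0, meanses, float(sector), ses, meanses * ses, sector * ses]


def random_row(school, ses, n_schools):
    """One row of Z: a (1, SES) pair placed in the columns of the student's school."""
    row = [0.0] * (2 * n_schools)
    row[2 * school] = 1.0        # multiplies the school's random intercept
    row[2 * school + 1] = ses    # multiplies the school's random slope
    return row


def matvec(matrix, vector):
    """Plain-Python matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


X = [fixed_row(m, s, x) for (_, m, s, x) in students]         # n x p, p = 6
Z = [random_row(j, x, n_schools) for (j, _, _, x) in students]  # n x q, q = 4

beta = [12.0, 5.0, 1.0, 3.0, 1.0, -1.5]  # hypothetical fixed-effects parameters
u = [0.75, -0.5, -1.0, 0.25]              # hypothetical (u_0j, u_1j) for j = 0, 1
eps = [1.25, -0.5, 0.0, 0.5]              # hypothetical student-level errors

# Responses generated as fixed part + random part + error, row by row.
y = [f + r + e for f, r, e in zip(matvec(X, beta), matvec(Z, u), eps)]
print(y)
```

SES appears in both matrices: as the fourth column of `X`, where it carries the overall effect, and in the second column of each school's block of `Z`, where it carries the school-specific deviation from that effect.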



