Vocabulary of Mixed and Multilevel Models
Models for multilevel data have developed out of methods for analyzing experiments with
random effects. Thus it is important for those interested in using hierarchical linear models to
have a minimal understanding of the language experimental researchers use to differentiate
between effects considered to be random or fixed. In an ideal experiment, the researcher
is interested in whether the presence or absence of one factor affects scores on an outcome
variable. (In the parlance of experiments, a factor is a categorical variable; the term covariate refers to continuous independent variables.) Does a particular pill reduce cholesterol more than a placebo? Can behavioral
modification reduce a particular phobia better than psychoanalysis or no treatment? The
factors in these experiments are said to be fixed ``because the
same, fixed levels would be included in replications of the study''
(Maxwell and Delaney, pg. 469). That is, the researcher is only
interested in the exact categories of the factor that appear in the
experiment. The typical model for a one-factor experiment is:
\[ Y_{ij} = \mu + \alpha_j + e_{ij} \qquad (1) \]
where the score on the dependent variable for individual i in treatment group j ($Y_{ij}$) is
equal to the sum of the grand mean of the sample ($\mu$), the effect $\alpha_j$
of receiving treatment j, and an individual error term $e_{ij}$.
In general, some kind of constraint is put on the alpha values, such
as that they sum to zero, so that the model is identified. In
addition, it is assumed that the errors are independent and normally
distributed with constant variance.
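As a rough illustration of the one-factor model described above (score = grand mean + treatment effect + error), the following sketch simulates a balanced experiment and recovers the effects under the sum-to-zero constraint. The function names, the three treatment groups, and all numeric values are hypothetical and chosen only for illustration; the estimator shown assumes equal group sizes.

```python
import random
import statistics


def simulate_one_factor(mu, alphas, n_per_group, sigma, seed=0):
    """Draw scores y_ij = mu + alpha_j + e_ij with e_ij ~ N(0, sigma^2).

    All arguments are hypothetical illustration values, not taken from the text.
    """
    rng = random.Random(seed)
    return [
        [mu + alpha + rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        for alpha in alphas
    ]


def estimate_fixed_effects(groups):
    """Estimate the grand mean and treatment effects in a balanced design.

    With equal group sizes and the sum-to-zero constraint, the estimated
    effect of treatment j is its group mean minus the grand mean.
    """
    group_means = [statistics.fmean(g) for g in groups]
    grand_mean = statistics.fmean(group_means)
    effects = [m - grand_mean for m in group_means]
    return grand_mean, effects


# Hypothetical cholesterol-like scores for three treatments.
groups = simulate_one_factor(mu=200.0, alphas=[-10.0, 0.0, 10.0],
                             n_per_group=50, sigma=5.0)
grand_mean, effects = estimate_fixed_effects(groups)
print(round(grand_mean, 2), [round(a, 2) for a in effects])
```

Because the estimated effects are deviations from the grand mean, they sum to zero by construction, which is one way of imposing the identification constraint mentioned above.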
In some experiments, however, a particular factor may not be fixed
and perfectly replicable across experiments. Instead, the distinct
categories present in the experiment represent a random sample from
a larger population. For example, different nurses may administer an
experimental drug to subjects. Usually the effect of a specific
nurse is not of theoretical interest, but the researcher will want
to control for the possibility that an independent caregiver effect
is present beyond the fixed drug effect being investigated. In such
cases the researcher may add a term to control for the random
effect:
\[ Y_{ijk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + e_{ijk} \qquad (2) \]
where $\beta_k$ represents the effect of the kth level of the random
effect (here, nurse k), and $(\alpha\beta)_{jk}$ represents the interaction between the
random and fixed effects. A model that contains only fixed effects
and no random effects, such as equation 1, is known as a fixed
effects model. One that includes only random effects and no fixed
effects is termed a random effects model. Equation 2 is
actually an example of a mixed effects model because it
contains both random and fixed effects.
While the notation in equation 2 for the random effect is the same
as for the fixed effect (that is, both are denoted by subscripted
Greek letters), an important difference exists in the tests for the
drug and nurse factors. For the fixed effect, the researcher is
interested in only those levels included in the experiment, and the
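null hypothesis is that there are no differences among the means of
the treatment groups:
\[ H_0: \mu_1 = \mu_2 = \cdots = \mu_J \]
where $\mu_j = \mu + \alpha_j$ is the mean of treatment group j and J is the number of treatment groups.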
For the random effect in the drug example, the researcher is not
interested in the particular nurses per se but instead wishes to
generalize about the potential effects of drawing different nurses
from the larger population. The null hypothesis for the random
effect is therefore that its variance is equal to zero:
\[ H_0: \sigma_\beta^2 = 0 \]
The estimated variance is known as a variance component, and
estimation of these is an essential step in mixed effects models.
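To make the idea of a variance component concrete, here is a minimal sketch that estimates one by the classical ANOVA method of moments. It assumes a simplified, balanced design with a single random factor (for example, nurses) and no fixed factor, which is simpler than the drug-and-nurse model in the text. Statistical packages generally rely on likelihood-based estimation instead, so this is an illustration of the concept rather than of any package's procedure. All names and numeric values are hypothetical.

```python
import random
import statistics


def simulate_random_effect(mu, sigma_beta, sigma_e, n_groups, n_per_group, seed=1):
    """Draw y_ik = mu + beta_k + e_ik, with beta_k ~ N(0, sigma_beta^2)
    shared by every observation in group k and e_ik ~ N(0, sigma_e^2)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_groups):
        beta_k = rng.gauss(0.0, sigma_beta)  # one draw per group (e.g. per nurse)
        data.append([mu + beta_k + rng.gauss(0.0, sigma_e)
                     for _ in range(n_per_group)])
    return data


def variance_components(groups):
    """Method-of-moments estimates for a balanced one-way random effects design.

    E[MS_between] = sigma_e^2 + n * sigma_beta^2 and E[MS_within] = sigma_e^2,
    so sigma_beta^2 is estimated by (MS_between - MS_within) / n, truncated at 0.
    """
    n = len(groups[0])   # observations per group (assumed equal)
    k = len(groups)      # number of groups
    group_means = [statistics.fmean(g) for g in groups]
    grand_mean = statistics.fmean(group_means)
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ms_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means)
                    for y in g) / (k * (n - 1))
    var_beta = max(0.0, (ms_between - ms_within) / n)
    return var_beta, ms_within


# Hypothetical true values: sigma_beta^2 = 9 and sigma_e^2 = 25.
data = simulate_random_effect(mu=50.0, sigma_beta=3.0, sigma_e=5.0,
                              n_groups=500, n_per_group=20)
var_beta, var_error = variance_components(data)
print(round(var_beta, 2), round(var_error, 2))
```

An estimate of the random-effect variance near zero would be consistent with the null hypothesis that the random factor has no effect. The truncation at zero is needed because the raw moment estimator can be negative in small samples.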
Oftentimes in experimental settings, the random effects are
nuisances that must be controlled for. In the above example, the
effect of the drug was the primary interest, whereas the nurse
factor was potentially confounding but theoretically uninteresting.
It is nonetheless necessary to include the relevant random effects
in the model or otherwise run the risk of making false inferences
about the fixed effect (and any fixed/random effect interaction). In
other applications, particularly for the types of multi-level models
discussed below, the random effects are indeed of substantive
interest. A researcher comparing test scores of students across
schools may be interested in a school effect, even if it is only
possible to sample a limited number of schools.
The reason to review random effects in the context of experiments is
that methods for handling multilevel data are actually special cases
of mixed effects models. Hox and Kreft (1994) make the connection
clearly:
``An effect in ANOVA is said to be fixed when inferences are to be made only about the treatments actually included. An effect is random when the treatment groups are sampled from a population of treatment groups and inferences are to be made to the population of which these treatments are a sample. Random effects need random effects ANOVA models (Hays 1973). Multilevel models assume a hierarchically structured population, with random sampling of both groups and individuals within groups. Consequently, multilevel analysis models must incorporate random effects'' (pgs. 285-286).
For scholars coming from non-experimental disciplines (i.e. those that rely more heavily on regression models than analysis of variance), it may be difficult to make sense of the documentation for statistical applications capable of estimating mixed models. Political scientists or sociologists, for example, come to utilize mixed models because they recognize that hierarchically structured data violate standard linear regression assumptions. However, because mixed models developed out of methods for evaluating experiments, much of the documentation for packages like SPSS, SAS, and Stata is made up of experimental examples. Hence it is important to recognize the connection between random effects ANOVA and hierarchical linear models.
Note that the motivation for utilizing mixed models for multilevel data does not rest in the different number of observations at each level, as any model including a dummy variable involves nesting (e.g. survey respondents are nested within gender). The justification instead lies in the fact that the errors within each randomly sampled level-2 unit are likely correlated, necessitating the estimation of a random effects model. Once the researcher has accounted for error non-independence it is possible to make more accurate inferences about the fixed effects of interest.
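The claim that errors within a level-2 unit are correlated can be seen in a short derivation. The sketch below assumes the simplest case, a model with only a random group effect $\beta_k$ for level-2 unit k, with $\beta_k$ and $e_{ik}$ mutually independent. The symbols $\sigma_\beta^2$ and $\sigma_e^2$ for the two variances are notation adopted here for illustration.

```latex
\[
\begin{aligned}
Y_{ik} &= \mu + \beta_k + e_{ik},
  \qquad \beta_k \sim N(0, \sigma_\beta^2),
  \quad e_{ik} \sim N(0, \sigma_e^2), \\
\operatorname{Cov}(Y_{ik}, Y_{i'k}) &= \operatorname{Var}(\beta_k) = \sigma_\beta^2
  \qquad (i \neq i'), \\
\operatorname{Corr}(Y_{ik}, Y_{i'k}) &= \frac{\sigma_\beta^2}{\sigma_\beta^2 + \sigma_e^2}, \\
\operatorname{Cov}(Y_{ik}, Y_{i'k'}) &= 0 \qquad (k \neq k').
\end{aligned}
\]
```

Two observations in the same unit share the draw $\beta_k$, so they are correlated whenever $\sigma_\beta^2 > 0$, while observations from different units are not. A regression that ignores the grouping treats all of these observations as independent, which is the violated assumption described above.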