Estimation
An essential step in estimating multilevel models is the estimation
of variance components. Up until the 1970s, the literature on variance
component estimation focused on using ANOVA techniques that derived
from the work of Fisher [adapted to unbalanced data by Henderson
(1953)]. Since the 1970s, Full and Restricted Maximum Likelihood
estimation (FML and REML, respectively) have become the preferred
methods. ML approaches have several advantages, including the ability
to handle unbalanced data without some of the pathologies of ANOVA
methods (e.g., lack of uniqueness, negative variance estimates). Both
FML and REML produce nearly identical fixed effects estimates. The latter,
however, takes into account the degrees of freedom from the fixed
effects and thus produces variance component estimates that are less
biased. One downside to REML is that the likelihood ratio test cannot
be used to compare two models with different fixed effects specifications.
In small samples with balanced data, REML is generally preferable
to FML because its variance component estimates are unbiased. In large
samples, however, differences between the two sets of estimates are
negligible (Snijders and Bosker 1999). Thus, in
most applications, ``the question of which method to use remains
a matter of personal taste'' (StataCorp 2005, p. 188).
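
The degrees-of-freedom correction is easiest to see in the simplest
possible case. The following is a minimal Python sketch, not syntax for
any of the four packages covered in this document. It assumes a one-level
model with no random effects and an intercept as the only fixed effect,
where FML divides the residual sum of squares by n and REML divides it
by n - p. The scores are made-up values, not HSB data.

```python
# Illustration only: made-up scores, not taken from the HSB dataset.
scores = [4.0, 7.0, 9.0, 12.0, 13.0]

n = len(scores)          # number of observations
p = 1                    # number of fixed effects (intercept only)
mean = sum(scores) / n   # fixed-effect estimate; the same under both methods here
rss = sum((y - mean) ** 2 for y in scores)  # residual sum of squares

var_fml = rss / n         # FML: ignores the degree of freedom used by the mean
var_reml = rss / (n - p)  # REML: corrects for the estimated fixed effect

print(f"FML  variance estimate: {var_fml:.3f}")
print(f"REML variance estimate: {var_reml:.3f}")
```

With only five observations the two estimates differ noticeably (10.8
versus 13.5). As n grows, the ratio (n - p)/n approaches 1, which is why
the choice of method matters little in large samples.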
The remainder of this document provides syntax for estimating multilevel
models using SPSS, Stata, SAS, and R. The data analyzed will be the High
School and Beyond (HSB) dataset that accompanies the HLM package (Raudenbush
et al. 2005). Each section will show how to estimate the empty model,
a random intercept model, and a random slope model from the student
performance example outlined above. The dependent variable is the score
on a math achievement scale. Note that whereas HLM requires two separate
data files (one corresponding to each level), SPSS, Stata, SAS, and R
rely on only a single file. The level-2 observations are common to
each case within the same macro-unit, so that if there are 50 students
in one school, the corresponding school-level score appears 50 times.
Each program also requires an id variable identifying the group membership
of each individual. The results presented below are based on REML
estimation, the default in each package.
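
The single-file layout described above can be sketched as follows. This
is a minimal Python illustration under stated assumptions: the variable
names (school_id, math_score, school_mean_ses) and all values are
hypothetical rather than the actual HSB variables, and each package has
its own merge command for the same task.

```python
# Hypothetical level-2 (school) records: one row per school, keyed by ID.
schools = {
    "A": {"school_mean_ses": -0.4},
    "B": {"school_mean_ses": 0.2},
}

# Hypothetical level-1 (student) records: one row per student, each
# carrying the ID of the school the student belongs to.
students = [
    {"student_id": 1, "school_id": "A", "math_score": 5.9},
    {"student_id": 2, "school_id": "A", "math_score": 19.7},
    {"student_id": 3, "school_id": "B", "math_score": 12.1},
]

# Build the single combined file: copy each school's level-2 values onto
# every student row from that school.
combined = [{**student, **schools[student["school_id"]]} for student in students]

for row in combined:
    print(row)
```

Both students from school A carry the same school_mean_ses value, which
mirrors the repetition of level-2 scores across cases within a
macro-unit; school_id plays the role of the required group id variable.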



