Estimation
An essential step in estimating multilevel models is the estimation
of variance components. Up until the 1970s, the literature on
variance component estimation focused on using ANOVA techniques that
derived from the work of Fisher [adapted to unbalanced data by
Henderson (1953)]. Since the 1970s, Full and Restricted Maximum
Likelihood estimation (FML and REML, respectively) have become the
preferred methods. ML approaches have several advantages, including
the ability to handle unbalanced data without some of the
pathologies of ANOVA methods (e.g., lack of uniqueness, negative
variance estimates). FML and REML typically produce very similar
fixed-effects estimates. REML, however, accounts for the degrees of
freedom used in estimating the fixed effects and thus produces
variance component estimates that are less biased. One downside to REML is
that the likelihood ratio test cannot be used to compare two models
with different fixed effects specifications. In small samples with
balanced data, REML is generally preferable to FML because its
variance component estimates are unbiased. In large samples, however,
differences between the two methods' estimates are negligible
(Snijders and Bosker 1999). Thus, in most applications, ``the question
of which method to use remains a matter of personal taste''
(StataCorp 2005, p. 188).
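
The degrees-of-freedom correction can be illustrated with the
simplest possible case: a model with only an intercept and a residual
variance. The sketch below is written in Python rather than in one of
the packages discussed in this document, and the data values are
invented. It is meant only to show that the FML estimator divides the
residual sum of squares by n, while the REML estimator divides by n
minus the number of fixed effects (here, one).

```python
# Illustrative sketch: FML vs. REML estimates of the residual variance
# in an intercept-only model with no random effects.

def variance_estimates(y):
    """Return (fml, reml) variance estimates for an intercept-only model."""
    n = len(y)
    mean = sum(y) / n                      # the single fixed effect (intercept)
    rss = sum((v - mean) ** 2 for v in y)  # residual sum of squares
    fml = rss / n                          # FML ignores the estimated intercept
    reml = rss / (n - 1)                   # REML subtracts one df for the intercept
    return fml, reml


scores = [12.0, 15.0, 9.0, 14.0, 10.0]     # made-up values, not HSB data
fml, reml = variance_estimates(scores)
print(f"FML: {fml:.2f}  REML: {reml:.2f}")
```

With these five values the FML estimate is 5.2 and the REML estimate
is 6.5. The ratio of the two is (n - 1)/n, so the gap shrinks as n
grows.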
The remainder of this document provides syntax for estimating
multilevel models using SPSS, Stata, and SAS. The data analyzed
will be the High School and Beyond (HSB) dataset that accompanies
the HLM package (Raudenbush et al. 2005). Each section will show
how to estimate the empty model, a random intercept model, and a
random slope model from the student performance example outlined
above. The dependent variable is the score on a math achievement scale.
Note that whereas HLM requires two separate data files (one
corresponding to each level), SPSS, Stata, and SAS rely on only a
single file. Level-2 values are repeated for every case within the
same macro-unit, so that if a school has 50 students, the
corresponding school-level score appears 50 times. Each program
also requires an id variable identifying the group membership of
each individual. The results presented below are based on REML
estimation, the default in each package.
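
The single-file layout described above can be sketched as follows.
The example is in Python purely for illustration. The variable names
(school_id, mathach, meanses) and all values are hypothetical
stand-ins, not records from the HSB data. Each student row receives a
copy of its school's level-2 value, with the id variable serving as
the link between levels.

```python
# Hypothetical level-2 (school) records, keyed by the group id.
schools = {
    "S1": {"meanses": 0.25},
    "S2": {"meanses": -0.40},
}

# Hypothetical level-1 (student) records; school_id identifies group membership.
students = [
    {"school_id": "S1", "mathach": 14.2},
    {"school_id": "S1", "mathach": 9.8},
    {"school_id": "S2", "mathach": 11.5},
]


def to_single_file(students, schools):
    """Attach each school's level-2 values to every one of its students."""
    return [{**student, **schools[student["school_id"]]} for student in students]


combined = to_single_file(students, schools)
for row in combined:
    print(row)
```

The combined list has one row per student, and the school-level value
for S1 appears once for each of its two students.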



