Estimation

An essential step in estimating multilevel models is the estimation of variance components. Until the 1970s, the literature on variance component estimation focused on ANOVA techniques derived from the work of Fisher (adapted to unbalanced data by Henderson 1953). Since the 1970s, full and restricted maximum likelihood estimation (FML and REML, respectively) have become the preferred methods. ML approaches have several advantages, including the ability to handle unbalanced data without some of the pathologies of ANOVA methods (e.g., lack of uniqueness and negative variance estimates). FML and REML typically produce very similar fixed effects estimates. REML, however, takes into account the degrees of freedom used by the fixed effects and thus produces variance component estimates that are less biased. One downside to REML is that the likelihood ratio test cannot be used to compare two models with different fixed effects specifications. In small samples with balanced data, REML is generally preferable to FML because its variance component estimates are unbiased. In large samples, however, the differences between the two sets of estimates are negligible (Snijders and Bosker 1999). Thus, in most applications, "the question of which method to use remains a matter of personal taste" (StataCorp 2005, p. 188).
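The degrees-of-freedom point can be seen in the simplest possible case. The sketch below is a toy illustration under stated assumptions, not a multilevel fit and not the algorithm any of these packages uses: for a linear model with only fixed effects, the ML estimate of the residual variance divides the residual sum of squares by n, whereas the REML estimate divides by n - p, where p is the number of fixed-effect parameters. Assuming an intercept-only model (p = 1), n = 5 observations, and a true variance of 1 (all chosen for illustration), repeated simulation shows the ML estimate averaging roughly (n - 1)/n = 0.8 while the REML estimate averages roughly 1.

```python
import random
import statistics


def ml_and_reml_variance(y):
    """Return (ML, REML) estimates of the residual variance for an
    intercept-only model y_i = beta_0 + e_i, i.e. one fixed effect (p = 1)."""
    n = len(y)
    ybar = sum(y) / n                          # estimate of the fixed effect beta_0
    rss = sum((yi - ybar) ** 2 for yi in y)    # residual sum of squares
    p = 1                                      # number of fixed-effect parameters
    return rss / n, rss / (n - p)              # ML ignores p; REML subtracts it


def simulate(n=5, sigma2=1.0, reps=20000, seed=42):
    """Average the two estimators over many simulated samples (assumed settings)."""
    rng = random.Random(seed)
    ml_est, reml_est = [], []
    for _ in range(reps):
        y = [rng.gauss(10.0, sigma2 ** 0.5) for _ in range(n)]
        ml, reml = ml_and_reml_variance(y)
        ml_est.append(ml)
        reml_est.append(reml)
    return statistics.mean(ml_est), statistics.mean(reml_est)


mean_ml, mean_reml = simulate()
print(f"average ML estimate:   {mean_ml:.3f}  (theory: {(5 - 1) / 5:.3f})")
print(f"average REML estimate: {mean_reml:.3f}  (theory: 1.000)")
```

The same logic carries over to multilevel models, where REML accounts for the estimated fixed effects before estimating the variance components. Because the correction depends on p relative to the number of observations, it shrinks as the sample grows, which is consistent with the two methods giving nearly identical results in large samples.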

The remainder of this document provides syntax for estimating multilevel models using SPSS, Stata, and SAS. The analyses use the High School and Beyond (HSB) dataset that accompanies the HLM package (Raudenbush et al. 2005). Each section shows how to estimate the empty model, a random intercept model, and a random slope model from the student performance example outlined above. The dependent variable is the score on a math achievement scale. Note that whereas HLM requires two separate data files (one for each level), SPSS, Stata, and SAS rely on a single file. Level-2 values are common to every case within the same macro-unit, so if there are 50 students in one school, the corresponding school-level score appears 50 times. Each program also requires an id variable identifying the group membership of each individual. The results presented below are based on REML estimation, the default in each package.
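The single-file layout described above amounts to a many-to-one merge of the school-level records onto the student-level records by the group id. The sketch below shows that merge in plain Python on hypothetical toy data: the school ids ("A", "B") and all values are made up, and the variable names (id, mathach, ses, sector, meanses) are assumed to follow the usual HSB naming rather than taken from the files themselves.

```python
def merge_levels(level1, level2, key="id"):
    """Attach each level-2 (school) record to every level-1 (student) record
    sharing the same group id, producing one long-format table."""
    by_group = {row[key]: row for row in level2}   # index schools by id
    merged = []
    for row in level1:
        school = by_group[row[key]]                # KeyError if no matching school record
        combined = dict(row)                       # copy the student record
        combined.update({k: v for k, v in school.items() if k != key})
        merged.append(combined)
    return merged


# Hypothetical toy data (values invented for illustration only).
students = [
    {"id": "A", "mathach": 12.0, "ses": 0.3},
    {"id": "A", "mathach": 9.5, "ses": -0.2},
    {"id": "A", "mathach": 15.1, "ses": 0.8},
    {"id": "B", "mathach": 7.4, "ses": -0.9},
    {"id": "B", "mathach": 10.2, "ses": -0.1},
]
schools = [
    {"id": "A", "sector": 1, "meanses": 0.30},
    {"id": "B", "sector": 0, "meanses": -0.50},
]

long_data = merge_levels(students, schools)
for row in long_data:
    print(row)
```

School A's sector and meanses values appear once for each of its three students, just as a school with 50 students would have its school-level score repeated 50 times; the id column is kept so that each program can identify group membership.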

