SAS

This section follows Singer (1998); a thorough treatment is available from Littell et al. (2006). The SAS procedure for estimating multilevel models is PROC MIXED.

In the HSB data file the student-level SES variable is in its original metric (a standardized scale with a mean of zero). Often the researcher will prefer to center a variable around the mean of all observations within the same group. Group-mean centering in SAS can be accomplished using the SQL procedure. The following commands create a new data file, hsb2, in the Work library that includes two additional variables: the group means of the SES variable (saved as the variable meanses) and the group-mean centered SES variable cses:

PROC SQL;
CREATE TABLE hsb2 AS
SELECT *, mean(ses) as meanses,
ses-mean(ses) AS cses
FROM hsb
GROUP BY id;
QUIT;

(Grand-mean centering also uses PROC SQL. Omitting the GROUP BY clause causes the mean(ses) function to compute the grand mean of the ses variable, so the expression ses-mean(ses) then creates the grand-mean centered variable.)
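The grand-mean variant described in the parenthetical can be sketched as follows. This is an illustration rather than part of the original program: the table name hsb_gmc and the variable names grandmean and gcses are hypothetical, chosen so that the group-mean centered hsb2 file is left untouched.

```sas
/* Sketch: grand-mean centering (same query, no GROUP BY clause). */
/* hsb_gmc, grandmean, and gcses are hypothetical names.          */
PROC SQL;
  CREATE TABLE hsb_gmc AS
  SELECT *,
         mean(ses) AS grandmean,       /* mean of ses over all rows */
         ses - mean(ses) AS gcses      /* grand-mean centered ses   */
  FROM hsb;
QUIT;
```

When a summary function is combined with SELECT * in this way, SAS writes a note to the log saying that the query requires remerging summary statistics back with the original data. That note is expected here.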

The syntax for estimating the empty model is the following:

PROC MIXED COVTEST DATA=hsb2;
CLASS id;
MODEL mathach = /SOLUTION;
RANDOM intercept/SUBJECT=id;
RUN;

The COVTEST option requests hypothesis tests for the random effects. The CLASS statement identifies id as a categorical variable. The MODEL statement defines the model, which in this case does not include any predictor variables, and the SOLUTION option asks SAS to print the fixed effects estimates in the output. The next statement, RANDOM, identifies the elements of the model to be specified as random effects. The SUBJECT=id option identifies id to be the grouping variable.

The results are displayed in Table 3 at the bottom of the page. The average math achievement score across all schools is 12.6370. The variance component corresponding to the random intercept is 8.6097, which has a corresponding standard error of 1.0778. Because this estimate is more than twice the size of its standard error, there is evidence of significant variation in average test scores across schools (though see the SPSS section for a caution on over-interpreting this test).

It is possible to partition the variance in the dependent variable across levels according to the ratio of the school-level variance component to the total variance. In this example, the ratio is 8.6097/(8.6097+39.1487) = .1802761, meaning that roughly 18% of the variance is attributable to school characteristics.
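This ratio is commonly called the intraclass correlation. Written out, with \tau_{00} denoting the school-level (random intercept) variance and \sigma^2 the student-level residual variance (these symbols are assumed here; the text above reports only the numbers), the calculation is:

```latex
\[
\rho = \frac{\tau_{00}}{\tau_{00} + \sigma^{2}}
     = \frac{8.6097}{8.6097 + 39.1487}
     = \frac{8.6097}{47.7584}
     \approx 0.180
\]
```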

To explain some of the school-level variation in math achievement scores, it is possible to incorporate school-level predictors into the model. For example, the average socioeconomic status of a school's students may affect performance. In addition, whether a school is public or private may also make a difference. The SAS program for a model with two school-level predictors is the following:

PROC MIXED COVTEST DATA=hsb2;
CLASS id;
MODEL mathach = meanses sector /SOLUTION;
RANDOM intercept/SUBJECT=id;
RUN;

The MODEL statement now includes the two school-level predictors following the equals sign. Nothing else is changed from the previous program.

The results are displayed in the second column of Table 3. The intercept is 12.1282, which now corresponds to the expected math achievement score for a student in a public school (sector=0) whose average SES score is zero (meanses=0). A one-unit increase in the school's average SES score is associated with a 5.3328-unit increase in expected math achievement, and moving from a public to a private school is associated with an expected improvement of 1.2254. These estimates are all statistically significant.

The variance component corresponding to the random intercept has now dropped to 2.3139, demonstrating that the inclusion of the average SES and school sector variables explains a good deal of the school-level variance. Still, the estimate remains more than twice the size of its standard error of 0.3700, suggesting that some of the school-level variance remains unexplained.
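One common way to put a number on "a good deal" is the proportional reduction in the intercept variance relative to the empty model. The following is a rough calculation using only the two estimates reported above, not a statistic printed by PROC MIXED:

```latex
\[
\frac{8.6097 - 2.3139}{8.6097}
  = \frac{6.2958}{8.6097}
  \approx 0.731
\]
```

By this measure, roughly 73% of the between-school variance found in the empty model is accounted for by meanses and sector.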

A final model adds a student-level covariate, the group-mean centered SES variable. Because it is possible that the effect of a student's SES may vary across schools, the final model treats the slope as random. Additionally, because the slope may vary according to school-level characteristics such as average SES and sector (private versus public), the final model also incorporates cross-level interactions.

The syntax for this last model is the following:

PROC MIXED COVTEST DATA=hsb2;
CLASS id;
MODEL mathach = meanses sector cses meanses*cses sector*cses / SOLUTION;
RANDOM intercept cses / TYPE=UN SUB=id;
RUN;

The MODEL statement adds the cses variable along with the cross-level interactions between cses at the student level and sector and meanses at the school level. CSES is also added to the RANDOM statement. The TYPE=UN option specifies an unstructured covariance matrix for the random effects.

The results are displayed in the final column of Table 3. The intercept of 12.1279 now refers to the expected math achievement score in a public school with average SES scores for a student at his or her school's average SES level. Because there are interactions in the model, the marginal fixed effects of each variable depend on the value of the other variable(s) involved in the interaction. The marginal effect of a one-unit change in a student's SES score on math achievement will depend on whether a school is public or private as well as on the school's average SES score. For a public school (where sector=0), the marginal effect of a one-unit change in the group-mean centered student SES variable is equal to γ10 + γ11(MEANSES) = 2.9450 + 1.0392(MEANSES). For a private school (where sector=1), the marginal effect of a one-unit change in a student's SES is equal to γ10 + γ11(MEANSES) + γ12 = 2.9450 + 1.0392(MEANSES) - 1.6427. When cross-level interactions are present, graphical means may be appropriate for exploring the contingent nature of marginal effects in greater detail. Here the simplest interpretation of the interaction coefficients is that the effect of student-level SES is significantly higher in wealthier schools and significantly lower in private schools.
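The conditional slopes described above can also be requested directly from PROC MIXED with ESTIMATE statements. The following is a minimal sketch, not part of the original program. It assumes that sector is a numeric 0/1 variable that is not listed in the CLASS statement (as in the model above); the labels and the illustrative meanses value of 0.5 are arbitrary choices.

```sas
/* Sketch: same model as above, plus ESTIMATE statements for the     */
/* student-level SES slope at chosen values of sector and meanses.   */
/* The labels and the meanses value of 0.5 are illustrative only.    */
PROC MIXED COVTEST DATA=hsb2;
  CLASS id;
  MODEL mathach = meanses sector cses meanses*cses sector*cses / SOLUTION;
  RANDOM intercept cses / TYPE=UN SUB=id;
  /* public school (sector=0) at meanses=0: the cses coefficient alone */
  ESTIMATE 'SES slope: public, meanses=0'    cses 1;
  /* private school (sector=1): add the sector*cses coefficient        */
  ESTIMATE 'SES slope: private, meanses=0'   cses 1 sector*cses 1;
  /* shift meanses to 0.5: add 0.5 times the meanses*cses coefficient  */
  ESTIMATE 'SES slope: public, meanses=0.5'  cses 1 meanses*cses 0.5;
  ESTIMATE 'SES slope: private, meanses=0.5' cses 1 meanses*cses 0.5 sector*cses 1;
RUN;
```

With the coefficients reported above, the four estimates should come out close to 2.9450, 2.9450 - 1.6427 = 1.3023, 2.9450 + 0.5(1.0392) = 3.4646, and 3.4646 - 1.6427 = 1.8219, up to rounding. The advantage of ESTIMATE over hand calculation is that SAS also reports a standard error and test for each combination.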

The variance component corresponding to the random intercept is 2.3794, which remains much larger than its standard error of .3714. Thus there is most likely additional school-level variation unaccounted for in the model. The variance component for the random slope is smaller than its standard error, however, suggesting that the model picks up most of the variance in this slope that exists across schools.


