6. Comparison Using the One-way ANOVA, GLM, and Regression
The t-test is a special case of the one-way ANOVA. ANOVA examines mean differences using the F statistic, whereas the t-test reports the t statistic. When only two groups are compared, F equals t squared, so the t-test and the one-way ANOVA produce the same answer. The linear regression model, usually estimated by ordinary least squares (OLS), reports the mean difference as the coefficient of the dummy variable (grouping variable). This section shows that the t-test, one-way ANOVA, GLM, and linear regression present essentially the same thing in different ways.
For comparison, let us first replicate the independent samples t-test for death rates from lung cancer presented in 4.3. Here x and y denote the light and heavy cigarette-consuming states, respectively.
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |      22    16.98591    .6747158    3.164698    15.58276    18.38906
       y |      22    22.32045    .7287523    3.418151    20.80493    23.83598
---------+--------------------------------------------------------------------
combined |      44    19.65318    .6374133    4.228122    18.36772    20.93865
---------+--------------------------------------------------------------------
    diff |           -5.334545    .9931371               -7.338777   -3.330314
------------------------------------------------------------------------------
    diff = mean(x) - mean(y)                                      t =  -5.3714
Ho: diff = 0                                     degrees of freedom =       42
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
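The t statistic in this output can be cross-checked by hand from the group summaries alone. The following is a minimal Python sketch, not part of the original SAS, Stata, or SPSS workflow; the function name pooled_t is hypothetical, and the inputs are the rounded values printed in the table, so the last digits may differ slightly from the packages' results.

```python
import math

# Summary statistics copied from the t-test output above (rounded as printed).
n_x, mean_x, sd_x = 22, 16.98591, 3.164698   # x: light cigarette-consuming states
n_y, mean_y, sd_y = 22, 22.32045, 3.418151   # y: heavy cigarette-consuming states


def pooled_t(n1, m1, s1, n2, m2, s2):
    """Equal-variance two-sample t-test from summaries: returns (t, df, SE of diff)."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df, se


t_stat, df, se_diff = pooled_t(n_x, mean_x, sd_x, n_y, mean_y, sd_y)
print(f"t = {t_stat:.4f}, df = {df}, SE(diff) = {se_diff:.4f}")
```

The result should agree with the t, degrees of freedom, and standard error of diff reported above, up to rounding of the inputs.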
6.1 One-way ANOVA
In SAS, the ANOVA procedure performs ANOVA on balanced data. The CLASS statement specifies categorical variables, and the MODEL statement specifies the variable to be compared and the grouping variable in equation form.
PROC ANOVA DATA=masil.smoking;
CLASS smoke;
MODEL lung=smoke;
RUN;
Dependent Variable: lung
                                    Sum of
Source                  DF         Squares     Mean Square    F Value    Pr > F
Model                    1     313.0311273     313.0311273      28.85    <.0001
Error                   42     455.6804273      10.8495340
Corrected Total         43     768.7115545
R-Square     Coeff Var      Root MSE     lung Mean
0.407215      16.75995      3.293863      19.65318
Source                  DF        Anova SS     Mean Square    F Value    Pr > F
smoke                    1     313.0311273     313.0311273      28.85    <.0001
The degrees of freedom in the t-test equal the error degrees of freedom in the ANOVA table. The F statistic, F(1, 42) = 28.85, is the square of the t statistic, (-5.3714)^2. Accordingly, the p-value of the F test is identical to the two-tailed p-value of the t-test. The Stata .anova and .oneway commands produce the same result.
            Root MSE      = 3.29386     Adj R-squared =  0.3931
    Source |  Partial SS    df       MS           F     Prob > F
-----------+----------------------------------------------------
     Model |  313.031127     1  313.031127      28.85     0.0000
           |
     smoke |  313.031127     1  313.031127      28.85     0.0000
           |
  Residual |  455.680427    42   10.849534
-----------+----------------------------------------------------
     Total |  768.711555    43  17.8770129
In SPSS, the ONEWAY command conducts the same one-way ANOVA.
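To see why the F statistic equals t squared in the two-group case, the ANOVA sums of squares can be rebuilt from the group sizes, means, and standard deviations. This is a Python sketch under the assumption that the rounded summaries from the t-test output are accurate enough; it is a hand check, not a substitute for the procedures above.

```python
# (n, mean, standard deviation) per group, copied from the t-test output.
groups = [(22, 16.98591, 3.164698),   # light cigarette-consuming states
          (22, 22.32045, 3.418151)]   # heavy cigarette-consuming states

n_total = sum(n for n, _, _ in groups)
grand_mean = sum(n * m for n, m, _ in groups) / n_total

# Between-group (Model) and within-group (Error) sums of squares.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
ss_within = sum((n - 1) * s ** 2 for n, _, s in groups)

df_between = len(groups) - 1        # 1 for two groups
df_within = n_total - len(groups)   # 42, same as the t-test

f_stat = (ss_between / df_between) / (ss_within / df_within)
t_stat = -5.3714                    # t reported by the t-test

print(f"SS model = {ss_between:.4f}, SS error = {ss_within:.4f}")
print(f"F({df_between}, {df_within}) = {f_stat:.2f}, t^2 = {t_stat ** 2:.2f}")
```

With only two groups the between-group sum of squares has one degree of freedom, which is why F reduces to the squared t statistic.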
6.2 General Linear Model (GLM)
GLM can handle both balanced and unbalanced data. For a one-way layout, GLM reproduces the result of the one-way ANOVA. The SAS GLM and MIXED procedures report the F statistic for the one-way ANOVA. However, Stata’s .glm command does not perform the one-way ANOVA.
PROC GLM DATA=masil.smoking;
CLASS smoke;
MODEL lung=smoke /SS3;
RUN;
Dependent Variable: lung
                                    Sum of
Source                  DF         Squares     Mean Square    F Value    Pr > F
Model                    1     313.0311273     313.0311273      28.85    <.0001
Error                   42     455.6804273      10.8495340
Corrected Total         43     768.7115545
R-Square     Coeff Var      Root MSE     lung Mean
0.407215      16.75995      3.293863      19.65318
Source                  DF     Type III SS     Mean Square    F Value    Pr > F
smoke                    1     313.0311273     313.0311273      28.85    <.0001
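The summary line under the ANOVA table (R-Square, Coeff Var, Root MSE) follows directly from the sums of squares. The Python sketch below recomputes these figures from the printed values; the variable names are illustrative, and it assumes the usual definitions (R-square as model SS over total SS, coefficient of variation as 100 times Root MSE over the mean of the dependent variable).

```python
import math

# Values copied from the SAS output above.
ss_model, ss_error, ss_total = 313.0311273, 455.6804273, 768.7115545
df_error = 42
lung_mean = 19.65318

mse = ss_error / df_error                  # Mean Square for Error
root_mse = math.sqrt(mse)                  # Root MSE
r_square = ss_model / ss_total             # share of variation explained by smoke
coeff_var = 100 * root_mse / lung_mean     # Coeff Var, in percent

print(f"MSE = {mse:.7f}, Root MSE = {root_mse:.6f}")
print(f"R-Square = {r_square:.6f}, Coeff Var = {coeff_var:.5f}")
```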
The MIXED procedure is used in much the same way as the GLM procedure.
In SPSS, the GLM (or UNIANOVA) command performs the one-way ANOVA. The intercept should not be excluded in these commands. The MIXED command also fits this model.
UNIANOVA lung BY smoke
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /CRITERIA = ALPHA(.05)
  /DESIGN = smoke .
MIXED lung BY smoke
  /FIXED = smoke | SSTYPE(3)
  /METHOD = REML .
6.3 Linear Regression (Ordinary Least Squares)
Linear regression (OLS) gives the same answer as the t-test and the one-way ANOVA. In OLS, only a dummy variable (the grouping variable) is included on the right-hand side. Both OLS and ANOVA use the same covariance structure in the analysis. Hence, these three methods present the same result in their own ways.
The SAS REG procedure, the Stata .regress command, and the SPSS REGRESSION command produce the same estimates for a linear regression model with only a dummy for the grouping variable smoke.
PROC REG DATA=masil.smoking;
MODEL lung=smoke;
RUN;
Model: MODEL1
Dependent Variable: lung
Number of Observations Read          44
Number of Observations Used          44
                    Analysis of Variance
                             Sum of          Mean
Source             DF       Squares        Square    F Value    Pr > F
Model               1     313.03113     313.03113      28.85    <.0001
Error              42     455.68043      10.84953
Corrected Total    43     768.71155
Root MSE             3.29386    R-Square    0.4072
Dependent Mean      19.65318    Adj R-Sq    0.3931
Coeff Var           16.75995
                    Parameter Estimates
                     Parameter      Standard
Variable      DF      Estimate         Error    t Value    Pr > |t|
Intercept      1      16.98591       0.70225      24.19      <.0001
smoke          1       5.33455       0.99314       5.37      <.0001
The intercept, 16.9859, is the mean of the first group (smoke=0). The group coded as zero is the baseline of comparison. The coefficient of smoke is the mean difference between the two groups (5.3346=22.3205-16.9859). The average death rate of heavy cigarette-consuming states is about 5.3346 higher than that of their light counterparts.
The standard error of the coefficient of smoke is the denominator of the independent samples t-test, .9931=3.2939*sqrt(1/22+1/22), where 3.2939 is the square root of the pooled variance estimate 10.8495 (see 4.3). Thus, the t of 5.37 for the dummy coefficient in OLS is identical in magnitude to the t statistic for independent samples with equal variances. The sign is reversed only because the t-test computes mean(x) - mean(y), that is, light minus heavy.
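With a single dummy regressor, the OLS estimates and their standard errors have closed forms in terms of the group means and the Root MSE. The Python sketch below applies those formulas to the printed values; it is an illustration under that assumption and does not refit the model from the raw data, which are not reproduced in this section.

```python
import math

# Group summaries and Root MSE copied from the output above.
n0, mean0 = 22, 16.98591   # smoke = 0: baseline, light cigarette-consuming states
n1, mean1 = 22, 22.32045   # smoke = 1: heavy cigarette-consuming states
root_mse = 3.293863        # square root of the pooled variance (MSE)

intercept = mean0                                   # mean of the baseline group
slope = mean1 - mean0                               # mean difference between groups
se_intercept = root_mse / math.sqrt(n0)             # SE of the baseline mean
se_slope = root_mse * math.sqrt(1 / n0 + 1 / n1)    # t-test denominator

t_intercept = intercept / se_intercept
t_slope = slope / se_slope

print(f"Intercept {intercept:.5f}  SE {se_intercept:.5f}  t {t_intercept:.2f}")
print(f"smoke     {slope:.5f}  SE {se_slope:.5f}  t {t_slope:.2f}")
```

The slope's t value has the same magnitude as the t statistic from the independent samples t-test; only the direction of the subtraction differs.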
The following is an example of the Stata .regress command. The dependent variable precedes the list of independent variables. _cons in the output is the intercept term.
. regress lung smoke
-------------+------------------------------           F(  1,    42) =   28.85
       Model |  313.031127     1  313.031127           Prob > F      =  0.0000
    Residual |  455.680427    42   10.849534           R-squared     =  0.4072
-------------+------------------------------           Adj R-squared =  0.3931
       Total |  768.711555    43  17.8770129           Root MSE      =  3.2939
------------------------------------------------------------------------------
        lung |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       smoke |   5.334545   .9931371     5.37   0.000     3.330314    7.338777
       _cons |   16.98591    .702254    24.19   0.000      15.5687    18.40311
------------------------------------------------------------------------------
The SPSS REGRESSION command looks complicated compared to the SAS REG procedure and Stata .regress command. You need to specify a dependent variable in the /DEPENDENT subcommand and a list of independent variables in /METHOD.
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT lung
/METHOD=ENTER smoke.
Table 5 compares the three methods for comparing two independent samples. Although they present different statistics, they reach the same conclusion. ANOVA, GLM, and OLS report the same F(1, 42) of 28.85, which is equivalent to t(42) of -5.3714 since 28.85=(-5.3714)^2. In OLS, the intercept is the sample mean of the baseline group coded as zero, while the dummy coefficient is the mean difference between the two samples. Hence, the t-test, ANOVA, and OLS are the same test when comparing the means of two independent samples. However, the t-test is recommended for comparing two group means because its computation is simple and its interpretation is easy.