6. Comparison Using the One-way ANOVA, GLM, and Regression

The t-test is a special case of the one-way ANOVA. ANOVA examines mean differences using the F statistic, whereas the t-test reports the t statistic. With two groups, F equals t squared, so the t-test and the one-way ANOVA produce the same answer. The linear regression model, often called ordinary least squares (OLS), reports the mean difference as the coefficient of the dummy variable (grouping variable). This section shows that the t-test, one-way ANOVA, GLM, and linear regression present essentially the same result in different ways.

For comparison, let us first replicate the independent sample t-test for death rates from lung cancer presented in 4.3. Here x and y denote the light and heavy cigarette consuming states, respectively.

. ttesti 22 16.98591 3.164698 22 22.32045 3.418151
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |      22    16.98591    .6747158    3.164698    15.58276    18.38906
       y |      22    22.32045    .7287523    3.418151    20.80493    23.83598
---------+--------------------------------------------------------------------
combined |      44    19.65318    .6374133    4.228122    18.36772    20.93865
---------+--------------------------------------------------------------------
    diff |           -5.334545    .9931371               -7.338777   -3.330314
------------------------------------------------------------------------------
    diff = mean(x) - mean(y)                                      t =  -5.3714
Ho: diff = 0                                     degrees of freedom =       42

   Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
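
As a package-independent cross-check, the pooled-variance t statistic above can be recomputed from the summary statistics alone. The following is a minimal Python sketch, not part of the SAS, Stata, or SPSS workflow described here; the helper name pooled_t is an illustrative assumption, and the inputs are the group sizes, means, and standard deviations shown in the output.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t statistic assuming equal variances (illustrative helper)."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    # Standard error of the difference in means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    return t, se, df, pooled_var

# Summary statistics taken from the t-test output above (x = light, y = heavy).
t, se, df, pooled_var = pooled_t(22, 16.98591, 3.164698, 22, 22.32045, 3.418151)
print(f"t = {t:.4f}, se = {se:.4f}, df = {df}, pooled variance = {pooled_var:.4f}")
```

The printed values should agree, up to rounding, with the t statistic, the standard error of diff, and the degrees of freedom reported in the output.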

6.1 One-way ANOVA

In SAS, the ANOVA procedure performs ANOVA on balanced data. The CLASS statement specifies the categorical variables, and the MODEL statement specifies the variable to be compared and the grouping variable in equation form.

PROC ANOVA DATA=masil.smoking;
   CLASS smoke;
   MODEL lung=smoke;
RUN;
                                   The ANOVA Procedure
 
Dependent Variable: lung
 
                                           Sum of
   Source                      DF         Squares     Mean Square    F Value    Pr > F
 
   Model                        1     313.0311273     313.0311273      28.85    <.0001
   Error                       42     455.6804273      10.8495340
   Corrected Total             43     768.7115545
 
 
                    R-Square     Coeff Var      Root MSE     lung Mean
 
                    0.407215      16.75995      3.293863      19.65318
 
   Source                      DF        Anova SS     Mean Square    F Value    Pr > F
   smoke                        1     313.0311273     313.0311273      28.85    <.0001

The degrees of freedom in the t-test are the same as the error degrees of freedom in the ANOVA table. The F statistic 28.85 with 1 numerator degree of freedom is t squared, (-5.3714)^2. Accordingly, the p-value of F and the two-sided p-value of t are identical. The Stata .anova and .oneway commands produce the same result.

. anova lung smoke
                           Number of obs =      44     R-squared     =  0.4072
                           Root MSE      = 3.29386     Adj R-squared =  0.3931
 
                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  313.031127     1  313.031127      28.85     0.0000
                         |
                   smoke |  313.031127     1  313.031127      28.85     0.0000
                         |
                Residual |  455.680427    42   10.849534  
              -----------+----------------------------------------------------
                   Total |  768.711555    43  17.8770129  

In SPSS, the ONEWAY command conducts the same one-way ANOVA.

ONEWAY lung BY smoke.
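
The ANOVA table in this subsection can also be rebuilt from the two group summaries, which makes the link F = t^2 explicit. The sketch below is in Python rather than one of the packages covered here and is only an illustration; it assumes the group sizes, means, and standard deviations from the t-test output, so small rounding differences from the reported sums of squares are expected.

```python
# Summary statistics from the t-test output in this section.
n1, mean1, sd1 = 22, 16.98591, 3.164698
n2, mean2, sd2 = 22, 22.32045, 3.418151

n = n1 + n2
grand_mean = (n1 * mean1 + n2 * mean2) / n

# Between-group (Model) and within-group (Error) sums of squares.
ss_model = n1 * (mean1 - grand_mean) ** 2 + n2 * (mean2 - grand_mean) ** 2
ss_error = (n1 - 1) * sd1**2 + (n2 - 1) * sd2**2

df_model, df_error = 1, n - 2
mse = ss_error / df_error
f_stat = (ss_model / df_model) / mse
r_squared = ss_model / (ss_model + ss_error)

# Pooled-variance t statistic for the same two groups.
t_stat = (mean1 - mean2) / (mse * (1 / n1 + 1 / n2)) ** 0.5

print(f"SS model = {ss_model:.4f}, SS error = {ss_error:.4f}")
print(f"F = {f_stat:.2f}, t^2 = {t_stat**2:.2f}, R-squared = {r_squared:.4f}")
```

With two groups, the Model sum of squares reduces to n1*n2/(n1+n2) times the squared mean difference, which is why dividing it by the mean square error gives exactly t squared.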


6.2 General Linear Model (GLM)

GLM can handle both balanced and unbalanced data. Like the ANOVA procedure, GLM reproduces the result of the one-way ANOVA. The SAS GLM and MIXED procedures report the F statistic for the one-way ANOVA. However, Stata's .glm command, which fits generalized linear models, does not perform the one-way ANOVA.

PROC GLM DATA=masil.smoking;
   CLASS smoke;
   MODEL lung=smoke /SS3;
RUN;
                                    The GLM Procedure
 
Dependent Variable: lung
 
                                           Sum of
   Source                      DF         Squares     Mean Square    F Value    Pr > F
 
   Model                        1     313.0311273     313.0311273      28.85    <.0001
   Error                       42     455.6804273      10.8495340
   Corrected Total             43     768.7115545
 
 
                    R-Square     Coeff Var      Root MSE     lung Mean
 
                    0.407215      16.75995      3.293863      19.65318
 
 
   Source                      DF     Type III SS     Mean Square    F Value    Pr > F
 
   smoke                        1     313.0311273     313.0311273      28.85    <.0001

The MIXED procedure is used in much the same way as the GLM procedure.

PROC MIXED DATA=masil.smoking;
   CLASS smoke;
   MODEL lung=smoke;
RUN;

In SPSS, the GLM (or UNIANOVA) command performs the one-way ANOVA. The intercept should not be excluded in these commands. The MIXED command also fits this model.

GLM lung BY smoke
   /METHOD = SSTYPE(3)
   /INTERCEPT = INCLUDE
   /CRITERIA = ALPHA(.05)
   /DESIGN = smoke .

MIXED lung BY smoke
   /FIXED = smoke | SSTYPE(3)
   /METHOD = REML .


6.3 Linear Regression (Ordinary Least Squares)

Linear regression (OLS) gives the same answer as the t-test and the one-way ANOVA. In OLS, only a dummy variable (the grouping variable) is included on the right-hand side. Both OLS and ANOVA use the same covariance structure in the analysis. Hence, these three methods present the same result in their own ways.
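
The equivalence can be illustrated on a small made-up data set. The Python sketch below is not one of the packages covered here, and the data values are hypothetical, not the smoking data. It fits a simple regression on a 0/1 dummy with the closed-form OLS formulas and compares the result with the pooled-variance t-test computed directly from the two groups.

```python
import math

# Hypothetical toy data (not the smoking data): outcomes for two groups.
group0 = [14.0, 17.5, 16.0, 19.0, 15.5]   # dummy = 0 (baseline)
group1 = [21.0, 24.5, 20.0, 23.5, 22.0]   # dummy = 1

x = [0.0] * len(group0) + [1.0] * len(group1)
y = group0 + group1
n = len(y)

# Simple OLS of y on the dummy via the closed-form formulas.
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
mse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
t_ols = slope / math.sqrt(mse / sxx)

# Pooled-variance t-test on the same data (group1 minus group0).
m0, m1 = sum(group0) / len(group0), sum(group1) / len(group1)
ss0 = sum((v - m0) ** 2 for v in group0)
ss1 = sum((v - m1) ** 2 for v in group1)
pooled_var = (ss0 + ss1) / (n - 2)
t_pooled = (m1 - m0) / math.sqrt(pooled_var * (1 / len(group0) + 1 / len(group1)))

print(f"intercept = {intercept:.4f}, baseline mean = {m0:.4f}")
print(f"slope = {slope:.4f}, mean difference = {m1 - m0:.4f}")
print(f"t (OLS) = {t_ols:.4f}, t (pooled t-test) = {t_pooled:.4f}")
```

In this sketch the intercept matches the baseline mean, the slope matches the mean difference, and the two t statistics coincide, because the residual mean square of the dummy regression is the pooled variance.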

The SAS REG procedure, Stata .regress command, and SPSS REGRESSION command produce the same results for a linear regression model with only a dummy for the grouping variable smoke.

PROC REG DATA=masil.smoking;
   MODEL lung=smoke;
RUN;

                                       The REG Procedure
                                         Model: MODEL1
                                   Dependent Variable: lung
 
                            Number of Observations Read          44
                            Number of Observations Used          44
 
                                      Analysis of Variance
 
                                             Sum of           Mean
         Source                   DF        Squares         Square    F Value    Pr > F
 
         Model                     1      313.03113      313.03113      28.85    <.0001
         Error                    42      455.68043       10.84953
         Corrected Total          43      768.71155
 
 
                      Root MSE              3.29386    R-Square     0.4072
                      Dependent Mean       19.65318    Adj R-Sq     0.3931
                      Coeff Var            16.75995
 
 
                                      Parameter Estimates
 
                                   Parameter       Standard
              Variable     DF       Estimate          Error    t Value    Pr > |t|
 
              Intercept     1       16.98591        0.70225      24.19      <.0001
              smoke         1        5.33455        0.99314       5.37      <.0001

The intercept, 16.9859, is the mean of the first group (smoke=0). The group coded as zero is the baseline of comparison. The coefficient of smoke is the mean difference between the two groups (5.3346=22.3205-16.9859). The average death rate of heavy cigarette consuming states is about 5.3346 higher than that of the light cigarette consuming states.

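The standard error of the coefficient of smoke is the denominator of the independent sample t-test, .9931=3.2939*sqrt(1/22+1/22), where 3.2939 is the Root MSE and the pooled variance estimate is its square, 10.8495 (see 4.3). Thus, the t of 5.37 for the dummy coefficient in OLS is identical in magnitude to the t statistic for the independent samples with equal variances. The sign differs only because the t-test computes mean(x) - mean(y), while the dummy coefficient measures the heavy group relative to the light group.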
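
The two rows of the Parameter Estimates table can be reproduced from the Root MSE and the group sizes. The Python sketch below is an illustration only, outside the packages covered here. It relies on the fact that, with a single 0/1 dummy regressor, the standard error of the intercept is the Root MSE divided by the square root of the baseline group size.

```python
import math

# Values read from the regression and t-test output in this section.
root_mse = 3.29386                 # Root MSE from the ANOVA table
n0, n1 = 22, 22                    # group sizes (smoke=0, smoke=1)
mean0, mean1 = 16.98591, 22.32045  # group means

# With a single dummy regressor, the intercept is the baseline mean and
# the slope is the difference in group means.
intercept = mean0
slope = mean1 - mean0

se_intercept = root_mse / math.sqrt(n0)
se_slope = root_mse * math.sqrt(1 / n0 + 1 / n1)

t_intercept = intercept / se_intercept
t_slope = slope / se_slope

print(f"Intercept: {intercept:.5f}  SE {se_intercept:.5f}  t {t_intercept:.2f}")
print(f"smoke:     {slope:.5f}  SE {se_slope:.5f}  t {t_slope:.2f}")
```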

The following is an example of the Stata .regress command. A dependent variable precedes a list of independent variables. _cons in the output means the intercept term.

. regress lung smoke
      Source |       SS       df       MS              Number of obs =      44
-------------+------------------------------           F(  1,    42) =   28.85
       Model |  313.031127     1  313.031127           Prob > F      =  0.0000
    Residual |  455.680427    42   10.849534           R-squared     =  0.4072
-------------+------------------------------           Adj R-squared =  0.3931
       Total |  768.711555    43  17.8770129           Root MSE      =  3.2939
 
------------------------------------------------------------------------------
        lung |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       smoke |   5.334545   .9931371     5.37   0.000     3.330314    7.338777
       _cons |   16.98591    .702254    24.19   0.000      15.5687    18.40311
------------------------------------------------------------------------------

The SPSS REGRESSION command looks complicated compared to the SAS REG procedure and Stata .regress command. You need to specify a dependent variable in the /DEPENDENT subcommand and a list of independent variables in /METHOD.

REGRESSION
   /MISSING LISTWISE
   /STATISTICS COEFF OUTS R ANOVA
   /CRITERIA=PIN(.05) POUT(.10)
   /NOORIGIN
   /DEPENDENT lung
   /METHOD=ENTER smoke.

Table 5 compares the three methods for comparing two independent samples. Although they present different statistics, they reach the same conclusion. ANOVA, GLM, and OLS report the same F (1, 42) of 28.85, which is equivalent to t (42) of -5.3714 since 28.85=(-5.3714)^2. In OLS, the intercept is the sample mean of the baseline group coded as zero, while the dummy coefficient is the mean difference between the two samples. Hence, the t-test, ANOVA, and OLS are the same test when comparing the means of two independent samples. However, the t-test is recommended for comparing two group means because it is simple to compute and easy to interpret.

Table 5

