4. Comparing Independent Samples with Equal Variances

This section discusses the most typical form of t-test, which compares the means of two independent random samples y1 and y2. They are independent in the sense that they are drawn from different populations and no element of one sample is paired with (linked to) a corresponding element of the other sample.

An example is a comparison of death rates from lung cancer between heavy and light cigarette consuming states. Since each state is either a heavy or a light consumer, observations in the two groups are not linked. The typical null hypothesis of the independent sample t-test is that the mean difference between the two groups is zero.

4.1 F test for Equal Variances

T-tests assume that samples are randomly drawn from normally distributed populations with unknown parameters. In addition to these random sampling and normality assumptions, you should check the equal variance assumption when examining the mean difference of two independent samples. The population variances of the two groups need to be equal in order to use the pooled variance. Otherwise, the t-test is not reliable due to the incorrect variance and degrees of freedom used.

In practice, unequal variances of two independent samples are less problematic when two samples have the same number of observations (balanced data) (Hildebrand et al. 2005: 362). The problem will be critical if one sample has a larger variance and a much smaller sample size compared to the other (362).
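The point about balanced data can be illustrated numerically. The following is a minimal Python sketch, not part of the original analysis; the standard deviations and sample sizes are hypothetical values chosen only for illustration. It compares the standard error of the mean difference computed from the pooled variance with the one computed from the individual sample variances.

```python
import math

def pooled_se(s1, n1, s2, n2):
    """Standard error of the mean difference using the pooled variance."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))

def separate_se(s1, n1, s2, n2):
    """Standard error using each sample's own variance (unequal variance form)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Hypothetical standard deviations: group 1 is three times as variable as group 2.
s1, s2 = 6.0, 2.0

# Balanced data: the two standard errors coincide even though s1 != s2.
balanced = (pooled_se(s1, 20, s2, 20), separate_se(s1, 20, s2, 20))

# Larger variance combined with a much smaller sample: the two forms diverge.
unbalanced = (pooled_se(s1, 5, s2, 35), separate_se(s1, 5, s2, 35))

print("balanced:  ", balanced)
print("unbalanced:", unbalanced)
```

With equal sample sizes the two standard errors agree, so only the degrees of freedom differ between the two tests. When the more variable group is also the much smaller one, the pooled form yields a noticeably smaller standard error than the separate-variance form.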

The folded form F-test is commonly used to examine whether two populations have the same variance. The F statistic is

   $$F = \frac{s_L^2}{s_S^2} \sim F(n_L - 1,\; n_S - 1)$$

where L and S respectively indicate groups with larger and smaller sample variances.
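As a rough illustration, the folded form F statistic can be computed from two sample standard deviations as sketched below in Python. The function name folded_f is hypothetical; the input values are the standard deviations and sample sizes of the two smoking groups reported in the Stata output of section 4.3. The p-value is not computed here because it requires the F distribution, which the Python standard library does not provide.

```python
def folded_f(s1, n1, s2, n2):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    v1, v2 = s1**2, s2**2
    if v1 >= v2:
        return v1 / v2, n1 - 1, n2 - 1
    return v2 / v1, n2 - 1, n1 - 1

# Standard deviations and group sizes taken from the .ttest output in section 4.3.
f_stat, num_df, den_df = folded_f(3.164698, 22, 3.418151, 22)
print(round(f_stat, 4), num_df, den_df)
```

The result should agree with the folded F of about 1.17 with (21, 21) degrees of freedom that SAS reports in section 4.4.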

The SAS TTEST procedure and SPSS T-TEST command conduct the F-test for equal variances. SAS reports the folded form F statistic, whereas SPSS computes Levene's weighted F statistic. In Stata, the .oneway command performs Bartlett’s chi-squared test for equal variances.

The following is an example of Bartlett’s test using the .oneway command. The chi-squared statistic of .1216 (p=.727) does not reject the null hypothesis of equal variances at the .05 significance level; the two samples appear to have equal variances. The F statistic of 28.85 in the ANOVA table indicates a significant mean difference in the death rate from lung cancer between heavy and light cigarette consuming states (p<.0001). Do not confuse this F statistic, which tests the mean difference, with the F statistic for equal variances.

. oneway lung smoke
                       Analysis of Variance
   Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      313.031127      1   313.031127     28.85     0.0000
 Within groups      455.680427     42    10.849534
------------------------------------------------------------------------
    Total           768.711555     43   17.8770129

Bartlett's test for equal variances:  chi2(1) =   0.1216  Prob>chi2 = 0.727

4.2 Overview of the Independent Sample T-test

If the null hypothesis of equal variances is not rejected, the pooled variance can be used. The pooled variance consists of the individual sample variances weighted by the degrees of freedom (the number of observations minus one) of the two groups. The null hypothesis of the independent sample t-test is no difference in population means, and the degrees of freedom are n1+n2–2 = (n1–1) + (n2–1). The t statistic is computed as follows.

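   $$t = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{s_{pool}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2), \quad s_{pool}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

where D_0 is the hypothesized mean difference (zero under the null hypothesis above).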
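The pooled variance t-test described above can be sketched in a few lines of Python. This is an illustrative sketch, not the code any of the packages use; the function name pooled_t and the argument d0 (the hypothesized difference) are assumptions made here. The inputs are the group means, standard deviations, and sample sizes reported in the Stata output of section 4.3.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2, d0=0.0):
    """Return (t, df, pooled variance) for the equal-variance two-sample t-test."""
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2 - d0) / se, df, sp2

# Light (smoke=0) versus heavy (smoke=1) cigarette consuming states.
t_stat, df, sp2 = pooled_t(16.98591, 3.164698, 22, 22.32045, 3.418151, 22)
print(round(t_stat, 4), df, round(sp2, 4))
```

The t statistic and degrees of freedom should match the values of about -5.37 and 42 shown in sections 4.3 and 4.4, and the pooled variance should match the within-group mean square in the ANOVA table of section 4.1.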

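When the equal variance assumption is violated, the t-test needs to use the individual sample variances in the approximate t and the degrees of freedom. This test may be called the unequal variance t-test (Hildebrand et al. 2005: 363). Notice that the approximation below is based on both the number of observations and the variances of the two independent samples. The approximate t is

   $$t' = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim t(df_{Satterthwaite}), \quad df_{Satterthwaite} = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$

where D_0 is the hypothesized mean difference (zero under the typical null hypothesis).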

In case of unequal variances, the t-test requires the approximation of the degrees of freedom (Satterthwaite 1946; Welch 1947; Cochran and Cox 1992; SAS 2005; Stata 2007). Among several approximation methods, Satterthwaite’s approximation is commonly used. Note that the approximation is a real number, not necessarily an integer. SAS, Stata, and SPSS all compute Satterthwaite’s approximation of the degrees of freedom. In addition, the SAS TTEST procedure reports the Cochran-Cox’s approximation and the Stata .ttest command provides Welch’s approximation as well.
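A minimal Python sketch of the approximate t and Satterthwaite's approximation of the degrees of freedom follows. The function name approximate_t is hypothetical, and only Satterthwaite's formula is shown; the Welch and Cochran-Cox variants are not implemented here. The inputs are again the summary statistics of the two smoking groups reported in section 4.3.

```python
import math

def approximate_t(m1, s1, n1, m2, s2, n2, d0=0.0):
    """Return (t', Satterthwaite df) for the unequal-variance t-test."""
    v1, v2 = s1**2 / n1, s2**2 / n2   # variance of each sample mean
    t_prime = (m1 - m2 - d0) / math.sqrt(v1 + v2)
    # Satterthwaite's approximation; generally a non-integer value.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_prime, df

t_prime, satt_df = approximate_t(16.98591, 3.164698, 22, 22.32045, 3.418151, 22)
print(round(t_prime, 4), round(satt_df, 1))
```

The degrees of freedom should be close to the 41.8 that SAS reports on the “Satterthwaite” line in section 4.4, and the approximate t should equal the pooled t because the two groups have the same size.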

This section discusses the independent sample t-test when the samples have an equal variance. The t-test for the samples with unequal variances is discussed in the next section.

4.3 Independent Sample T-test in Stata

In the .ttest command, you have to specify a grouping variable using the by option. This command presents summary statistics of the individual samples, the combined sample, and the difference between the two group means. Here you need to pay attention to the first two lines of the summary. Light cigarette consuming states (smoke=0) have a smaller mean (16.9859) and standard deviation (3.1647) than heavy consuming states. Both groups have the same sample size of 22.

. ttest lung, by(smoke) level(95)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      22    16.98591    .6747158    3.164698    15.58276    18.38906
       1 |      22    22.32045    .7287523    3.418151    20.80493    23.83598
---------+--------------------------------------------------------------------
combined |      44    19.65318    .6374133    4.228122    18.36772    20.93865
---------+--------------------------------------------------------------------
    diff |           -5.334545    .9931371               -7.338777   -3.330314
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =  -5.3714
Ho: diff = 0                                     degrees of freedom =       42

   Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

Let us first check the equal variance assumption. The F statistic is 1.1666 = 3.4182^2/3.1647^2 ~ F(21, 21). The degrees of freedom of the numerator and denominator are both 21 (=22-1). The p-value of .7273, virtually the same as that of Bartlett’s test in 4.1, does not reject the null hypothesis of equal variances. Thus, the independent sample t-test can use the pooled variance as follows.

   $$s_{pool}^2 = \frac{(22 - 1)3.1647^2 + (22 - 1)3.4182^2}{22 + 22 - 2} = 10.8495$$

   $$t = \frac{16.98591 - 22.32045}{\sqrt{10.8495}\sqrt{\frac{1}{22} + \frac{1}{22}}} = -5.3714 \sim t(42)$$

The t statistic of -5.3714 is sufficiently large in absolute value to reject the null hypothesis of no mean difference between the two groups (p<.0001). Light cigarette consuming states have a lower average death rate from lung cancer than their heavy consuming counterparts. Notice that the F statistic of 28.85 in section 4.1 is t squared = (-5.3714)^2.
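The link between the ANOVA table in section 4.1 and the t-test can be checked from the summary statistics alone. The Python sketch below is an illustration under the assumption that the rounded means and standard deviations printed by Stata are accurate enough; small discrepancies in the last digits come from that rounding.

```python
n = 22                                   # observations per group
mean_light, mean_heavy = 16.98591, 22.32045
sd_light, sd_heavy = 3.164698, 3.418151

# Between-groups sum of squares: group means around the grand mean (1 df).
grand_mean = (n * mean_light + n * mean_heavy) / (2 * n)
ss_between = n * (mean_light - grand_mean) ** 2 + n * (mean_heavy - grand_mean) ** 2

# Within-groups sum of squares and mean square (2n - 2 = 42 df).
ss_within = (n - 1) * sd_light**2 + (n - 1) * sd_heavy**2
ms_within = ss_within / (2 * n - 2)

f_anova = ss_between / ms_within
print(round(ss_between, 2), round(ss_within, 2), round(f_anova, 2))
```

The within-groups mean square is the pooled variance, and the resulting F should be close to 28.85, the square of the t statistic.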

If only aggregate statistics of the two groups are available, use the .ttesti command, the immediate form of .ttest, and list the number of observations, mean, and standard deviation of each group.

. ttesti 22 16.98591 3.164698 22 22.32045 3.418151, level(95)

Suppose the data set is arranged in the second type of Figure 3 so that one variable high_lung has data for heavy cigarette consuming states and the other low_lung for light consuming states. You have to use the unpaired option to indicate that two variables are not paired. A grouping variable here is not necessary.

. ttest high_lung=low_lung, unpaired
Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
high_lung|      22    22.32045    .7287523    3.418151    20.80493    23.83598
low_lung |      22    16.98591    .6747158    3.164698    15.58276    18.38906
---------+--------------------------------------------------------------------
combined |      44    19.65318    .6374133    4.228122    18.36772    20.93865
---------+--------------------------------------------------------------------
    diff |            5.334545    .9931371                3.330313    7.338777
------------------------------------------------------------------------------
    diff = mean(high_lung) - mean(low_lung)                       t =   5.3714
Ho: diff = 0                                     degrees of freedom =       42

   Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000

Since the variable order is reversed here, the summary statistics of heavy cigarette consumers are displayed first and the t statistic has the opposite sign. However, this outcome leads to the same conclusion. The large test statistic of 5.3714 rejects the null hypothesis at the .05 level; heavy cigarette consuming states have a higher average death rate from lung cancer than light consuming states.

The unpaired option is very useful since it enables you to conduct a t-test without additional data manipulation. You need to use the unpaired option to compare two variables, say leukemia and kidney, as independent samples in Stata.

. ttest leukemia=kidney, unpaired

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
leukemia |      44    6.829773    .0962211    .6382589    6.635724    7.023821
  kidney |      44    2.794545    .0782542    .5190799    2.636731     2.95236
---------+--------------------------------------------------------------------
combined |      88    4.812159    .2249261    2.109994    4.365094    5.259224
---------+--------------------------------------------------------------------
    diff |            4.035227    .1240251                3.788673    4.281781
------------------------------------------------------------------------------
    diff = mean(leukemia) - mean(kidney)                          t =  32.5356
Ho: diff = 0                                     degrees of freedom =       86

   Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000

The average death rate from leukemia is 6.8298, which is about 4 higher than the average rate from kidney cancer. But we want to know whether there is a statistically significant difference in the population means of the two death rates. The F statistic of 1.5119 = (.6382589^2) / (.5190799^2) and its p-value (=.1794) do not reject the null hypothesis of equal variances. The large t statistic of 32.5356 rejects, at the .05 level, the null hypothesis that death rates from leukemia and kidney cancer have the same mean; the average death rate from leukemia is higher than that from kidney cancer.

4.4 Independent Sample T-test in SAS

The TTEST procedure by default examines the hypothesis of equal variances, and then provides t statistics for both cases. The procedure by default reports Satterthwaite’s approximation for the degrees of freedom. SAS requires that a data set is arranged in the first type of Figure 3 for the independent sample t-test; a variable to be tested is classified by a grouping variable, which should be specified in the CLASS statement. You may specify a hypothesized value other than zero using the H0 option.

PROC TTEST H0=0 ALPHA=.05 DATA=masil.smoking;
   CLASS smoke;
   VAR lung;
RUN;

TTEST displays summary statistics of the two samples and then reports the results of the t-test and F-test. First, look at the last block of the output entitled “Equality of Variances.” The labels “Num DF” and “Den DF” are respectively the numerator’s and denominator’s degrees of freedom.

                                   The TTEST Procedure
 
                                        Statistics
 
                                 Lower CL          Upper CL  Lower CL           Upper CL
  Variable  smoke             N      Mean    Mean      Mean   Std Dev  Std Dev   Std Dev
 
  lung                 0     22    15.583  16.986    18.389    2.4348   3.1647    4.5226
  lung                 1     22    20.805   22.32    23.836    2.6298   3.4182    4.8848
  lung      Diff (1-2)             -7.339  -5.335     -3.33    2.7159   3.2939    4.1865
 
                                       Statistics
 
                  Variable  smoke         Std Err    Minimum    Maximum
 
                  lung                 0   0.6747      12.01      25.45
                  lung                 1   0.7288      12.11      27.27
                  lung      Diff (1-2)     0.9931
 
 
                                         T-Tests
 
          Variable    Method           Variances      DF    t Value    Pr > |t|
 
          lung        Pooled           Equal          42      -5.37      <.0001
          lung        Satterthwaite    Unequal      41.8      -5.37      <.0001
 
 
                                  Equality of Variances
 
              Variable    Method      Num DF    Den DF    F Value    Pr > F
 
              lung        Folded F        21        21       1.17    0.7273

The small F statistic of 1.17 and its large p-value do not reject the null hypothesis of equal variances (p=.7273). The t-test therefore uses the pooled variance, and you need to read the line labeled “Pooled”; otherwise, read the line labeled “Satterthwaite” (and “Cochran” if the COCHRAN option is specified). The large t statistic of -5.37 and its p-value (p<.0001) reject the null hypothesis. Heavy cigarette consuming states have a higher average death rate from lung cancer than light consuming states.

If you have a summary data set with the values of variables lung and their frequency count, specify the count variable in the FREQ statement.

PROC TTEST DATA=masil.smoking;
   CLASS smoke;
   VAR lung;
   FREQ count;
RUN;

Now, let us compare the death rates from leukemia and kidney by stacking up two variables into one and generating a grouping variable. The following DATA step reshapes the data set from the second type of Figure 3 to the first type. The new variable rate contains death rates and leu_kid identifies groups.

DATA masil.smoking2;
   SET masil.smoking;
   rate = leukemia; leu_kid ='Leukemia'; OUTPUT;
   rate = kidney; leu_kid ='Kidney'; OUTPUT;
   KEEP leu_kid rate;
RUN;

PROC TTEST COCHRAN DATA=masil.smoking2;
   CLASS leu_kid;
   VAR rate;
RUN;
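The reshaping done by the DATA step above can be sketched in Python for readers who want to see the logic outside SAS. This is only an illustration: the two input rows hold made-up values, not the actual data, and the name long_rows is hypothetical. Each wide row with one leukemia value and one kidney value becomes two long rows that carry a group label and a rate.

```python
# Hypothetical wide-format rows; the values are made up for illustration only.
wide_rows = [
    {"leukemia": 6.5, "kidney": 2.7},
    {"leukemia": 7.1, "kidney": 3.0},
]

long_rows = []
for row in wide_rows:
    # One output row per variable, mirroring the two OUTPUT statements.
    long_rows.append({"leu_kid": "Leukemia", "rate": row["leukemia"]})
    long_rows.append({"leu_kid": "Kidney", "rate": row["kidney"]})

for row in long_rows:
    print(row["leu_kid"], row["rate"])
```

The long layout has twice as many rows as the wide one, which is why the combined sample in the leukemia and kidney comparison has 88 observations.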

TTEST presents summary statistics and then conducts the t-test and F-test. The F statistic of 1.51 does not reject the null hypothesis of equal variances at the .05 level (p=.1794). Accordingly, you have to read the t statistic and p-value in the line labeled “Pooled.” The t statistic of -32.54 is sufficiently large in absolute value to reject the null hypothesis (p<.0001). In this example, the t statistic and p-value are the same whether or not equal variances are assumed. Note that the Cochran approximation for the degrees of freedom is generated by the COCHRAN option in the TTEST statement.

                                      The TTEST Procedure
 
                                          Statistics
 
                             Lower CL          Upper CL  Lower CL           Upper CL
Variable  leu_kid         N      Mean    Mean      Mean   Std Dev  Std Dev   Std Dev  Std Err
 
rate      Kidney         44    2.6367  2.7945    2.9524    0.4289   0.5191    0.6577   0.0783
rate      Leukemia       44    6.6357  6.8298    7.0238    0.5273   0.6383    0.8087   0.0962
rate      Diff (1-2)           -4.282  -4.035    -3.789    0.5063   0.5817    0.6838    0.124
 
 
                                            T-Tests
 
             Variable    Method           Variances      DF    t Value    Pr > |t|
 
             rate        Pooled           Equal          86     -32.54      <.0001
             rate        Satterthwaite    Unequal      82.6     -32.54      <.0001
             rate        Cochran          Unequal        43     -32.54      <.0001
 
 
                                     Equality of Variances
 
                 Variable    Method      Num DF    Den DF    F Value    Pr > F
 
                 rate        Folded F        43        43       1.51    0.1794

4.5 Independent Sample T-test in SPSS

In the T-TEST command, you need to provide a grouping variable, with the values of the two groups in parentheses, in the /GROUPS subcommand. Alternatively, you may click Analyze --> Compare Means --> Independent-Samples T Test and then provide the variables to be compared and a grouping variable with its specific values. Levene's F statistic of .0000 does not reject the null hypothesis of equal variances (p=.995).

T-TEST
   /GROUPS = smoke(0 1)
   /VARIABLES = lung
   /CRITERIA = CI(.95) .

Table 3 compares the independent sample t-test under the equal variance assumption across the three statistical software packages. For testing the assumption of equal variances, Stata, SAS, and SPSS respectively report Bartlett’s test (chi-squared), the folded form F test, and Levene’s F test. Despite the different test statistics in Stata and SAS, their p-values are almost the same. Stata and SAS respectively produce Welch’s and Cochran-Cox’s approximations of the degrees of freedom in addition to Satterthwaite’s approximation. The three software packages report the same result. It is not surprising that t and approximate t (t’) are the same: the two groups have the same number of observations, so the pooled and individual variances give the same standard error, and the sample variances are nearly equal, so the approximated degrees of freedom stay close to 42.

Table 3

