The t-test is a basic statistical method for examining the mean difference between two groups, and the one-way ANOVA extends the comparison to more than two groups. When the sample is large, a t-test can also compare the proportions of binary variables. Neither the t-test nor the one-way ANOVA requires balanced data. The one-way ANOVA, the GLM, and the linear regression model all rest on the same variance-covariance structure but present equivalent results in different ways.
Here is a key checklist for researchers who want to conduct t-tests. First, the variable to be tested should be interval or ratio scaled so that its mean is substantively meaningful. Do not, for example, compare the means of skin colors (white=0, yellow=1, black=2) of children in two cities. For binary variables, the t-test compares the proportions of success. If you have a latent variable measured by several Likert-scaled manifest variables, first run a factor analysis to construct the latent variable before running the t-test.
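The binary case works because the sample mean of a 0/1 variable is exactly the proportion of successes. A tiny plain-Python illustration (the values below are made up and unrelated to the appendix data):

```python
# The sample mean of a 0/1 variable equals the proportion of successes,
# which is why a large-sample t-test on binary data compares proportions.
smoke = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]  # hypothetical binary indicator

p_hat = sum(smoke) / len(smoke)  # mean of 0/1 values = proportion of 1s
print(p_hat)
```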
Second, the data-generation process (sampling and data collection) should be examined carefully to ensure that the samples were randomly drawn. If observations are not independent of one another, or if selection bias entered the sampling process, more sophisticated methods are needed to deal with the non-randomness. In the case of self-selection, for example, propensity score matching is a good candidate.
Researchers should also examine the normality assumption, especially when N is small. It is hard to justify comparing the means of random variables that are not normally distributed. If N is not large and normality is questionable, conduct the Shapiro-Wilk W, Shapiro-Francia W', Kolmogorov-Smirnov D, or Jarque-Bera test. If the normality assumption is violated, try nonparametric methods such as the Kolmogorov-Smirnov test or the Wilcoxon rank-sum test.
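Among these, the Jarque-Bera statistic is simple enough to compute by hand from the sample skewness and kurtosis. A plain-Python sketch of the formula (an illustration, not the implementation any of the four packages uses):

```python
def jarque_bera(x):
    """Jarque-Bera normality statistic: JB = n/6 * (S^2 + (K - 3)^2 / 4),
    where S is the sample skewness and K the sample kurtosis. Under
    normality, JB is approximately chi-squared with 2 degrees of freedom."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # central moments
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

# Perfectly symmetric toy data: skewness is 0, so JB reflects only kurtosis.
print(jarque_bera([1, 2, 3, 4, 5]))
```

A large JB value (relative to the chi-squared(2) distribution) casts doubt on normality.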
Table 7. Comparison of T-test Features of Stata, SAS, R and SPSS
| | Stata 11 | SAS 9.2 | R | SPSS 17 |
|---|---|---|---|---|
| Test for equal variances | Bartlett's chi-squared | Folded form F | Folded form F | Levene's weighted F |
| Comparing means (t-test) | .ttest | TTEST | t.test() | T-TEST |
| Approximation of the degrees of freedom | Satterthwaite, Welch | Satterthwaite, Cochran-Cox | Satterthwaite | Satterthwaite |
| Hypothesized value other than 0 in H0 | One sample t-test | H0 option | mu option | One sample t-test |
| Data arrangement for the independent sample t-test | Long and wide | Long | Long and wide | Long |
There are four types of t-tests. If the mean of a variable is compared with a hypothesized value, conduct the one sample t-test. If two variables are paired, that is, if each element of one sample is linked to its corresponding element of the other sample, conduct the paired t-test; it checks whether the individual differences have mean 0 (no effect). If two independent samples are compared, run the independent sample t-test, which comes in equal-variance and unequal-variance forms.
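A paired t-test is nothing more than a one-sample t-test applied to the element-wise differences. A minimal plain-Python sketch (the before/after measurements are hypothetical):

```python
import math

def one_sample_t(x, mu0=0.0):
    """t = (mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((v - xbar) ** 2 for v in x) / (n - 1)  # sample variance
    return (xbar - mu0) / math.sqrt(s2 / n)

# Hypothetical paired measurements on the same five subjects.
before = [12.0, 15.1, 9.8, 11.4, 13.3]
after = [11.2, 14.0, 10.1, 10.5, 12.6]

# Paired t-test: test whether the individual differences have mean 0.
diffs = [b - a for b, a in zip(before, after)]
print(one_sample_t(diffs))
```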
When comparing two independent samples, first check whether the two variances are equal by conducting the folded-form F test. If the variances are not significantly different, you may use the pooled variance to compute the standard error. If the equal-variance assumption is violated, use the individual variances and approximate the degrees of freedom (e.g., by Satterthwaite's method).
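The arithmetic behind these choices can be sketched in plain Python (an illustration, not the code any of the four packages runs; the group data are hypothetical):

```python
import math

def independent_t(x, y, equal_var=True):
    """Two-sample t statistic and its degrees of freedom.
    equal_var=True : pooled-variance t with df = n1 + n2 - 2.
    equal_var=False: Welch t with Satterthwaite's approximate df."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((v - m1) ** 2 for v in x) / (n1 - 1)  # sample variances
    v2 = sum((v - m2) ** 2 for v in y) / (n2 - 1)
    if equal_var:
        # Pooled variance weights each sample variance by its df.
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Individual variances with Satterthwaite's df approximation.
        se = math.sqrt(v1 / n1 + v2 / n2)
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return (m1 - m2) / se, df

def folded_f(x, y):
    """Folded-form F: the larger sample variance over the smaller one."""
    var = lambda s: sum((e - sum(s) / len(s)) ** 2 for e in s) / (len(s) - 1)
    v1, v2 = var(x), var(y)
    return max(v1, v2) / min(v1, v2)

g1 = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical group data
g2 = [2.0, 4.0, 6.0, 8.0, 10.0]
print(folded_f(g1, g2))                       # variance-ratio check
print(independent_t(g1, g2, equal_var=True))  # pooled t and df
print(independent_t(g1, g2, equal_var=False)) # Welch t, Satterthwaite df
```

With equal group sizes the pooled and Welch standard errors coincide, but the degrees of freedom differ (here 8 versus about 5.88), which changes the p-value.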
If you need to compare the means of more than two groups, conduct the one-way ANOVA. See Figure 3 for a summary of t-tests and one-way ANOVA.
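The one-way ANOVA F statistic partitions variability between and within groups. A plain-Python sketch with hypothetical data (for two groups, this F equals the square of the equal-variance t statistic, which is one way to see the equivalence mentioned above):

```python
def oneway_f(groups):
    """One-way ANOVA: F = MS_between / MS_within with
    (k - 1, N - k) degrees of freedom for k groups and N observations."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g)
                    for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (N - k))

# Three hypothetical groups.
print(oneway_f([[1, 2, 3], [2, 4, 6], [5, 6, 7]]))
```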
Next, consider the type of t-test, the data arrangement, and software issues to determine the best strategy for data analysis (Table 7). The long form of data arrangement in Figure 4 is commonly used for the independent sample t-test, whereas the wide arrangement is appropriate for the paired sample t-test. If independent samples are arranged in the wide form, SAS and SPSS require reshaping the data into the long form.
SAS has several procedures (e.g., TTEST, MEANS, and UNIVARIATE) and useful options for t-tests. For example, the H0 option lets you specify a hypothesized value other than zero. The Stata .ttest and .ttesti commands provide very flexible ways of handling different data arrangements and aggregated data. Table 9 summarizes the options of these commands.
Finally, report and interpret the results clearly. Report the mean difference (or group means), standard error, degrees of freedom, and N. Table 8 is a simple example of reporting t-test results. Then interpret the results substantively rather than simply reporting numbers. For instance, "The average death rate from lung cancer in heavy cigarette-consuming states is 5.335 higher than in other states."
Table 9. Options of the .ttest Command and Its Immediate Form .ttesti

| | Usage | by(group var) | unequal | welch | unpaired* |
|---|---|---|---|---|---|
| One sample t-test | var=c | | | | |
| Paired (dependent) sample | var1=var2 | | | | |
| Equal variance (1 variable) | var | O | | | |
| Equal variance (2 variables)** | var1=var2 | | | | O |
| Unequal variance (1 variable) | var | O | O | O | |
| Unequal variance (2 variables) | var1=var2 | | O | O | O |

** The "var1=var2" usage assumes the second data arrangement in Figure 3.
Table 10. Options of the t.test() Command in R

| | Usage | mu | var.equal | paired |
|---|---|---|---|---|
| One sample t-test | var | O | | |
| Paired (dependent) sample t-test | var1, var2 | O | | T |
| Equal variance (long form) | var~group | O | T | |
| Equal variance (wide form) | var1, var2 | O | T | F |
| Unequal variance (long form) | var~group | O | F | |
| Unequal variance (wide form) | var1, var2 | O | F | F |
APPENDIX: Data Set
- cigar = number of cigarettes smoked (hundreds per capita)
- bladder = deaths per 100k people from bladder cancer
- lung = deaths per 100k people from lung cancer
- kidney = deaths per 100k people from kidney cancer
- leukemia = deaths per 100k people from leukemia
- smoke = 1 for those whose cigarette consumption is larger than the median and 0 otherwise.
- west = 1 for states in the South or West and 0 for those in the Northeast or Midwest.
. summarize cigar-leukemia
Variable | Obs Mean Std. Dev. Min Max
cigar | 44 24.91409 5.573286 14 42.4
bladder | 44 4.121136 .9649249 2.86 6.54
lung | 44 19.65318 4.228122 12.01 27.27
kidney | 44 2.794545 .5190799 1.59 4.32
leukemia | 44 6.829773 .6382589 4.9 8.28
. sfrancia cigar-leukemia
Variable | Obs W' V' z Prob>z
cigar | 44 0.93061 3.258 2.203 0.01381
bladder | 44 0.94512 2.577 1.776 0.03789
lung | 44 0.97809 1.029 0.055 0.47823
kidney | 44 0.97732 1.065 0.120 0.45217
leukemia | 44 0.97269 1.282 0.474 0.31759
. tab west smoke
     | smoke
west | 0 1 | Total
0 | 7 13 | 20
1 | 15 9 | 24
Total | 22 22 | 44
- Bluman, Allan G. 2008. Elementary Statistics: A Step by Step Approach, A Brief Version, 4th ed. New York: McGraw Hill.
- Cochran, William G., and Gertrude M. Cox. 1992. Experimental Designs, 2nd ed. New York: John Wiley & Sons.
- Fraumeni, J. F. 1968. "Cigarette Smoking and Cancers of the Urinary Tract: Geographic Variations in the United States," Journal of the National Cancer Institute, 41:5, 1205-1211.
- Gosset, William S. 1908. "The Probable Error of a Mean," Biometrika, 6(1): 1-25.
- Hildebrand, David K., R. Lyman Ott, and J. Brian Gray. 2005. Basic Statistical Ideas for Managers, 2nd ed. Belmont, CA: Thomson Brooks/Cole.
- SAS Institute. 2005. SAS/STAT User's Guide, Version 9.1. Cary, NC: SAS Institute.
- Satterthwaite, F.W. 1946. "An Approximate Distribution of Estimates of Variance Components," Biometrics Bulletin, 2:110-114.
- SPSS Inc. 2007. SPSS 15.0 Syntax Reference Guide. Chicago, IL: SPSS Inc.
- Stata Press. 2007. Stata Reference Manual, Release 10. College Station, TX: Stata Press.
- Walker, Glenn A. 2002. Common Statistical Methods for Clinical Research with SAS Examples. Cary, NC: SAS Institute.
- Welch, B. L. 1947. "The Generalization of 'Student's' Problem When Several Different Population Variances Are Involved," Biometrika, 34: 28-35.
I am grateful to Jeremy Albright, Takuya Noguchi, Kevin Wilhite, and Kaigang Li at the UITS Center for Statistical and Mathematical Computing, Indiana University, who provided valuable comments and suggestions.
- 2003. First draft.
- 2004. Second revision (nonparametric methods excluded).
- 2005. Third revision (data arrangements and conclusion added).
- 2007. Fourth revision (comparing proportions added).
- 2008. Fifth revision (SPSS output added).
- 2009. Sixth revision (R output added).