The t-test and analysis of variance (ANOVA) are widely used statistical methods to compare group means. For example, the independent sample t-test enables you to compare annual personal income between rural and urban areas and examine the difference in the grade point average (GPA) between male and female students. Using the paired t-test, you can also compare the change in outcomes before and after a treatment is applied.
In a t-test, the mean of a variable to be compared should be substantively interpretable. Technically, the left-hand side (LHS) variable to be tested should be interval or ratio scaled (continuous), whereas the right-hand side (RHS) variable should be binary (categorical). The t-test can also compare the proportions of binary variables. The mean of a binary variable is the proportion or percentage of success of the variable. When a sample size is large, the t-test and z-test for comparing proportions produce almost the same answer.
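This equivalence can be illustrated with a short Python sketch (the counts below are made up for illustration and are not from the example data): the two-sample z statistic for proportions and the t statistic computed on the underlying 0/1 values come out nearly identical for large samples.

```python
# Illustration only: hypothetical counts, not from the example data.
import math

n1, x1 = 200, 120   # group 1: 120 "successes" out of 200
n2, x2 = 180, 90    # group 2: 90 "successes" out of 180
p1, p2 = x1 / n1, x2 / n2

# Two-sample z-test for proportions: pooled proportion under H0: p1 = p2.
p_pool = (x1 + x2) / (n1 + n2)
z = (p1 - p2) / math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# T-test on the raw 0/1 values: the sample variance of a 0/1 variable
# is n/(n-1) * p(1-p); the unequal-variance form is used here.
v1 = n1 / (n1 - 1) * p1 * (1 - p1)
v2 = n2 / (n2 - 1) * p2 * (1 - p2)
t = (p1 - p2) / math.sqrt(v1 / n1 + v2 / n2)

print(round(z, 3), round(t, 3))  # the two statistics are nearly equal
```

With these counts the two statistics agree to about two decimal places; the agreement improves as the sample sizes grow.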
T-tests assume random sampling and population normality. When the two samples come from populations with the same variance, the independent samples t-test uses the pooled variance. Otherwise, the individual variances are used in the denominator and the degrees of freedom are approximated (Satterthwaite's approximation).
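The two variants can be sketched in Python with hypothetical numbers (the pooled form assumes equal population variances; the Welch/Satterthwaite form does not):

```python
# Hypothetical samples; not from the example data.
import math
from statistics import mean, variance

a = [22.1, 19.8, 24.3, 21.5, 23.0, 20.7]
b = [18.2, 17.5, 19.9, 16.8, 18.6]
n1, n2 = len(a), len(b)
m1, m2 = mean(a), mean(b)
v1, v2 = variance(a), variance(b)          # sample variances (n - 1)

# Equal population variances: pooled variance, df = n1 + n2 - 2.
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Unequal variances: individual variances in the denominator and the
# Satterthwaite approximation of the degrees of freedom.
se2 = v1 / n1 + v2 / n2
t_welch = (m1 - m2) / math.sqrt(se2)
df_welch = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1)
                       + (v2 / n2) ** 2 / (n2 - 1))
```

Note that the Satterthwaite degrees of freedom are generally not an integer and are smaller than n1 + n2 - 2.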
T-tests assume that samples are randomly drawn from normally distributed populations with unknown population variances. The variables of interest should be random variables, whose values vary randomly; a constant, such as the number of parents a person has, is not a random variable. In addition, the occurrence of one measurement in a variable should be independent of the occurrence of the others. In other words, the occurrence of an event does not change the probability that other events occur. This property is called statistical independence. Time series data are likely to be statistically dependent because they are often autocorrelated.
T-tests assume random sampling without any selection bias. If a researcher intentionally selects samples with properties he prefers and then compares them with other samples, inferences based on this non-random sampling are neither reliable nor generalizable. In an experiment, subjects should be randomly assigned to either the control or the treatment group so that the two groups do not differ systematically except for the treatment applied. When subjects can decide whether or not to participate (non-random assignment), however, the independent sample t-test may under- or over-estimate the difference between the control and treatment groups. In this case of self-selection, propensity score matching and treatment effect models may produce more robust and reliable estimates of the mean differences.
Another key assumption is population normality. If this assumption is violated, the sample mean is no longer the best (most efficient) measure of central tendency and t-tests will not be valid. Figure 1 illustrates the standard normal probability distribution on the left and a bimodal distribution on the right. Even though the two distributions have the same mean and variance, a comparison of means alone tells us little about how such populations differ.
Figure 1. Comparing the Standard Normal and a Bimodal Probability Distributions
The violation of normality is more problematic in a one-tailed test than in a two-tailed one (Hildebrand et al. 2005: 329). Figure 2 shows how the violation influences statistical inferences. The red curve indicates the standard normal probability distribution, with its 1 percent one-tailed rejection area on the left. The blue curve is a non-normal distribution with its own (blue) 1 percent rejection area. The test statistic, indicated by a vertical green line, falls in the rejection area of the skewed non-normal distribution but not in the red shaded area of the standard normal distribution. If the populations actually follow such a non-normal distribution, the one-tailed t-test based on normality mistakenly fails to reject the null hypothesis.
Figure 2. Inferential Fallacy When the Normality Assumption Is Violated
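For reference, the rejection cutoffs under exact normality can be computed directly (a Python sketch using the standard library's `NormalDist`, available in Python 3.8+); the 1 percent one-tailed cutoff sits closer to the center than the per-tail cutoff of a 1 percent two-tailed test, which is one reason one-tailed inferences are more sensitive to the shape of the tails.

```python
# Rejection cutoffs under an exactly standard normal population.
from statistics import NormalDist

std = NormalDist()                   # N(0, 1)
one_tailed = std.inv_cdf(0.01)       # 1% one-tailed (left) cutoff, ~ -2.326
two_tailed = std.inv_cdf(0.005)      # per-tail cutoff of a 1% two-tailed
                                     # test, ~ -2.576

print(round(one_tailed, 3), round(two_tailed, 3))
```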
Thanks to the Central Limit Theorem, the normality assumption is not as problematic in practice as one might imagine. The theorem says that the distribution of a sample mean (e.g., ȳ1 and ȳ2) is approximately normal when its sample size is sufficiently large. When n1 + n2 >= 30, in practice, you do not need to worry too much about the normality assumption.
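The theorem is easy to see by simulation (a Python sketch using a strongly skewed exponential population; nothing here depends on the example data):

```python
# Simulation sketch: means of samples from a skewed population are
# themselves approximately normally distributed once n is moderate.
import random
from statistics import mean, stdev

random.seed(42)
n, reps = 30, 2000
# Exponential(1) population: mean 1, sd 1, heavily right-skewed.
means = [mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

# By the CLT the sample means cluster near 1 with sd near
# 1 / sqrt(30), which is about 0.183.
print(round(mean(means), 3), round(stdev(means), 3))
```

Plotting a histogram of `means` would show a roughly symmetric, bell-shaped distribution even though the population itself is far from normal.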
When a sample size is small and normality is questionable, you might draw a histogram, P-P plot, or Q-Q plot, or conduct the Shapiro-Wilk W (N<=2000), Shapiro-Francia W (N<=5000), Kolmogorov-Smirnov D (N>2000), or Jarque-Bera test. If the normality assumption is violated, you might try nonparametric methods such as the Kolmogorov-Smirnov test, Kruskal-Wallis test, or Wilcoxon rank-sum test, depending on the circumstances.
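Among these, the Jarque-Bera test is simple enough to compute by hand from sample skewness and kurtosis (a Python sketch of the standard formula; the other tests are better left to statistical software):

```python
# Jarque-Bera statistic from sample skewness and kurtosis; under the
# normality null it is approximately chi-square with 2 degrees of freedom.
from statistics import mean

def jarque_bera(x):
    n = len(x)
    m = mean(x)
    m2 = sum((v - m) ** 2 for v in x) / n     # central moments
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2                       # equals 3 for a normal
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

# Symmetric data give a small JB; a skewed sample gives a larger one.
print(jarque_bera([-2, -1, 0, 1, 2]), jarque_bera([0, 0, 0, 0, 10]))
```

A large JB value relative to the chi-square(2) distribution is evidence against normality.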
T-tests can be conducted on one sample, paired samples, or independent samples. The one sample t-test checks whether the population mean differs from a hypothesized value (oftentimes zero). If two samples are taken from different populations and their elements are not paired, the independent sample t-test compares the means of the two samples (e.g., GPA of male and female students). In paired samples, the individual differences of matched pairs (e.g., pre and post measurements) are examined.
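The connection between the paired and one-sample tests can be made concrete (a Python sketch with hypothetical pre/post scores; the paired test is exactly a one-sample test on the matched differences):

```python
# A paired t-test is a one-sample t-test on the matched differences.
import math
from statistics import mean, stdev

def one_sample_t(x, mu0=0.0):
    """Return (t, df) for H0: the population mean of x equals mu0."""
    n = len(x)
    return (mean(x) - mu0) / (stdev(x) / math.sqrt(n)), n - 1

pre  = [12.0, 15.5, 11.2, 13.8, 14.1]   # hypothetical pre-treatment scores
post = [13.1, 16.0, 12.5, 14.9, 14.0]   # hypothetical post-treatment scores
diffs = [y - x for x, y in zip(pre, post)]
t, df = one_sample_t(diffs)             # paired t-test via the differences
```

The resulting t statistic is compared against the t distribution with n - 1 degrees of freedom, where n is the number of pairs.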
While the independent sample t-test is limited to comparing the means of two groups, the one-way ANOVA (analysis of variance) can compare more than two groups. Therefore, the t-test is considered a special case of the one-way ANOVA. These analyses do not, however, necessarily imply any causal relationship between the left-hand and right-hand side variables. The F statistic of ANOVA equals the t statistic squared (t²) when the between-group degrees of freedom is one, that is, when only two groups are compared. Whether the data are balanced or not does not matter in the t-test and the one-way ANOVA. Table 1 compares the independent sample t-test and the one-way ANOVA.
Table 1. Comparison between the Independent Sample T-test and One-way ANOVA
| | Independent Sample T-test | One-way ANOVA |
|---|---|---|
| LHS (Dependent) | Interval or ratio variable | Interval or ratio variable |
| RHS (Independent) | Binary variable | Categorical variable |
| Null Hypothesis | mu1 = mu2 | mu1 = mu2 = mu3 ... |
| Probability Distribution | T distribution | F distribution |
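The F = t² relationship noted above can be verified numerically (a Python sketch with hypothetical two-group data):

```python
# With two groups, the one-way ANOVA F statistic equals t squared.
import math
from statistics import mean

g1 = [5.1, 4.8, 5.6, 5.0]   # hypothetical group 1
g2 = [4.2, 4.5, 3.9, 4.4]   # hypothetical group 2
n1, n2 = len(g1), len(g2)
grand = mean(g1 + g2)

# Between- and within-group sums of squares.
ssb = n1 * (mean(g1) - grand) ** 2 + n2 * (mean(g2) - grand) ** 2
ssw = (sum((v - mean(g1)) ** 2 for v in g1)
       + sum((v - mean(g2)) ** 2 for v in g2))
F = (ssb / 1) / (ssw / (n1 + n2 - 2))    # df between = 2 - 1 = 1

# Pooled-variance t statistic for the same comparison.
sp2 = ssw / (n1 + n2 - 2)
t = (mean(g1) - mean(g2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

assert abs(F - t ** 2) < 1e-9            # F = t^2 with two groups
```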
Stata has the .ttest (or .ttesti) command to conduct t-tests. The .anova and .oneway commands perform the one-way ANOVA. Stata also has the .prtest (or .prtesti) command to compare the proportions of binary variables. The .ttesti and .prtesti commands are useful when only aggregated information (i.e., the number of observations, means or proportions, and standard deviations) is available.
In SAS, the TTEST procedure conducts various t-tests, and the UNIVARIATE and MEANS procedures have options for the one sample t-test. SAS also has the ANOVA, GLM, and MIXED procedures for ANOVA. The ANOVA procedure can handle balanced data only, while GLM and MIXED can analyze both balanced and unbalanced data. Note, however, that unbalanced data, in which groups have different numbers of observations, do not cause any problem in t-tests or the one-way ANOVA.
SPSS has T-TEST, ONEWAY, GLM (or UNIANOVA), and MIXED commands for t-tests and one-way ANOVA. Table 2 summarizes related Stata commands, SAS procedures, and SPSS commands.
Table 2. Related Procedures and Commands in Stata, SAS, and SPSS
| | STATA 10 SE | SAS 9.1 | SPSS 15 |
|---|---|---|---|
| Normality Test | .swilk; .sfrancia | UNIVARIATE | EXAMINE |
| Equal Variance Test | .oneway | TTEST | T-TEST |
| Nonparametric Method | .ksmirnov; .kwallis | NPAR1WAY | NPAR TESTS |
| Comparing Means (T-test) | .ttest; .ttesti | TTEST; MEANS; ANOVA | T-TEST |
| GLM* | | GLM; MIXED | GLM; MIXED |
| Comparing Proportions | .prtest; .prtesti | | (point-and-click) |
Figure 3 contrasts two types of data arrangement for t-tests. The first data arrangement has a variable to be tested and a grouping variable to classify groups (0 or 1). The second, appropriate especially for paired samples, has two variables to be tested. The two variables in this type are not, however, necessarily paired.
SAS and SPSS require the first data arrangement for the independent sample t-test, whereas Stata can handle both types flexibly. The second arrangement is required for the paired sample t-test in these software packages. Notice that the numbers of observations across groups are not necessarily equal (balanced).
Figure 3. Two Types of Data Arrangement
[Panels: Data Arrangement I and Data Arrangement II]
The data set used here is adapted from J. F. Fraumeni's study on cigarette smoking and cancer (Fraumeni 1968). The data are per capita numbers of cigarettes sold in 43 states and the District of Columbia in 1960, together with death rates per hundred thousand people from various forms of cancer. Two variables were added to categorize states into two groups. See the appendix for details.