2. One Sample T-Test

Suppose we obtain n measurements y1 through yn that were randomly selected from a normally distributed population with unknown parameters. One example is the SAT scores of 100 undergraduate students who were randomly chosen.

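The one sample t-test examines whether the unknown population mean differs from a hypothesized value c. The null hypothesis is that the population mean equals c. The t statistic is computed as follows.

$$t = \frac{\bar{y} - c}{s_{\bar{y}}} \sim t(n-1), \qquad s_{\bar{y}} = \frac{s}{\sqrt{n}}$$

Here $\bar{y}$ is the sample mean, $s$ is the sample standard deviation, and $s_{\bar{y}}$ is the standard error of the mean. Under the null hypothesis, the statistic follows the t distribution with n-1 degrees of freedom.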

Here we are testing if the population mean of the death rate from lung cancer is 20 per 100,000 people at the .01 significance level. The null hypothesis of this two-tailed test is that the population mean is 20.

2.1 One Sample T-test in Stata

The .ttest command conducts various forms of t-tests in Stata. For the one sample test, the command requires that a hypothesized value be explicitly specified. The level() option specifies a confidence level as a percentage; if omitted, the default of 95 percent is assumed. Note that the 99 percent confidence level corresponds to the .01 significance level.

. ttest lung=20, level(99)
One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [99% Conf. Interval]
---------+--------------------------------------------------------------------
    lung |      44    19.65318    .6374133    4.228122    17.93529    21.37108
------------------------------------------------------------------------------
    mean = mean(lung)                                             t =  -0.5441
Ho: mean = 20                                    degrees of freedom =       43

   Ha: mean < 20               Ha: mean != 20                 Ha: mean > 20
 Pr(T < t) = 0.2946         Pr(|T| > |t|) = 0.5892          Pr(T > t) = 0.7054

Stata first lists descriptive statistics of the variable lung. The mean and standard deviation of the 44 observations are 19.653 and 4.228, respectively. The standard error is .6374 = 4.2281 / sqrt(44), and the 99 percent confidence interval of the mean is 19.6532 ± 2.695 * .6374, where 2.695 is the critical value of the two-tailed test with 43 (=44-1) degrees of freedom at the .01 significance level. Finally, the t statistic is -.544 = (19.653-20) / .6374.
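The arithmetic in the paragraph above can be checked with a short sketch. It uses Python's standard library only; the summary statistics are copied from the Stata output, and the critical value 2.695 is taken from the text rather than computed, so the interval endpoints agree with Stata only to about three decimal places.

```python
import math

# Summary statistics copied from the Stata output above.
n = 44
mean = 19.65318
sd = 4.228122
hypothesized = 20.0

# Two-tailed critical value for 43 degrees of freedom at the .01 level,
# taken from the text (an input here, not something this sketch computes).
t_critical = 2.695

std_err = sd / math.sqrt(n)                # standard error of the mean
t_stat = (mean - hypothesized) / std_err   # one sample t statistic
ci_lower = mean - t_critical * std_err     # lower 99 percent limit
ci_upper = mean + t_critical * std_err     # upper 99 percent limit

print(f"standard error = {std_err:.4f}")
print(f"t statistic    = {t_stat:.4f}")
print(f"99% CI         = [{ci_lower:.3f}, {ci_upper:.3f}]")
```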

There are three t-tests at the bottom of the output above. The first and third are one-tailed tests, whereas the second is the two-tailed test. The first p-value .2946, for example, is for the one-tailed test of the research hypothesis that the population mean is less than 20. Since the test here is two-tailed, you should read the second p-value, which is for the research hypothesis that the population mean is not 20.
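The three p-values are related to one another, which a few lines of Python can confirm. This is only a consistency check on the numbers printed by Stata; the variable names are illustrative.

```python
# p-values copied from the bottom of the Stata output above.
p_lower = 0.2946   # Pr(T < t),     Ha: mean < 20
p_two = 0.5892     # Pr(|T| > |t|), Ha: mean != 20
p_upper = 0.7054   # Pr(T > t),     Ha: mean > 20

# The two one-tailed p-values are complementary.
sum_one_tailed = p_lower + p_upper

# Because the t distribution is symmetric, the two-tailed p-value
# is twice the smaller of the two one-tailed p-values.
doubled = 2 * min(p_lower, p_upper)

print(f"p_lower + p_upper = {sum_one_tailed:.4f}")
print(f"2 * min(one-tailed) = {doubled:.4f}")
```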

The t statistic of -.5441 and its large p-value of .5892 fail to reject the null hypothesis that the population mean of the death rate from lung cancer is 20 at the .01 level. The average death rate may be 20 per 100,000 people (p=.5892). Notice that the hypothesized value 20 falls within the 99 percent confidence interval 17.9353-21.3711.
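As a rough illustration of where the two-tailed p-value comes from, the sketch below integrates the density of the t distribution numerically with Simpson's rule and then applies the decision rule at the .01 level. It uses only the standard library; the function names t_pdf and two_tailed_p are made up for this example, and a statistical package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t_stat, df, steps=2000):
    """Pr(|T| > |t|) using Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2      # odd nodes get 4, even nodes get 2
        total += weight * t_pdf(i * h, df)
    central_half = total * h / 3        # Pr(0 < T < |t|)
    return 1 - 2 * central_half

# Values copied from the Stata output above.
t_stat = -0.5441
df = 43
alpha = 0.01

p_value = two_tailed_p(t_stat, df)
reject = p_value < alpha                # reject H0 only if p < alpha

print(f"two-tailed p-value = {p_value:.4f}")
print("reject H0" if reject else "fail to reject H0")
```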

If you have only aggregated data (i.e., the number of observations, mean, and standard deviation of a sample), use the .ttesti command to replicate the t-test above. The hypothesized value 20 is specified at the end of the summary statistics. Do not omit the level(99) option; otherwise, Stata uses the default 95 percent confidence level.

. ttesti 44 19.65318 4.228122 20, level(99)
(output is skipped)


2.2 One Sample T-test Using the SAS TTEST Procedure

The TTEST procedure conducts various types of t-tests in SAS. The H0 option specifies a hypothesized value, and ALPHA specifies a significance level. If omitted, zero and the .05 level are assumed by default. Let us first declare a library using the LIBNAME statement. A SAS data set is read through a SAS library, which is an alias for a collection of SAS data sets.

LIBNAME masil 'c:\data\sas';

The LIBNAME statement defines a library masil that points to c:\data\sas. DATA=masil.smoking in the following TTEST statement tells SAS to read the data set smoking in the masil library.

PROC TTEST H0=20 ALPHA=.01 DATA=masil.smoking;
   VAR lung;
RUN;
                                   The TTEST Procedure
 
                                       Statistics
 
                    Lower CL          Upper CL  Lower CL           Upper CL
   Variable      N      Mean    Mean      Mean   Std Dev  Std Dev   Std Dev  Std Err
 
   lung         44    17.935  19.653    21.371    3.2994   4.2281    5.7989   0.6374
 
 
                                         T-Tests
 
                         Variable      DF    t Value    Pr > |t|
 
                         lung          43      -0.54      0.5892

The TTEST procedure reports descriptive statistics followed by the two-tailed t-test. The small t statistic fails to reject the null hypothesis that the population mean is 20 at the .01 level (p=.5892). If you have a summary data set containing the values of a variable lung and their frequencies, say count, use the FREQ statement.

PROC TTEST H0=20 ALPHA=.01 DATA=masil.smoking;
   VAR lung;
   FREQ count;
RUN;
(output is skipped)
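The idea behind a frequency variable is that each value is treated as if it appeared count times. The following sketch illustrates this with made-up (value, count) pairs, which are hypothetical and not taken from the smoking data set, and checks that the frequency-weighted mean and standard deviation match those of the expanded data.

```python
import math
import statistics

# Hypothetical summary data: each value occurs `count` times.
# These numbers are for illustration only, not from the smoking data set.
values = [18.0, 20.0, 23.0]
counts = [3, 4, 3]
hypothesized = 20.0

n = sum(counts)
mean = sum(v * c for v, c in zip(values, counts)) / n
# Sample variance with divisor n - 1, weighting each value by its frequency.
variance = sum(c * (v - mean) ** 2 for v, c in zip(values, counts)) / (n - 1)
sd = math.sqrt(variance)

# The same statistics from the expanded data, one row per observation.
expanded = [v for v, c in zip(values, counts) for _ in range(c)]
expanded_mean = statistics.mean(expanded)
expanded_sd = statistics.stdev(expanded)

t_stat = (mean - hypothesized) / (sd / math.sqrt(n))

print(f"n = {n}, mean = {mean:.4f}, sd = {sd:.4f}, t = {t_stat:.4f}")
print(f"expanded data: mean = {expanded_mean:.4f}, sd = {expanded_sd:.4f}")
```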


2.3 One Sample T-test Using the SAS UNIVARIATE and MEANS Procedures

The SAS UNIVARIATE and MEANS procedures also perform the one sample t-test. The UNIVARIATE procedure is designed primarily to produce a variety of descriptive statistics of a variable. Its MU0 option tells the procedure to perform a t-test using the hypothesized value specified. VARDEF=DF specifies the divisor (degrees of freedom) used in computing the variance and standard deviation. The NORMAL option tests whether the variable is normally distributed.

PROC UNIVARIATE MU0=20 VARDEF=DF NORMAL ALPHA=.01 DATA=masil.smoking;
   VAR lung;
RUN;
                                    The UNIVARIATE Procedure
                                        Variable:  lung
 
                                            Moments
 
                N                          44    Sum Weights                 44
                Mean               19.6531818    Sum Observations        864.74
                Std Deviation      4.22812167    Variance            17.8770129
                Skewness            -0.104796    Kurtosis             -0.949602
                Uncorrected SS      17763.604    Corrected SS        768.711555
                Coeff Variation    21.5136751    Std Error Mean      0.63741333
 
 
                                   Basic Statistical Measures
 
                         Location                    Variability
 
                     Mean     19.65318     Std Deviation            4.22812
                     Median   20.32000     Variance                17.87701
                     Mode       .          Range                   15.26000
                                           Interquartile Range      6.53000
 
 
                                   Tests for Location: Mu0=20
 
                        Test           -Statistic-    -----p Value------
 
                        Student's t    t   -0.5441    Pr > |t|    0.5892
                        Sign           M         1    Pr >= |M|   0.8804
                        Signed Rank    S     -36.5    Pr >= |S|   0.6752
 
 
                                      Tests for Normality
 
                   Test                  --Statistic---    -----p Value------
 
                   Shapiro-Wilk          W     0.967845    Pr < W      0.2535
                   Kolmogorov-Smirnov    D     0.086184    Pr > D     >0.1500
                   Cramer-von Mises      W-Sq  0.063737    Pr > W-Sq  >0.2500
                   Anderson-Darling      A-Sq  0.382105    Pr > A-Sq  >0.2500
 
                                    Quantiles (Definition 5)
 
                                     Quantile      Estimate
 
                                     100% Max        27.270
                                     99%             27.270
                                     95%             25.950
                                     90%             25.450
                                     75% Q3          22.815
                                     50% Median      20.320
                                     25% Q1          16.285
                                     10%             14.110
                                     5%              12.120
                                     1%              12.010
                                     0% Min          12.010
 
 
                                      Extreme Observations
 
                             -----Lowest----        ----Highest----
 
                              Value      Obs         Value      Obs
 
                              12.01       39         25.45       16
                              12.11       33         25.88        1
                              12.12       30         25.95       27
                              13.58       10         26.48       18
                              14.11       36         27.27        8

At the beginning of the output above, you can see summary statistics such as N (44), mean (19.6532), variance (17.8770), skewness (-.1048), and kurtosis (-.9496). The third block of the output, titled “Tests for Location: Mu0=20,” reports the t statistic and its p-value. The fourth block, titled “Tests for Normality,” contains several normality test statistics. Since N is less than 2,000, you should read the Shapiro-Wilk W, which does not reject the null hypothesis that lung is normally distributed (p=.2535).

The MEANS procedure performs the one sample t-test using the T and PROBT options, which respectively request the t statistic and its two-tailed p-value. The CLM option produces the confidence interval (lower and upper limits) of the mean. The MEAN, STD, and STDERR options respectively print the sample mean, standard deviation, and standard error.

PROC MEANS MEAN STD STDERR T PROBT CLM VARDEF=DF ALPHA=.01 DATA=masil.smoking;
   VAR lung;
RUN;
                                   The MEANS Procedure
 
                                Analysis Variable : lung
 
                                                                 Lower 99%     Upper 99%
         Mean       Std Dev     Std Error  t Value  Pr > |t|   CL for Mean   CL for Mean
 ---------------------------------------------------------------------------------------
   19.6531818     4.2281217     0.6374133    30.83    <.0001    17.9352878    21.3710758
 ---------------------------------------------------------------------------------------

The MEANS procedure does not, however, have an option to specify a hypothesized value other than zero. Thus, the null hypothesis here is that the population mean of death rate from lung cancer is zero. The t statistic is 30.83 = (19.6532-0) / .6374 and the corresponding p-value is less than .0001. The large t statistic and small p-value reject the null hypothesis at the .01 significance level. The average death rate from lung cancer is not zero but much larger than zero. The confidence interval remains unchanged as long as the same significance level is used.
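Because the MEANS output includes the mean and standard error, the t statistic for any hypothesized value can still be worked out by hand. The sketch below is only that hand calculation, using the two numbers printed above; it is not a feature of the MEANS procedure.

```python
# Mean and standard error copied from the MEANS output above.
mean = 19.6531818
std_err = 0.6374133

# t statistic reported by PROC MEANS, which tests against zero.
t_zero = (mean - 0.0) / std_err

# Hand calculation for the hypothesized value 20 used elsewhere in this section.
t_twenty = (mean - 20.0) / std_err

print(f"t for H0: mean = 0  -> {t_zero:.2f}")
print(f"t for H0: mean = 20 -> {t_twenty:.4f}")
```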


2.4 One Sample T-test in SPSS

SPSS has the T-TEST command for t-tests. First, open the SPSS syntax editor by clicking File--> New--> Syntax consecutively. The /TESTVAL subcommand specifies the value with which the sample mean is compared, whereas /VARIABLES lists the variables to be tested. Like Stata, SPSS specifies a confidence level rather than a significance level in the /CRITERIA=CI() subcommand.

T-TEST
   /TESTVAL = 20
   /VARIABLES = lung
   /CRITERIA = CI(.99) .

Alternatively, you may click Analyze--> Compare Means--> One-Sample T Test and then provide variables to be compared. All SPSS output is skipped in this document.


