# 2. One Sample T-Test

Suppose we obtain n measurements, y1 through yn, randomly selected from a normally distributed population with unknown parameters. One example is the SAT scores of 100 randomly chosen undergraduate students.

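The one sample t-test examines whether the unknown population mean $\mu$ differs from a hypothesized value $c$. The null hypothesis of the one sample t-test is $H_0: \mu = c$. The t statistic is computed as follows.

$$t = \frac{\bar{y} - c}{s_{\bar{y}}} \sim t(n-1), \qquad \bar{y} = \frac{\sum y_i}{n}, \quad s^2 = \frac{\sum (y_i - \bar{y})^2}{n-1}, \quad s_{\bar{y}} = \frac{s}{\sqrt{n}}$$

Here $\bar{y}$ is the sample mean, $s$ is the sample standard deviation, $s_{\bar{y}}$ is the standard error of the mean, and $n-1$ is the degrees of freedom.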

Here we are testing if the population mean of the death rate from lung cancer is 20 per 100,000 people at the .01 significance level. The null hypothesis of this two-tailed test is that the population mean is 20.

## 2.1 One Sample T-test in Stata

The ttest command conducts various forms of t-tests in Stata. For the one sample test, the command requires that a hypothesized value be explicitly specified. The level() option specifies a confidence level as a percentage; if omitted, 95 percent is assumed by default. Note that the 99 percent confidence level is equivalent to the .01 significance level.

. ttest lung=20, level(99)
One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [99% Conf. Interval]
---------+--------------------------------------------------------------------
    lung |      44    19.65318    .6374133    4.228122    17.93529    21.37108
------------------------------------------------------------------------------
mean = mean(lung)                                             t =  -0.5441
Ho: mean = 20                                    degrees of freedom =       43

Ha: mean < 20               Ha: mean != 20                 Ha: mean > 20
Pr(T < t) = 0.2946         Pr(|T| > |t|) = 0.5892          Pr(T > t) = 0.7054

Stata first lists descriptive statistics of the variable lung. The mean and standard deviation of the 44 observations are 19.653 and 4.228, respectively. The standard error is .6374 = 4.2281 / sqrt(44), and the 99 percent confidence interval of the mean is 19.6532 ± 2.695 * .6374, where 2.695 is the two-tailed critical value with 43 (=44-1) degrees of freedom at the .01 significance level. Finally, the t statistic is -.544 = (19.653-20) / .6374.
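The arithmetic in the paragraph above can be reproduced from the summary statistics alone. The following is a minimal Python sketch, not part of the original Stata session; the variable names are illustrative, and the critical value 2.695 is taken from the text rather than computed.

```python
import math

# Summary statistics as reported in the Stata output for the variable lung.
n = 44
mean = 19.65318
sd = 4.228122
mu0 = 20.0        # hypothesized population mean
t_crit = 2.695    # two-tailed critical value quoted in the text (43 df, .01 level)

se = sd / math.sqrt(n)         # standard error of the mean
df = n - 1                     # degrees of freedom
t_stat = (mean - mu0) / se     # one sample t statistic
ci_low = mean - t_crit * se    # lower 99 percent confidence limit
ci_high = mean + t_crit * se   # upper 99 percent confidence limit

print(f"se = {se:.4f}, df = {df}, t = {t_stat:.4f}")
print(f"99% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

Because the critical value is rounded to three decimals, the confidence limits agree with the Stata output only to about four decimal places.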

There are three t-tests at the bottom of the output above. The first and third are one-tailed tests, whereas the second is the two-tailed test. The first p-value of .2946, for example, is for the one-tailed test of the research hypothesis that the population mean is less than 20. Since the test here is two-tailed, you should read the second p-value, .5892.

With a t statistic of -.5441 and a large p-value of .5892, we do not reject the null hypothesis that the population mean of the death rate from lung cancer is 20 at the .01 level. The data are consistent with an average death rate of 20 per 100,000 people (p=.5892). Notice that the hypothesized value 20 falls within the 99 percent confidence interval 17.9353-21.3711.
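The three p-values and the decision at the .01 level can be checked with a short Python sketch. It uses only the standard library and obtains the t distribution function by Simpson integration of the density, which is an illustrative choice rather than the method any of the packages use; the function names t_pdf and t_cdf are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t), using composite Simpson integration from 0 to |t|."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

t_stat, df, alpha = -0.5441, 43, 0.01

p_lower = t_cdf(t_stat, df)            # Ha: mean < 20
p_upper = 1 - p_lower                  # Ha: mean > 20
p_two = 2 * min(p_lower, p_upper)      # Ha: mean != 20

print(f"Pr(T < t) = {p_lower:.4f}")
print(f"Pr(|T| > |t|) = {p_two:.4f}")
print(f"Pr(T > t) = {p_upper:.4f}")
print("reject H0" if p_two < alpha else "do not reject H0")
```

The two-tailed p-value is twice the smaller one-tailed p-value, which is why the second entry in the Stata output is double the first.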

If you have only aggregated data (i.e., the number of observations, mean, and standard deviation of a sample), use the ttesti command to replicate the t-test above. The hypothesized value 20 is specified after the summary statistics. Do not omit the level(99) option, because the default confidence level is 95 percent.

. ttesti 44 19.65318 4.228122 20, level(99)
(output is skipped)

## 2.2 One Sample T-test Using the SAS TTEST Procedure

The TTEST procedure conducts various types of t-tests in SAS. The H0 option specifies a hypothesized value, and ALPHA specifies a significance level. If omitted, they default to zero and the .05 level, respectively. Let us first declare a library using the LIBNAME statement. A SAS data set is read through a SAS library, which is an alias for a collection of SAS data sets.

LIBNAME masil 'c:\data\sas';

The LIBNAME statement defines a library masil that points to c:\data\sas. DATA=masil.smoking in the following TTEST statement tells SAS to read the data set smoking in the masil library.

PROC TTEST H0=20 ALPHA=.01 DATA=masil.smoking;
VAR lung;
RUN;
The TTEST Procedure

Statistics

Variable      N   Lower CL Mean     Mean   Upper CL Mean   Lower CL Std Dev   Std Dev   Upper CL Std Dev   Std Err

lung         44   17.935          19.653   21.371          3.2994             4.2281    5.7989             0.6374

T-Tests

Variable      DF    t Value    Pr > |t|

lung          43      -0.54      0.5892

The TTEST procedure reports descriptive statistics followed by the two-tailed t-test. The small t statistic does not reject the null hypothesis that the population mean is 20 at the .01 level (p=.5892). If you have a summary data set containing the values of a variable lung and their frequencies, say count, use the FREQ statement.

PROC TTEST H0=20 ALPHA=.01 DATA=masil.smoking;
VAR lung;
FREQ count;
RUN;
(output is skipped)

## 2.3 One Sample T-test Using the SAS UNIVARIATE and MEANS Procedures

The SAS UNIVARIATE and MEANS procedures also perform the one sample t-test. The UNIVARIATE procedure is basically designed to produce a variety of descriptive statistics for a variable. Its MU0 option tells the procedure to perform a t-test using the hypothesized value specified. VARDEF=DF specifies the divisor (degrees of freedom, n-1) used in computing the variance (and hence the standard deviation). The NORMAL option tests whether the variable is normally distributed.

PROC UNIVARIATE MU0=20 VARDEF=DF NORMAL ALPHA=.01 DATA=masil.smoking;
VAR lung;
RUN;
The UNIVARIATE Procedure
Variable:  lung

Moments

N                          44    Sum Weights                 44
Mean               19.6531818    Sum Observations        864.74
Std Deviation      4.22812167    Variance            17.8770129
Skewness            -0.104796    Kurtosis             -0.949602
Uncorrected SS      17763.604    Corrected SS        768.711555
Coeff Variation    21.5136751    Std Error Mean      0.63741333

Basic Statistical Measures

Location                    Variability

Mean     19.65318     Std Deviation            4.22812
Median   20.32000     Variance                17.87701
Mode       .          Range                   15.26000
Interquartile Range      6.53000

Tests for Location: Mu0=20

Test           -Statistic-    -----p Value------

Student's t    t   -0.5441    Pr > |t|    0.5892
Sign           M         1    Pr >= |M|   0.8804
Signed Rank    S     -36.5    Pr >= |S|   0.6752

Tests for Normality

Test                  --Statistic---    -----p Value------

Shapiro-Wilk          W     0.967845    Pr < W      0.2535
Kolmogorov-Smirnov    D     0.086184    Pr > D     >0.1500
Cramer-von Mises      W-Sq  0.063737    Pr > W-Sq  >0.2500
Anderson-Darling      A-Sq  0.382105    Pr > A-Sq  >0.2500

Quantiles (Definition 5)

Quantile      Estimate

100% Max        27.270
99%             27.270
95%             25.950
90%             25.450
75% Q3          22.815
50% Median      20.320
25% Q1          16.285

10%             14.110
5%              12.120
1%              12.010
0% Min          12.010

Extreme Observations

-----Lowest----        ----Highest----

Value      Obs         Value      Obs

12.01       39         25.45       16
12.11       33         25.88        1
12.12       30         25.95       27
13.58       10         26.48       18
14.11       36         27.27        8

At the beginning of the output above, you see summary statistics such as N (44), mean (19.6532), variance (17.8770), skewness (-.1048), and kurtosis (-.9496). The third block of the output, entitled “Tests for Location: Mu0=20,” reports the t statistic and its p-value. The fourth block, entitled “Tests for Normality,” contains several normality test statistics. Since N is less than 2,000, you should read the Shapiro-Wilk W, which does not reject the null hypothesis that lung is normally distributed (p=.2535).
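The effect of the VARDEF=DF divisor can be checked against the Moments block. The Python sketch below is illustrative only: it starts from the reported Corrected SS and divides by n-1, then shows for contrast what a divisor of n (VARDEF=N in SAS) would give. The variable names are assumptions made for this example.

```python
import math

n = 44
corrected_ss = 768.711555           # "Corrected SS" from the Moments block

var_df = corrected_ss / (n - 1)     # VARDEF=DF: divide by n - 1
var_n = corrected_ss / n            # VARDEF=N: divide by n (shown for contrast)

sd_df = math.sqrt(var_df)           # standard deviation under VARDEF=DF
se_df = sd_df / math.sqrt(n)        # standard error of the mean

print(f"variance (n-1 divisor) = {var_df:.7f}")
print(f"variance (n divisor)   = {var_n:.7f}")
print(f"std deviation = {sd_df:.7f}, std error = {se_df:.7f}")
```

The n-1 divisor reproduces the Variance, Std Deviation, and Std Error Mean reported above, and it is the divisor the t-test requires.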

The MEANS procedure performs the one sample t-test using the T and PROBT options, which respectively request the t statistic and its two-tailed p-value. The CLM option produces the confidence interval (upper and lower limits) of the mean. The MEAN, STD, and STDERR options respectively print the sample mean, standard deviation, and standard error.

PROC MEANS MEAN STD STDERR T PROBT CLM VARDEF=DF ALPHA=.01 DATA=masil.smoking;
VAR lung;
RUN;
The MEANS Procedure

Analysis Variable : lung

Mean           Std Dev       Std Error     t Value   Pr > |t|   Lower 99% CL for Mean   Upper 99% CL for Mean
-------------------------------------------------------------------------------------------------------------
19.6531818     4.2281217     0.6374133     30.83     <.0001     17.9352878              21.3710758
-------------------------------------------------------------------------------------------------------------

The MEANS procedure does not, however, have an option to specify a hypothesized value other than zero. Thus, the null hypothesis here is that the population mean of the death rate from lung cancer is zero. The t statistic is 30.83 = (19.6532-0) / .6374 and the corresponding p-value is less than .0001. The large t statistic and small p-value reject the null hypothesis at the .01 significance level. The average death rate from lung cancer is not zero but much larger than zero. The confidence interval does not depend on the hypothesized value, so it remains unchanged as long as the same significance level is used.
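The difference between the two null hypotheses comes down to the value subtracted in the numerator of the t statistic. A small Python sketch, using the mean and standard error printed by PROC MEANS, makes this concrete; the helper name t_statistic is hypothetical.

```python
mean = 19.6531818     # sample mean from the MEANS output
se = 0.6374133        # standard error from the MEANS output

def t_statistic(mean, se, mu0=0.0):
    """One sample t statistic for the null hypothesis that the mean is mu0."""
    return (mean - mu0) / se

t_zero = t_statistic(mean, se)           # H0: mean = 0, as tested by PROC MEANS
t_twenty = t_statistic(mean, se, 20.0)   # H0: mean = 20, as in the earlier tests

print(f"t (H0: mean = 0)  = {t_zero:.2f}")
print(f"t (H0: mean = 20) = {t_twenty:.4f}")
```

The standard error, and therefore the confidence interval, is the same in both cases; only the t statistic and its p-value change with the hypothesized value.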

## 2.4 One Sample T-test in SPSS

SPSS has the T-TEST command for t-tests. First, open the SPSS syntax editor by clicking File --> New --> Syntax. The /TESTVAL subcommand specifies the value with which the sample mean is compared, whereas /VARIABLES lists the variables to be tested. Like Stata, SPSS takes a confidence level rather than a significance level, in the /CRITERIA=CI() subcommand.

T-TEST
/TESTVAL = 20
/VARIABLES = lung
/CRITERIA = CI(.99) .

Alternatively, you may click Analyze --> Compare Means --> One-Sample T Test and then select the variables to be tested. All SPSS output is skipped in this document.