# One-sample/group/distribution t-test
# by Aaron Albin
# Script began: 1/16/2012
###########################
# WHEN TO USE THIS SCRIPT #
###########################
# Use this script if you can answer 'yes' to all of the following questions:
# A) Is there a specific value of interest you have in mind when you look at these numbers? (Usually this is a number you have calculated or one you know beforehand from somewhere else. Most often this number is 0.)
# B) Imagine repeating your experiment over and over again (collecting the exact same number of data points as you currently have each time) and calculating the average for every one of those hypothetical datasets. Are you interested in knowing:
# * A range of plausible values for the true average. (This is the 'confidence interval'. If you calculated a 95% confidence interval for every one of those hypothetical datasets, about 95% of those intervals would contain the true average.)
# * The probability of getting an average at least as far removed from the value of interest as the actual average of your data, assuming that the true average really is the value of interest and that any difference is nothing more than chance, i.e. random noise. (This is the 'p-value'.)
# -----------
# An example:
# -----------
# Source: http://www.wadsworth.com/psychology_d/special_features/ext/workshops/t_testsample1.html
# You hear that the average person sleeps 8 hours a day.
# You think college students sleep less.
# You ask 10 college kids how long they sleep on an average day.
# You get the data and the mean of sleep time is 6.5 hours.
# Is this luck? Did you happen to pick a group of light sleepers by chance? Or do college kids really sleep less?
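# -----------
# A sketch of this example in R:
# -----------
# The ten sleep times below are made-up values (the example above gives only the mean of 6.5 hours), chosen purely for illustration so that their average is 6.5.
# The names ending in '_Example' are hypothetical and are not used anywhere else in this script.
# Since the question is whether college students sleep LESS than 8 hours, this sketch assumes a null-hypothesis mean of 8 and the "less" alternative.
SleepHours_Example = c(6, 7, 6.5, 5.5, 7.5, 6, 7, 6.5, 6, 7) # Hypothetical data
mean(SleepHours_Example) # The sample mean
SleepResults_Example = t.test( x=SleepHours_Example, alternative="less", mu=8 )
SleepResults_Example$p.value # A small p-value suggests the low average is unlikely to be luck alone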
###############
# SELECT DATA #
###############
# This script assumes your data has already been read into R as a data frame named 'Dataframe'.
# Run this to have R list the names of its columns:
#=====[1]=====
colnames(Dataframe)
#=============
# Now, pick out which of these columns contains the numeric data for the t-test and type the name of that column between the double-quotes below.
#=====[2]=====
NumberColumnName = "NumberColumn1"
# NumberColumnName = "VOT1"
#=============
# Now run this to extract the data from the column you selected:
#=====[3]=====
NumberData = Dataframe[,NumberColumnName]
#=============
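# Optional: before moving on, it can help to confirm that the extracted data is something a t-test can use.
# The helper below is a hypothetical sketch (it is not part of base R or of this script's original workflow). It reports whether the data is numeric, how many values there are, and how many of them are missing (NA). Note that 't.test()' drops missing values before running the test.
#=====[optional]=====
SummarizeNumberData_Example = function(x){
    list( IsNumeric = is.numeric(x),        # TRUE if the column holds numbers
          NumberOfValues = length(x),       # Total count, including missing values
          NumberMissing = sum(is.na(x)) )   # How many values are NA
}
SummarizeNumberData_Example( c(12.5, 8.1, NA, 10.3) ) # Demonstration on made-up values
# SummarizeNumberData_Example(NumberData) # Uncomment this line to check your own data
#=============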
###########################
# SET UP AND RUN ANALYSIS #
###########################
# Here, specify the 'true value' of the distribution's mean under the null hypothesis. This will normally be 0.
#=====[4]=====
Mean_NullHypothesis = 0
#=============
# Here, specify the 'directionality' of the alternative hypothesis. You have three choices:
# * "two.sided" (Note the period!) - Your alternative hypothesis is that the hypothetical distribution's mean is NOT EQUAL TO the number specified immediately above for the null hypothesis. This is also called a 'two-tailed' t-test.
# * "greater" - Your alternative hypothesis is that the hypothetical distribution's mean is GREATER THAN the number specified immediately above for the null hypothesis. This is one kind of 'one-tailed' t-test.
# * "less" - Your alternative hypothesis is that the hypothetical distribution's mean is LESS THAN the number specified immediately above for the null hypothesis. This is one kind of 'one-tailed' t-test.
#=====[5]=====
Directionality_AlternativeHypothesis = "two.sided"
#=============
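# Optional: an illustration of how the three choices relate to one another.
# The data below is made up, and the names ending in '_Example' are hypothetical (they are not used anywhere else in this script).
# For a t-test, the two 'one-tailed' p-values add up to 1, and the 'two-tailed' p-value is twice the smaller of the two.
#=====[optional]=====
Directionality_Example = c(0.8, -0.3, 1.9, 0.4, 1.1, -0.6, 1.5, 0.7) # Hypothetical data
P_TwoSided_Example = t.test( x=Directionality_Example, alternative="two.sided", mu=0 )$p.value
P_Greater_Example = t.test( x=Directionality_Example, alternative="greater", mu=0 )$p.value
P_Less_Example = t.test( x=Directionality_Example, alternative="less", mu=0 )$p.value
c( two.sided=P_TwoSided_Example, greater=P_Greater_Example, less=P_Less_Example ) # Show all three side by side
all.equal( P_Greater_Example + P_Less_Example, 1 ) # Checks that the one-tailed p-values sum to 1
all.equal( P_TwoSided_Example, 2*min(P_Greater_Example, P_Less_Example) ) # Checks the 'twice the smaller' relationship
#=============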
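# The output includes a confidence interval. R defaults to a 95% confidence interval, as this is common practice in the literature, but you can adjust the level here if you like. Specify it as a proportion between 0 and 1 (e.g. 0.95 for 95%). Note that if you chose "greater" or "less" above, the interval is one-sided, so one of its two ends will be infinite (Inf or -Inf).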
#=====[6]=====
PercentageForConfidenceInterval = 0.95
#=============
# Now run the actual 't.test()' function.
# Note that R will issue an error message if the input data is unvarying/constant or if it contains fewer than two non-missing values.
#=====[7]=====
Results = t.test( x=NumberData, alternative=Directionality_AlternativeHypothesis, mu=Mean_NullHypothesis, conf.level=PercentageForConfidenceInterval)
#=============
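# Optional: a sketch of what 't.test()' calculates in the two-sided case, worked by hand on made-up data.
# All names ending in '_Example' are hypothetical and are not used anywhere else in this script. The null-hypothesis mean of 5 and the 95% level are assumptions chosen only for this illustration.
#=====[optional]=====
Manual_Example = c(5.1, 4.8, 6.2, 5.5, 4.9, 5.8, 6.0, 5.3) # Hypothetical data
Mu_Example = 5 # Null-hypothesis mean
N_Example = length(Manual_Example) # Number of data points
StandardError_Example = sd(Manual_Example) / sqrt(N_Example) # Standard error of the mean
T_Example = ( mean(Manual_Example) - Mu_Example ) / StandardError_Example # t statistic
DF_Example = N_Example - 1 # Degrees of freedom
P_Example = 2 * pt( -abs(T_Example), df=DF_Example ) # Two-sided p-value
CI_Example = mean(Manual_Example) + c(-1,1) * qt(0.975, df=DF_Example) * StandardError_Example # 95% confidence interval
# Compare the hand-calculated values against the output of 't.test()':
Check_Example = t.test( x=Manual_Example, alternative="two.sided", mu=Mu_Example, conf.level=0.95 )
all.equal( T_Example, as.numeric(Check_Example$statistic) )
all.equal( P_Example, as.numeric(Check_Example$p.value) )
all.equal( CI_Example, as.numeric(Check_Example$conf.int) )
#=============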
#######################
# INSPECT THE RESULTS #
#######################
# Run the following code for a reformatted, easy-to-read representation of the results:
#=====[8]=====
# Describe the alternative hypothesis in words
AlternativeText = switch( as.character(Results$alternative),
    "two.sided" = paste("The true mean is not equal to ", as.numeric(Results$null.value), ".", sep=""),
    "greater"   = paste("The true mean is greater than ", as.numeric(Results$null.value), ".", sep=""),
    "less"      = paste("The true mean is less than ", as.numeric(Results$null.value), ".", sep="") )
# Print the report
cat( "\n\nDESCRIPTIVE STATISTICS\n~~~~~~~~~~~~~~~~~~~~~~\n",
     "The actual calculated mean of ", NumberColumnName, " is ", as.numeric(Results$estimate), ".\n\n",
     "TEST THAT WAS RUN\n~~~~~~~~~~~~~~~~~\n",
     "A ", Results$method, " was performed on ", NumberColumnName, ".\n",
     " - Null hypothesis: The true mean is ", as.numeric(Results$null.value), ".\n",
     " - Alternative hypothesis: ", AlternativeText, "\n\n",
     "RESULTS\n~~~~~~~\n",
     "The ", 100*PercentageForConfidenceInterval, "% confidence interval for the mean of ", NumberColumnName, " ranges\n",
     " from ", as.numeric(Results$conf.int[1]), " to ", as.numeric(Results$conf.int[2]), ".\n",
     "t(", as.numeric(Results$parameter), ")=", as.numeric(Results$statistic), ", p=", as.numeric(Results$p.value), "\n\n\n",
     sep="" )
#=============
# Here is R's default display of the results:
#=====[9]=====
show(Results)
#=============