To illustrate the procedure for a two-sample t-test in R, suppose you are interested in whether former colonies and countries without a colonial past differ in the quality of their institutions. The data allow such a comparison through the Colony and Institutions variables. A t-test is implemented in R with the t.test() function, which takes a formula naming the variable you are comparing (in this case, Institutions) and the grouping variable (Colony). The alternative argument specifies the type of test; the available options are "two.sided", "less", and "greater", which must be given as quoted strings. The var.equal argument determines whether the two group variances are treated as equal: if TRUE, the pooled variance estimate is used; if FALSE (the default), the Welch approximation to the degrees of freedom is used.
> t.test(Institutions~Colony, alternative="two.sided", var.equal=TRUE, conf.level=0.95)

        Two Sample t-test

data:  Institutions by Colony
t = 3.7596, df = 73, p-value = 0.0003405
alternative hypothesis: difference in means not equal to 0
95 percent confidence interval:
 0.914618 2.978317
mean in group 0 mean in group 1
       6.889169        4.942702
Because the p-value is less than α = .05, we reject the null hypothesis that there is no difference in mean quality of institutions between colonies and non-colonies. The two groups differ in mean quality of institutions: the group coded 0 averages 6.89, compared with 4.94 for the group coded 1.
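The effect of the var.equal argument can be sketched on simulated data. The data frame sim and all of its values are invented for illustration; only the column names are borrowed from the example above, and the results will not match the tutorial's output.

```r
# Simulated stand-in for the tutorial's data (all values are made up)
set.seed(42)
sim <- data.frame(
  Colony       = rep(c(0, 1), times = c(30, 45)),
  Institutions = c(rnorm(30, mean = 7, sd = 2), rnorm(45, mean = 5, sd = 2.5))
)

# Pooled-variance test: degrees of freedom are n1 + n2 - 2
pooled <- t.test(Institutions ~ Colony, data = sim,
                 alternative = "two.sided", var.equal = TRUE)

# Welch test (the default, var.equal = FALSE): degrees of freedom are approximated
welch <- t.test(Institutions ~ Colony, data = sim,
                alternative = "two.sided", var.equal = FALSE)

pooled$parameter  # df for the pooled test
welch$parameter   # Welch df, typically non-integer and never larger than the pooled df
pooled$p.value
welch$p.value
```

When the group variances or sample sizes differ noticeably, the two tests can give different p-values, which is one reason the Welch version is often preferred as the safer choice.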
One-sample t-tests are easily implemented in R by omitting the group variable in the t.test() function; the hypothesized mean is set with the mu argument (0 by default). Paired t-tests are also possible with the t.test() function by setting the argument paired=TRUE.
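Both variants can be sketched as follows. The vectors before and after are hypothetical simulated measurements on the same 20 units and are not part of the tutorial's data.

```r
# Hypothetical before/after measurements on 20 units (simulated)
set.seed(1)
before <- rnorm(20, mean = 5, sd = 1)
after  <- before + rnorm(20, mean = 0.5, sd = 0.5)

# One-sample test: does the mean of 'before' differ from mu = 5?
one_sample <- t.test(before, mu = 5)

# Paired test: does the mean within-unit difference differ from 0?
paired <- t.test(after, before, paired = TRUE)

# A paired test is equivalent to a one-sample test on the differences
diffs <- t.test(after - before, mu = 0)
all.equal(unname(paired$statistic), unname(diffs$statistic))
```

The final line illustrates why pairing matters: the test operates on the within-unit differences, so variation between units drops out.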
Simple linear regression
Say you want to implement the following model in R:
Invest = β0 + β1(Institutions) + β2(Open.Market)
The syntax for the model above is the following:
> results1 <- lm(Invest~Institutions+Open.Market, data=mydata)
The function lm() is used to fit linear models in R. It requires the model’s formula as the first argument, here Invest ~ Institutions+Open.Market (the intercept is included by default). In addition, you need to tell the function in which data frame to look for the variables by using the data= argument. Storing the results in an object (results1 in this example) allows you to call several functions that return useful information about the fitted model. The function summary() provides a summary of the model’s residuals, estimates of the model’s coefficients with their standard errors, t statistics and p-values, as well as model statistics (F-statistic, R-squared, etc.).
> summary(results1)

Call:
lm(formula = Invest ~ Institutions + Open.Market, data = mydata)

Residuals:
      Min        1Q    Median        3Q       Max
-12.67273  -3.26294   0.03802   3.21995  13.48049

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)     7.833      1.790   4.377    4e-05 ***
Institutions    1.335      0.370   3.607 0.000567 ***
Open.Market     6.487      1.986   3.266 0.001672 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.138 on 72 degrees of freedom
Multiple R-squared: 0.5616, Adjusted R-squared: 0.5495
F-statistic: 46.13 on 2 and 72 DF, p-value: 1.276e-13
Other useful functions are coef(), resid() and fitted(), which return the model’s coefficients, the residuals (observed minus predicted values of the dependent variable), and the predicted values of the dependent variable, respectively.
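A minimal sketch of these extractor functions follows. It fits the same formula to a simulated data frame, sim, whose values are invented for illustration and stand in for mydata; treating Open.Market as a continuous variable is an assumption made only for this sketch.

```r
# Simulated stand-in for mydata (all values are made up)
set.seed(123)
n <- 75
sim <- data.frame(
  Institutions = runif(n, 1, 10),
  Open.Market  = runif(n, 0, 1)   # assumed continuous for this sketch
)
sim$Invest <- 8 + 1.3 * sim$Institutions + 6 * sim$Open.Market + rnorm(n, sd = 5)

fit <- lm(Invest ~ Institutions + Open.Market, data = sim)

coef(fit)          # named vector: (Intercept), Institutions, Open.Market
head(resid(fit))   # observed minus fitted values
head(fitted(fit))  # predicted values of Invest

# Fitted values plus residuals reproduce the observed response
all.equal(unname(fitted(fit) + resid(fit)), sim$Invest)
```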
In addition to fitting a first-order model in Institutions and Open.Market, you may also want to include an interaction term between Institutions and Open.Market, include a polynomial term (e.g., Institutions^2), or exclude the intercept from the model. Each of these models is described in Table 1.
Interaction Term: Invest = β0 + β1(Institutions) + β2(Open.Market) + β3(Institutions*Open.Market)
results2 <- lm(Invest ~ Institutions + Open.Market + Institutions:Open.Market, data=mydata)
Polynomial Term: Invest = β0 + β1(Institutions) + β2(Open.Market) + β3(Institutions^2)
results3 <- lm(Invest ~ Institutions + Open.Market + I(Institutions^2), data=mydata)
Exclude Intercept: Invest = β1(Institutions) + β2(Open.Market)
results4 <- lm(Invest ~ -1 + Institutions + Open.Market, data=mydata)
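The formula operators used in these models have a few properties worth checking by hand. The sketch below uses a simulated data frame, sim, with invented values in place of mydata. It shows that the * operator is shorthand for main effects plus interaction, why the polynomial term needs I(), and that 0 + is an equivalent way to drop the intercept.

```r
# Simulated stand-in for mydata (all values are made up)
set.seed(7)
sim <- data.frame(Institutions = runif(50, 1, 10), Open.Market = runif(50))
sim$Invest <- 8 + 1.3 * sim$Institutions + 6 * sim$Open.Market + rnorm(50, sd = 5)

# In a formula, a * b expands to a + b + a:b
long  <- lm(Invest ~ Institutions + Open.Market + Institutions:Open.Market, data = sim)
short <- lm(Invest ~ Institutions * Open.Market, data = sim)
all.equal(coef(long), coef(short))

# Without I(), ^ is a formula operator, so Institutions^2 reduces to Institutions
no_I   <- lm(Invest ~ Institutions + Open.Market + Institutions^2, data = sim)
with_I <- lm(Invest ~ Institutions + Open.Market + I(Institutions^2), data = sim)
length(coef(no_I))    # 3: no squared term was added
length(coef(with_I))  # 4: intercept, two main effects, squared term

# 0 + is an equivalent way to exclude the intercept
no_int <- lm(Invest ~ 0 + Institutions + Open.Market, data = sim)
names(coef(no_int))
```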
Finally, you should evaluate the results of your model by examining the residuals and scanning the data for influential outliers. One way to do so is by using the plot() function:
> layout(matrix(1:4, 2, 2))
> plot(results1)
The layout() function divides the plotting window for the subsequent plot() call. In this case, we create a 2x2 window with four slots, which are filled column by column. The plot() function then generates four diagnostic plots for the first model we fit, whose results were stored in the object results1. The top-left slot graphs the residuals against the fitted values; the bottom-left slot is a normal Q-Q plot of the standardized residuals; the top-right slot graphs the square root of the absolute standardized residuals against the fitted values; and the bottom-right slot graphs the standardized residuals against the leverage of each observation, with contours of Cook’s distance superimposed. R produces the axes and plot labels automatically.
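The quantities behind these plots can also be extracted numerically, which helps when you want to identify specific observations. The following is a sketch on a simulated data frame, sim, with invented values; it is not the tutorial's dataset.

```r
# Simulated stand-in for mydata (all values are made up)
set.seed(99)
sim <- data.frame(Institutions = runif(75, 1, 10), Open.Market = runif(75))
sim$Invest <- 8 + 1.3 * sim$Institutions + 6 * sim$Open.Market + rnorm(75, sd = 5)
fit <- lm(Invest ~ Institutions + Open.Market, data = sim)

lev  <- hatvalues(fit)       # leverage of each observation
cook <- cooks.distance(fit)  # influence of each observation on the fit
rstd <- rstandard(fit)       # standardized residuals

# Leverages sum to the number of estimated coefficients (3 here)
sum(lev)

# The observations with the largest Cook's distance deserve a closer look
head(sort(cook, decreasing = TRUE), 3)
```

To draw a single diagnostic panel instead of all four, pass the which argument to plot(); for example, plot(fit, which = 2) draws only the Q-Q plot.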
Analysis of variance (ANOVA) is easily implemented in R. Returning to the example from Section 5.1 on implementing a t-test, suppose you want to compare the mean values of quality of institutions for former colonies and countries without a colonial history through one-way analysis of variance. You might begin your analysis with a boxplot in order to compare the distributions of quality of institutions for colonies and non-colonies.
> boxplot(Institutions~Colony, xlab="Colony", ylab="Quality of Institutions", main="Quality of Institutions by Colony", col="gray")
For a formal ANOVA test, use the aov() function, specifying the model's formula and the dataset to be used, as in the lm() function. Use the summary() function to view the results.
> results5 <- aov(Institutions~Colony, data=mydata)
> summary(results5)
            Df Sum Sq Mean Sq F value    Pr(>F)
Colony       1  70.02  70.016  14.134 0.0003405 ***
Residuals   73 361.61   4.954
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
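With only two groups, a one-way ANOVA is equivalent to the pooled-variance t-test: the F statistic equals the square of the t statistic and the p-values coincide. The output above agrees with the earlier t-test (3.7596 squared is approximately 14.134, and both p-values are 0.0003405). The sketch below checks the same relationship on simulated data; the data frame sim and its values are invented for illustration.

```r
# Simulated stand-in for the tutorial's data (all values are made up)
set.seed(2024)
sim <- data.frame(
  Colony       = factor(rep(c(0, 1), times = c(30, 45))),  # grouping stored as a factor
  Institutions = c(rnorm(30, mean = 7, sd = 2), rnorm(45, mean = 5, sd = 2))
)

tt  <- t.test(Institutions ~ Colony, data = sim, var.equal = TRUE)
fit <- aov(Institutions ~ Colony, data = sim)
tab <- summary(fit)[[1]]   # the ANOVA table

f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]

all.equal(unname(tt$statistic)^2, f_value)  # F equals t squared
all.equal(tt$p.value, p_value)              # same p-value
```

Storing the grouping variable as a factor makes no difference to the result with two groups, but it is required when there are three or more groups; otherwise aov() treats a numeric code as a single continuous predictor.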
Finally, you may wish to examine the results visually. Use the layout() and plot() functions, as described in Section 5.2.