Data Analysis

Two-Sample t-test

To illustrate a two-sample t-test in R, suppose you want to know whether former colonies and countries without a colonial past differ in the quality of their institutions. The data allow this comparison through the Colony and Institutions variables. The t.test() function implements the test. It takes a formula naming the variable you are comparing (here, Institutions) and the group variable (Colony). The alternative argument specifies the type of test: "two.sided", "less", or "greater" are the available options, and the value must be quoted. The var.equal argument controls how the variances are treated: if TRUE, the pooled variance estimate is used, and if FALSE, the Welch approximation to the degrees of freedom is used. The call below assumes that Institutions and Colony are visible to R (for example, because the data frame has been attached); otherwise, add data=mydata to the call.

```
> t.test(Institutions~Colony, alternative="two.sided", var.equal=TRUE, conf.level=0.95)

        Two Sample t-test

data:  Institutions by Colony
t = 3.7596, df = 73, p-value = 0.0003405
alternative hypothesis: difference in means not equal to 0
95 percent confidence interval:
 0.914618 2.978317
mean in group 0 mean in group 1
       6.889169        4.942702
```

Because the p-value (0.0003405) is less than α = 0.05, we reject the null hypothesis that the mean quality of institutions is the same for colonies and non-colonies. The sample means (6.89 in group 0 versus 4.94 in group 1) differ by about 1.95, and the 95 percent confidence interval for the difference, 0.91 to 2.98, excludes zero.
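To see what the var.equal argument changes, the following sketch runs both versions of the test on simulated data. The data frame is a hypothetical stand-in that only borrows the tutorial's variable names and group sizes adding up to 75; its values are not the real dataset, so the numbers will not match the output above.

```r
# Simulated stand-in for the tutorial's data (assumption: NOT the real dataset).
set.seed(1)
mydata <- data.frame(
  Colony       = rep(c(0, 1), times = c(30, 45)),
  Institutions = c(rnorm(30, mean = 7, sd = 2),
                   rnorm(45, mean = 5, sd = 2.5))
)

# Pooled-variance test: degrees of freedom are n1 + n2 - 2.
pooled <- t.test(Institutions ~ Colony, data = mydata,
                 alternative = "two.sided", var.equal = TRUE)

# Welch test: degrees of freedom are approximated and usually not an integer.
welch <- t.test(Institutions ~ Colony, data = mydata,
                alternative = "two.sided", var.equal = FALSE)

pooled$parameter   # df for the pooled test
welch$parameter    # Welch-approximated df
pooled$p.value
welch$p.value
```

The object returned by t.test() is a list, so individual pieces such as the test statistic ($statistic), degrees of freedom ($parameter), and confidence interval ($conf.int) can be extracted for later use.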

One-sample t-tests are implemented by omitting the group variable in the t.test() function and, if needed, setting the hypothesized mean with the mu argument (0 by default). Paired t-tests are also possible with the t.test() function by setting the argument paired=TRUE.
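As a minimal sketch of both variants, the example below uses two made-up measurement vectors, before and after, which are hypothetical and do not come from the tutorial's data.

```r
# Hypothetical repeated measurements on 20 units (simulated, for illustration only).
set.seed(2)
before <- rnorm(20, mean = 5, sd = 1)
after  <- before + rnorm(20, mean = 0.5, sd = 1)

# One-sample test: no group variable, hypothesized mean given by mu.
one <- t.test(before, mu = 5)

# Paired test: pass both vectors and set paired = TRUE.
paired <- t.test(after, before, paired = TRUE)

# A paired test is equivalent to a one-sample test on the differences.
diffs <- t.test(after - before, mu = 0)
all.equal(unname(paired$statistic), unname(diffs$statistic))
```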

Linear regression

Say you want to implement the following model in R:

```
Invest = β0 + β1(Institutions) + β2(Open.Market)
```

The syntax for the model above is the following:

```
> results1 <- lm(Invest~Institutions+Open.Market, data=mydata)

```

The function lm() is used to fit linear models in R. Its first argument is the model's formula, here Invest ~ Institutions+Open.Market (the intercept is included by default). The data= argument tells the function which data frame contains the variables. Storing the results in an object (results1 in this example) lets you pass that object to other functions that report on the fitted model. The function summary() provides the model's residuals, estimates for the model's coefficients, standard errors, t statistics and p-values, as well as model statistics (F-statistic, R-squared, etc.).

```
> summary(results1)

Call:
lm(formula = Invest ~ Institutions + Open.Market, data = mydata)

Residuals:
      Min        1Q    Median        3Q       Max
-12.67273  -3.26294   0.03802   3.21995  13.48049

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)     7.833      1.790   4.377    4e-05 ***
Institutions    1.335      0.370   3.607 0.000567 ***
Open.Market     6.487      1.986   3.266 0.001672 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.138 on 72 degrees of freedom
Multiple R-squared: 0.5616,     Adjusted R-squared: 0.5495
F-statistic: 46.13 on 2 and 72 DF,  p-value: 1.276e-13
```

Other useful functions are coef(), resid() and fitted(), which return the model's coefficients, the residuals (observed minus fitted values of the dependent variable), and the fitted (predicted) values of the dependent variable, respectively.
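The sketch below shows these extractor functions on a model fit to simulated data. The data frame is a hypothetical stand-in that reuses the tutorial's variable names; treating Open.Market as a 0/1 indicator is an assumption made only for this illustration.

```r
# Simulated stand-in for mydata (assumption: NOT the tutorial's real dataset).
set.seed(3)
n <- 75
mydata <- data.frame(
  Institutions = runif(n, min = 1, max = 10),
  Open.Market  = rbinom(n, size = 1, prob = 0.5)   # assumed 0/1 indicator
)
mydata$Invest <- 8 + 1.3 * mydata$Institutions +
  6 * mydata$Open.Market + rnorm(n, sd = 5)

results1 <- lm(Invest ~ Institutions + Open.Market, data = mydata)

coef(results1)          # named vector: (Intercept), Institutions, Open.Market
head(resid(results1))   # residuals: observed minus fitted
head(fitted(results1))  # fitted (predicted) values

# Fitted values plus residuals reproduce the observed response.
all.equal(unname(fitted(results1) + resid(results1)), mydata$Invest)
```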

In addition to fitting a first-order model in Institutions and Open.Market, you may also want to include an interaction term between Institutions and Open.Market, include a polynomial term (e.g., Institutions²), or exclude the intercept from the model. Table 1 gives the R code for each of these models.

| Model | R Code |
| --- | --- |
| Interaction term: Invest = β0 + β1(Institutions) + β2(Open.Market) + β3(Institutions × Open.Market) | `results2 <- lm(Invest ~ Institutions + Open.Market + Institutions:Open.Market, data=mydata)` |
| Polynomial term: Invest = β0 + β1(Institutions) + β2(Open.Market) + β3(Institutions²) | `results3 <- lm(Invest ~ Institutions + Open.Market + I(Institutions^2), data=mydata)` |
| Exclude intercept: Invest = β1(Institutions) + β2(Open.Market) | `results4 <- lm(Invest ~ -1 + Institutions + Open.Market, data=mydata)` |
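R's formula syntax also offers a shorthand for the interaction model: `a * b` expands to `a + b + a:b`. The sketch below checks that equivalence and then compares the first-order and interaction models with anova(). It runs on simulated data that only reuses the tutorial's variable names, so the data frame and its values are hypothetical.

```r
# Simulated stand-in for mydata (assumption: NOT the tutorial's real dataset).
set.seed(4)
n <- 75
mydata <- data.frame(
  Institutions = runif(n, min = 1, max = 10),
  Open.Market  = rbinom(n, size = 1, prob = 0.5)   # assumed 0/1 indicator
)
mydata$Invest <- 8 + 1.3 * mydata$Institutions +
  6 * mydata$Open.Market + rnorm(n, sd = 5)

# Explicit main effects plus interaction term.
long  <- lm(Invest ~ Institutions + Open.Market + Institutions:Open.Market,
            data = mydata)
# Shorthand: '*' expands to both main effects and their interaction.
short <- lm(Invest ~ Institutions * Open.Market, data = mydata)

all.equal(coef(long), coef(short))

# F-test for whether adding the interaction improves on the first-order model.
first_order <- lm(Invest ~ Institutions + Open.Market, data = mydata)
anova(first_order, long)
```

The I() wrapper in the polynomial model is needed because `^` has a special meaning inside a formula; I(Institutions^2) tells R to square the variable arithmetically.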

Finally, you should evaluate the results of your model by examining the residual errors and scanning the data for significant outliers. One way to do so is by using the plot() function:

```
> layout(matrix(1:4, 2, 2))

> plot(results1)

```

The layout() function formats the window for the subsequent plot function. In this case, we create a 2×2 window with four slots, which are filled column by column. Next, the plot() function generates four diagnostic plots for the first model we fit, whose results were stored in the object results1. The top-left slot graphs the residuals against the fitted values; the bottom-left slot is a normal Q-Q plot of the standardized residuals; the top-right slot graphs the square root of the absolute standardized residuals against the fitted values; and the bottom-right slot graphs the standardized residuals against the leverage of each observation, with Cook's distance contours superimposed on the plot. R produces the axes and plot labels automatically.
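The quantities behind these plots can also be inspected numerically. The sketch below extracts leverage, standardized residuals, and Cook's distance from a model fit to simulated data; the data frame is a hypothetical stand-in, and the 4/n cutoff for Cook's distance is a common rule of thumb adopted here as an assumption, not a threshold taken from the tutorial.

```r
# Simulated stand-in for mydata (assumption: NOT the tutorial's real dataset).
set.seed(5)
n <- 75
mydata <- data.frame(
  Institutions = runif(n, min = 1, max = 10),
  Open.Market  = rbinom(n, size = 1, prob = 0.5)   # assumed 0/1 indicator
)
mydata$Invest <- 8 + 1.3 * mydata$Institutions +
  6 * mydata$Open.Market + rnorm(n, sd = 5)
results1 <- lm(Invest ~ Institutions + Open.Market, data = mydata)

lev     <- hatvalues(results1)        # leverage of each observation
std_res <- rstandard(results1)        # standardized residuals
cooks   <- cooks.distance(results1)   # Cook's distance

# Flag observations above the (assumed) 4/n rule-of-thumb cutoff.
flagged <- which(cooks > 4 / n)
mydata[flagged, ]

# Leverages sum to the number of estimated coefficients.
sum(lev)
```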

One-way ANOVA

Analysis of variance (ANOVA) is easily implemented in R. Returning to the example from Section 5.1 on implementing a t-test, suppose you want to compare the mean values of quality of institutions for former colonies and countries without a colonial history through one-way analysis of variance. You might begin your analysis with a boxplot in order to compare the distributions of quality of institutions for colonies and non-colonies.

```
> boxplot(Institutions~Colony, xlab="Colony", ylab="Quality of Institutions", main="Quality of Institutions by Colony", col="gray")

```

For a formal ANOVA test, use the aov() function, specifying the model's formula and the dataset to be used, as in the lm() function. Use the summary() function to view the results.

```
> results5 <- aov(Institutions~Colony, data=mydata)
> summary(results5)
            Df Sum Sq Mean Sq F value    Pr(>F)
Colony       1  70.02  70.016  14.134 0.0003405 ***
Residuals   73 361.61   4.954
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Finally, you may wish to examine the results visually. Use the layout() and plot() functions, as described in Section 5.2.
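With only two groups, the one-way ANOVA and the pooled-variance t-test are the same test: the F statistic equals the square of the t statistic, and the p-values agree. The tutorial's own output reflects this (3.7596² ≈ 14.13, with p = 0.0003405 in both). The sketch below checks the relationship on simulated data; the data frame is a hypothetical stand-in, not the tutorial's dataset.

```r
# Simulated stand-in for mydata (assumption: NOT the tutorial's real dataset).
set.seed(6)
mydata <- data.frame(
  Colony       = rep(c(0, 1), times = c(30, 45)),
  Institutions = c(rnorm(30, mean = 7, sd = 2),
                   rnorm(45, mean = 5, sd = 2))
)

# Pooled-variance two-sample t-test.
tt <- t.test(Institutions ~ Colony, data = mydata, var.equal = TRUE)

# One-way ANOVA on the same data.
results5  <- aov(Institutions ~ Colony, data = mydata)
anova_tab <- summary(results5)[[1]]     # the ANOVA table as a data frame
F_value   <- anova_tab[["F value"]][1]
p_value   <- anova_tab[["Pr(>F)"]][1]

all.equal(unname(tt$statistic)^2, F_value)   # F equals t squared
all.equal(tt$p.value, p_value)               # identical p-values
```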
