Chapters 12-13 Using EXCEL in Regression


From William E. Becker, Statistics for Business and Economics Using Microsoft Excel 97, S.R.B. Publishing, 1997, p. 378-379, 417-419. Reproduced with permission of S.R.B. Publishing.

 

QUERY: In a letter to the editor of The New York Times (November 29, 1986, p. 26), David Rivkin provided data on two variables: highway deaths per 100 million vehicle miles and highway speed limits for 10 countries, Table 12.1. He states:

 From this we can see that five countries with the same speed limit have very different positions on the safety list. Britain ... with a speed limit of 70 is demonstrably safer than Japan, at 55. Clearly, speed has little to do with safety.

 Although Rivkin argues that there is no relationship between speed and death, is he correct? Use regression analysis to answer this question.

 

ANSWER: Part of the EXCEL regression menus and resulting output are shown in Table 12.6. After the ten paired y and x data are typed into cells A2 to A11 and B2 to B11, with the "death" rate label in A1 and "speed" limits label in B1, the following steps produce the regression output.

1. Choose "Regression" from "Data Analysis" in the "Tools" menu. The Regression dialog box shown in Table 12.6 will appear.

Note: Use the mouse to move between the boxes and buttons. Click on the desired box or button. The large rectangular boxes require a range from the worksheet. A range may be typed in or selected by highlighting the cells with the mouse after clicking on the box. If the dialog box blocks the data, it can be moved on the screen by clicking on the title bar and dragging.

2. For the "Input Y Range," enter A1 to A11, and for the "Input X Range" enter B1 to B11.

3. Because the Y and X ranges include the "Death" and "Speed" labels in A1 and B1, select the "Labels" box with a click.

4. Click the "Output Range" button and type reference cell, which in this demonstration is A13.

5. To get the predicted values of Y (Death rates) and residuals select the "Residuals" box with a click.

Your screen display should now look like that in Table 12.6. Clicking "OK" will give the "SUMMARY OUTPUT," "ANOVA" AND RESIDUAL OUTPUT" also shown in Table 12.6.

The first section of the EXCEL printout gives "SUMMARY OUTPUT." The "Multiple R" is the square root of the "R Square;" the computation and interpretation of which we have already discussed. The "Standard Error" of estimate (which will be discussed in the next chapter) is s = 0.86423, which is the square root of "Residual SS" = 5.97511 divided by its degrees of freedom, df = 8, as given in the "ANOVA" section. We will also discuss the adjusted R-square of 0.21325 in the following chapters.

Under the "ANOVA" section are the estimated regression coefficients and related statistics that will be discussed in detail in the next chapter. For now it is sufficient to recognize that the calculated coefficient values for the slope and y intercept are provided (b = 0.07556 and a = -0.29333). Next to these coefficient estimates is information on the variability in the distribution of the least-squares estimators from which these specific estimates were drawn: the column titled "Std. Error" contains the standard deviations (standard errors) of the intercept and slope distributions; the "t-ratio" and "p" columns give the calculated values of the t statistics and associated p-values. As shown in Chapter 13, the t statistic of 1.85458 and p-value of 0.10077, for example, indicates that the sample slope (0.07556) is sufficiently different from zero, at even the 0.10 two-tail Type I error level, to conclude that there is a significant relationship between deaths and speed limits in the population. This conclusion is contrary to Rivkin's assertion that "speed has little to do with safety."

 

TABLE 12.6 Computer Printout for Regression of Death on Speed Limit

 

SUMMARY OUTPUT Regression Statistics

Multiple R 0.54833
R Square 0.30067
Adjusted R Square 0.21325
Standard Error 0.86423
Observations 10

ANOVA df SS MS F Significance F
Regression 1 2.56889 2.56889 3.43945 0.10077
Residual 8 5.97511 0.74689
Total 9 8.54400

Coefficients Std. Error T Stat P-value Lower 95% Upper 95%
Intercept -0.29333 2.45963 -0.11926 0.90801 -5.96526 5.37860
Speed 0.07556 0.04074 1.85458 0.10077 -0.01839 0.16950

Residual Output Observation Predicted Death Residuals
1 3.86222 -0.86222
2 3.86222 -0.56222
3 3.86222 -0.46222
4 4.99556 -1.49556
5 3.86222 0.23778
6 4.24000 0.06000
7 3.86222 0.83778
8 4.24000 0.66000
9 4.24000 0.86000
10 5.37333 0.72667


Go to:

Examples of EXCEL Use
Statistics for Business and Economics WWW Page