Data Analysis
So far we have looked at SYSTAT to develop a basic idea of how SYSTAT for Windows works. The next step is to examine a few data analysis procedures (e.g., correlation, regression, t-test) using SYSTAT for Windows. Only a limited number of procedures are discussed in this document; refer to the SYSTAT documentation for further information.
Downloading Data
The data set in our earlier example was meant to get you started. Now we will examine another data set with more variables and cases, appropriate for the kind of analysis techniques we are examining.
In this example, you will import an ASCII data file, clas1.txt, created and saved in text format, into SYSTAT for Windows. The data, collected from forty middle school students, contain 28 variables. The first four variables (ID, SEX$, EXP, SCHOOL) are background variables. The variable SEX$ has two levels (m=male, f=female). EXP (prior computer experience) has three levels (1=less than one year, 2=1-2 years, 3=more than 2 years). SCHOOL (type of school system) has three levels (1=rural school, 2=suburban school, 3=urban school). The next 20 variables (C1...C10, M1...M10) are Likert-type responses to a computer opinion survey and a mathematics anxiety survey. The last four variables (MATHSCOR, COMPSCOR, MANX, CANX) are the score on a mathematics test, the score on a computer test, the mathematics anxiety grouping, and the cumulative score on the computer opinion survey. MANX is a dichotomous variable created from the mathematics anxiety score: low (coded as 0) and high (coded as 1).
To obtain a copy of this data file:
- Using a web browser (Netscape, Internet Explorer, Lynx, etc.), download Sample SYSTAT Data.
- Save it to a file (for example, a:\clas1.txt).
Contact an STC consultant if you need assistance.
Import the file into SYSTAT using the method described earlier. The data will now be displayed in the Data window, and you are ready for your data analysis.
Correlation Analysis
A correlation analysis is performed to quantify the strength of association between two numeric variables. In the following task we will perform a Pearson correlation analysis (SYSTAT can also perform a Spearman rank correlation). The variables used in the analysis are mathscor, compscor, and canx.

- From the Statistics menu select Correlations->Simple
- Highlight the variables MATHSCOR, COMPSCOR, and CANX, individually or together, and click [Add-->]
- Click Options to open the Correlations: Options dialog box
- Check the Probabilities box and select Uncorrected
- Click Continue and finally OK
A symmetric matrix of Pearson correlations, as shown below, will be displayed on the screen, followed by a matrix of the corresponding probability values (p-values). The output also includes, as a Quick Graph in the Graph window, a scatterplot matrix (SPLOM) with one plot for each entry in the correlation matrix. To suppress this feature, specify GRAPH=NONE in the command editor, or select Options from the Edit menu and deselect Statistical Quickgraphs.
Pearson correlation matrix
            MATHSCOR  COMPSCOR      CANX
MATHSCOR       1.000
COMPSCOR       0.149     1.000
CANX           0.068     0.657     1.000
Bartlett Chi-square statistic: 21.909   df=3   Prob= 0.000

Matrix of Probabilities
            MATHSCOR  COMPSCOR      CANX
MATHSCOR       0.000
COMPSCOR       0.359     0.000
CANX           0.676     0.000     0.000
Number of observations: 40
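The link between a correlation and its p-value can be illustrated outside SYSTAT. The following Python sketch is only an illustration, not SYSTAT's own computation: the function names pearson_r and t_for_r and the small toy data set are assumptions made for this example. It computes a Pearson correlation from scratch, then converts the reported COMPSCOR/CANX correlation (0.657, with 40 observations) into a t statistic with n - 2 degrees of freedom, which is the usual basis for an uncorrected p-value.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic (n - 2 df) for testing that a correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical toy data, not taken from clas1.txt.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_toy = pearson_r(x, y)

# Correlation and sample size reported in the SYSTAT output above.
t_comp_canx = t_for_r(0.657, 40)
print(round(r_toy, 3), round(t_comp_canx, 2))
```

A t statistic of roughly 5.4 on 38 degrees of freedom is far out in the tail of the t distribution, which is consistent with the probability of 0.000 shown for COMPSCOR and CANX.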
Linear Regression
A correlation coefficient tells you that some sort of relation exists between the variables, but it does not tell you much more than that. For example, a correlation of 1.0 means that all points fall exactly on a straight line, but it says nothing about the form of the relation between the variables, such as the slope of that line. When the observations are not perfectly correlated, many different lines may be drawn through the data. To select the line that lies as close as possible to the points, you use regression analysis, which is based on the least-squares principle. In the following task you will perform a simple regression analysis with CANX as the dependent variable and COMPSCOR as the independent variable.
- From the Statistics menu select Regression->Linear
- Highlight CANX as Dependent: variable and click on Add -->
- Highlight COMPSCOR as Independent(s): and click on Add -->

- Click OK
The output, as shown below, will be displayed on the screen with regression statistics including the slope, the intercept, and the squared multiple R. The Quick Graph in the Graph window shows a plot of the regression residuals against the predicted values. Use the same procedure as earlier to suppress this feature.
Dep Var: CANX   N: 40   Multiple R: 0.657   Squared multiple R: 0.432
Adjusted squared multiple R: 0.417   Standard error of estimate: 2.544

Effect      Coefficient  Std Error  Std Coef  Tolerance       t  P(2 Tail)
CONSTANT         23.194      1.315       0.0          .  17.634      0.000
COMPSCOR          0.133      0.025     0.657      1.000   5.375      0.000

Analysis of Variance
Source        Sum-of-Squares  DF  Mean-Square  F-Ratio      P
Regression           186.922   1      186.922   28.891  0.000
Residual             245.853  38        6.470
------------------------------------------------------------------------------
Durbin-Watson D Statistic     1.103
First Order Autocorrelation   0.406
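The summary statistics in this output follow from the sums of squares in the Analysis of Variance table. The Python sketch below is an illustration only (Python is not part of SYSTAT, and the helper predict_canx is a hypothetical name introduced here). It recomputes the squared multiple R, the adjusted squared multiple R, the F-ratio, and the standard error of estimate from the reported sums of squares, and uses the reported coefficients to predict CANX for a given COMPSCOR.

```python
import math

# Values copied from the SYSTAT regression output above.
n = 40
ss_regression = 186.922
ss_residual = 245.853
df_regression = 1
df_residual = n - 2  # 38, as in the DF column

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# Share of the total variability explained by the regression.
r_squared = ss_regression / (ss_regression + ss_residual)
# Adjustment for the number of estimated parameters.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
f_ratio = ms_regression / ms_residual
std_error_estimate = math.sqrt(ms_residual)

def predict_canx(compscor, intercept=23.194, slope=0.133):
    """Predicted CANX from the fitted line CANX = 23.194 + 0.133 * COMPSCOR."""
    return intercept + slope * compscor

print(round(r_squared, 3), round(adj_r_squared, 3),
      round(f_ratio, 2), round(std_error_estimate, 3))
print(round(predict_canx(50), 3))
```

The recomputed values agree with the printed ones up to rounding of the displayed figures.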

T-test
The t-test is a data analysis procedure for testing the hypothesis that two population means are equal. SYSTAT can compute both independent (unrelated groups) and dependent (related groups) t-tests. For independent t-tests, your grouping variable should have exactly two values (e.g., male/female, pass/fail). The grouping variable may be either numeric or character. If a grouping variable has more than two categories, you can use the Data/Select cases... menu to select the two values you want to compare. If you plan to use all the cases in subsequent analyses, remember to deselect the cases afterwards to restore the full data set.
In the following task we will perform an independent t-test. The dependent variables are mathscor and compscor, and the independent (grouping) variable is manx. (If you do not select a grouping variable, a paired t-test will be performed by default.)
- From the Statistics menu select t-test->Two-Groups

- Highlight the dependent variables MATHSCOR and COMPSCOR, and click Add-->
- Highlight MANX for grouping variable and click Add-->
- Click OK
The output from the run will be displayed on the screen as shown below. The Quick Graph in the Graph window includes a combined display of three graphs (a boxplot, a normal curve, and a dit plot) for each group. Use the same procedure as earlier to suppress this feature.
Two-sample t test on MATHSCOR grouped by MANX
Group    N     Mean       SD
    0   28   53.750   13.845
    1   12   37.000   15.214
Separate Variance t = 3.277   df = 19.2   Prob = 0.004
  Difference in Means = 16.750   95.00% CI = 6.058 to 27.442
Pooled Variance t = 3.406   df = 38   Prob = 0.002
  Difference in Means = 16.750   95.00% CI = 6.793 to 26.707

Two-sample t test on COMPSCOR grouped by MANX
Group    N     Mean       SD
    0   28   51.000   15.253
    1   12   49.083   19.486
Separate Variance t = 0.303   df = 17.1   Prob = 0.765
  Difference in Means = 1.917   95.00% CI = -11.416 to 15.249
Pooled Variance t = 0.335   df = 38   Prob = 0.740
  Difference in Means = 1.917   95.00% CI = -9.671 to 13.505
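The two t values in each block differ only in how the standard error of the difference is estimated. As a rough illustration (a Python sketch, not SYSTAT's implementation; the function name two_sample_t is an assumption made for this example), both statistics can be rebuilt from the group sizes, means, and standard deviations printed above. The pooled version assumes equal population variances; the separate-variance version uses the standard Welch-Satterthwaite approximation for its degrees of freedom.

```python
import math

def two_sample_t(n1, mean1, sd1, n2, mean2, sd2):
    """Pooled and separate-variance t statistics from group summaries."""
    diff = mean1 - mean2
    var1, var2 = sd1 ** 2, sd2 ** 2

    # Pooled variance: assumes the two populations share one variance.
    df_pooled = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df_pooled
    t_pooled = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Separate variance: Welch-Satterthwaite degrees of freedom.
    a, b = var1 / n1, var2 / n2
    t_separate = diff / math.sqrt(a + b)
    df_separate = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

    return t_pooled, df_pooled, t_separate, df_separate

# MATHSCOR summaries from the output above (MANX = 0, then MANX = 1).
t_pooled, df_pooled, t_separate, df_separate = two_sample_t(
    28, 53.750, 13.845, 12, 37.000, 15.214)
print(round(t_pooled, 3), df_pooled,
      round(t_separate, 3), round(df_separate, 1))
```

With the MATHSCOR summaries, the results are close to the reported 3.406 (pooled, 38 df) and 3.277 (separate, 19.2 df); any small discrepancy comes from the rounded means and standard deviations.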

Analysis of Variance
The statistical technique used to test the null hypothesis that several means are equal is called analysis of variance. It is called that because it examines the variability in the sample and, based on that variability, determines whether there is reason to believe the population means are not equal. In analysis of variance, the observed variability in the sample is divided, or partitioned, into two parts: the variability of observations within a group (around the group mean), and the variability between the group means. Each part yields an estimate of the population variance. If the between-groups estimate is substantially larger than the within-groups estimate, you can reject the null hypothesis that the population means are equal. The statistical test of the null hypothesis that all of the groups have the same population mean is based on the ratio of the two estimates, called the F statistic. The observed significance level is obtained by comparing the calculated F value to the F distribution (the distribution of the F statistic when the null hypothesis is true).
A significant F value only tells you that the means are probably not all equal. It does not tell you which pairs of groups appear to have different means. To pinpoint exactly where the differences are, multiple comparisons may be performed.
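The partition of variability described above can be written out in a few lines. The following Python sketch is illustrative only: the function name one_way_anova and the three small groups of scores are assumptions made for this example, not data from clas1.txt. It splits the variability into between-group and within-group sums of squares and forms the F statistic as the ratio of the corresponding mean squares.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, F)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        # Variability of the group mean around the grand mean.
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        # Variability of observations around their own group mean.
        ss_within += sum((v - group_mean) ** 2 for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

# Hypothetical scores for three groups; not taken from clas1.txt.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
ss_between, ss_within, df_between, df_within, f_stat = one_way_anova(groups)
print(ss_between, ss_within, df_between, df_within, f_stat)
```

In this toy example the group means (5, 7, 9) are far apart relative to the spread inside each group, so the between-group mean square is much larger than the within-group mean square and the F statistic is large.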
In the following exercise you will perform an ANOVA with CANX as the dependent variable and EXP as the factor variable. A Tukey HSD test is used for the pairwise mean comparisons that identify which means differ from the others.
- From the Statistics menu select Analysis of Variance (ANOVA)/Estimate Model. (Selecting General Linear Model/Estimate Model from the Statistics menu will result in the same procedure.)

- Highlight CANX as Dependent(s): and click [Add-->]
- Highlight EXP as Factor(s): and click [Add-->]
- Select Post hoc Tests and choose Tukey from the drop-down list. (Selecting General Linear Model/Pairwise Comparisons from the Statistics menu gives the same procedure plus Dunnett's test, but this option becomes active only after you run your ANOVA.)
- Click OK
The output, as shown below, will be displayed in the Main window. In the Graph window, the Quick Graph shows a plot of the residuals from each estimated cell mean versus the estimated cell mean.
Effects coding used for categorical variables in model.
Categorical values encountered during processing are:
EXP (3 levels)
1, 2, 3
Dep Var: CANX   N: 40   Multiple R: 0.406   Squared multiple R: 0.165

Analysis of Variance
Source    Sum-of-Squares  df  Mean-Square  F-ratio      P
EXP               71.466   2       35.733    3.659  0.035
Error            361.309  37        9.765
------------------------------------------------------------------------------

------------------------------------------------------------------------------
*** WARNING ***
Case 39 is an outlier (Studentized Residual = 3.385)
Durbin-Watson D Statistic 1.769
First Order Autocorrelation 0.043
COL/
ROW  EXP
  1    1
  2    2
  3    3
Using least squares means.
Post Hoc test of CANX
------------------------------------------------------------------------------
Using model MSE of 9.765 with 37 df.
Matrix of pairwise mean differences:
            1        2        3
1       0.000
2       2.800    0.000
3       2.709   -0.091    0.000
Tukey HSD Multiple Comparisons.
Matrix of pairwise comparison probabilities:
            1        2        3
1       1.000
2       0.054    1.000
3       0.087    0.997    1.000
------------------------------------------------------------------------------
The output shows a significant difference among groups with different levels of computer experience at the .05 probability level (F = 3.659, P = 0.035).
The output for pairwise comparisons includes a table of mean differences and a table of probabilities. To determine which differences are significant, examine each pair and its probability. The output shows a marginally significant difference between group 1 (exp=1) and group 2 (exp=2), with a probability of 0.054; none of the pairs differs significantly at the 0.05 level.
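The figures in the ANOVA output can be cross-checked against one another. The short Python sketch below is an illustration rather than part of SYSTAT, and all variable names are chosen for this example. It recomputes the F-ratio and multiple R from the reported sums of squares, and checks that the three pairwise mean differences are mutually consistent.

```python
import math

# Values copied from the SYSTAT ANOVA output above.
ss_exp, df_exp = 71.466, 2
ss_error, df_error = 361.309, 37

f_ratio = (ss_exp / df_exp) / (ss_error / df_error)
r_squared = ss_exp / (ss_exp + ss_error)
multiple_r = math.sqrt(r_squared)

# Entries of the pairwise mean difference matrix.
diff_2_1 = 2.800
diff_3_1 = 2.709
# Whichever sign convention the table uses, the (3,2) entry must equal
# the (3,1) entry minus the (2,1) entry; the output reports -0.091.
diff_3_2 = diff_3_1 - diff_2_1

print(round(f_ratio, 3), round(r_squared, 3),
      round(multiple_r, 3), round(diff_3_2, 3))
```

The total sum of squares here (71.466 + 361.309) matches the total in the earlier regression of CANX on COMPSCOR (186.922 + 245.853), as expected when the dependent variable and the cases are the same.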
Using SYSTAT's Graph Menu
SYSTAT provides a wide selection of graphics for every stage of your project: exploration, research, and presentation. The graphics capabilities of SYSTAT include:
- histograms with curve fitting
- bar graphs, box plots, stem-and-leaf diagrams, pie charts
- 3-D rotation, maps with geographic projections
- mathematical function plots, log and power scales
- confidence intervals, ellipses, and centroids
- contour plots, control charts
- case coding of labels and symbols
- linear, quadratic, step, spline, polynomial, LOWESS, exponential, and DWLS smoothing in two and three dimensions
- rectangular, spherical, polar, cylindrical, and triangular coordinates, perspective depth and projections.
Plotting Two Variables with SYSTAT
Looking at a plot is one of the best ways to examine relationships and patterns. For example, a scatterplot allows the visual representation of two separate distributions on a single diagram.
In the following task you will plot CANX (dependent variable) against COMPSCOR (independent variable). We will also fit a line to the data points on the scatterplot using the least-squares principle.
- From the Graph menu (Main window) select Plots and then choose Scatterplot

- Highlight COMPSCOR as X-variable: and click Add -->
- Highlight CANX as Y-variable(s): and click Add -->
- Choose Linear as smoother method from Options/Smoother and click Continue
- Click OK
The plot will be displayed on the screen.
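The line drawn by the Linear smoother in the steps above is fitted by the least-squares principle. As a minimal sketch of that calculation (in Python rather than SYSTAT; the function name least_squares_line and the five data points are hypothetical, not values from clas1.txt), the slope and intercept can be computed directly from the sums of squares and cross-products:

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical points lying near y = 1 + 2x; not taken from clas1.txt.
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
intercept, slope = least_squares_line(x, y)
print(round(intercept, 2), round(slope, 2))
```

Applied to COMPSCOR (x) and CANX (y), the same calculation should give the intercept and slope reported in the Linear Regression section.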

To print the graphics output, select File/Print... and respond to the prompts that appear in the subsequent dialog boxes. You may save your graphics to a file using File/Save as... To remove the Graphics window, select File/Close Window.
A detailed discussion of all the graphics capabilities of SYSTAT is beyond the scope of this document. Refer to SYSTAT's Graphics document to learn more about them.