
## Data Analysis

So far we have looked at MINITAB to get a basic idea of how the program works. The next step is to examine a few other data analysis techniques (e.g., correlation, regression, t-test, and analysis of variance) that you might use for your own data analysis. Only a limited number of procedures are discussed in this document, for illustration; refer to the MINITAB documentation for further information.

The data set in our earlier example was meant only to get you started. We now turn to a data set with more variables and cases, which is better suited to the analysis techniques discussed here.

In this example, you will read an ASCII data file, clas1.dat, into the MINITAB session; the file was created with a word processor and saved as a text file. The data were collected from 40 middle school students and contain 28 variables. The first four variables (id, sex, exp, school) are background variables. The variable sex has two levels (M = male, F = female); exp (prior computer experience) has three levels (1 = less than one year, 2 = 1-2 years, 3 = more than 2 years); and school (type of school system) has three levels (1 = rural school, 2 = suburban school, 3 = urban school). The next 20 variables (comp1...comp10, math1...math10) are Likert-type responses to a computer opinion survey and a mathematics anxiety survey. The last four variables (mathscor, compscor, manx, canx) are, respectively, the math test score, the computer test score, the math anxiety group, and the computer opinion survey score. The variable manx is a dichotomous variable that classifies mathematics anxiety scores as low (coded 0) or high (coded 1).

To obtain a copy of this data file:

Contact an STC consultant if you need assistance.

Let us assume that the data file, clas1.dat, is in c:\temp. The first task is to import the text file into MINITAB. Start your MINITAB session (if it is not already running) following the directions given earlier. The fastest way to read these data into MINITAB is from the Session window. At the MTB> prompt, type:

```
read c1-c28;
file "c:\temp\clas1.dat";
format(f2.0,a1,22f1.0,2f2.0,f1.0,f2.0).
```

(To turn on the MTB> command prompt, click in the Session window and select Editor > Enable Command Language.)

In the command lines above, a semicolon at the end of a line indicates that a subcommand follows on the next line, and the period ends the last subcommand. The FILE subcommand names the data file. The FORMAT subcommand describes, in Fortran format notation, how each record is laid out: f2.0 is a 2-column numeric field with no decimal places, a1 is a 1-column alphanumeric field, and a leading count, such as the 22 in 22f1.0, repeats a descriptor. To continue a single command over several lines, end each line except the last with an ampersand (&). When the commands are executed, the data fill 40 rows (cases) and 28 columns (variables) of the worksheet. The columns are numbered but not named. You can type names directly into the Data window, or assign them from the Session window.
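The FORMAT subcommand above describes a fixed-width record of 32 columns (2 + 1 + 22 + 4 + 1 + 2) that holds 28 values. The minimal Python sketch below expands the same format string and slices one record into values. It is not MINITAB code and is for illustration only. The function names and the sample record are hypothetical. Returning None for a blank numeric field is an assumption made for this sketch, not a description of how MINITAB treats blanks.

```python
import re

# The FORMAT subcommand used above, reproduced as a string.
FORMAT = "f2.0,a1,22f1.0,2f2.0,f1.0,f2.0"


def parse_format(spec):
    """Expand a simple Fortran format into a list of (kind, width) fields.

    Handles only the descriptors used here: an optional repeat count,
    f (numeric) or a (alphanumeric), a width, and an optional ".d" part.
    """
    fields = []
    for item in spec.split(","):
        m = re.fullmatch(r"(\d*)([fa])(\d+)(?:\.\d+)?", item.strip().lower())
        if m is None:
            raise ValueError(f"unsupported descriptor: {item!r}")
        repeat = int(m.group(1) or 1)
        fields.extend([(m.group(2), int(m.group(3)))] * repeat)
    return fields


def read_record(line, fields):
    """Slice one fixed-width record into values, field by field."""
    values, pos = [], 0
    for kind, width in fields:
        text = line[pos:pos + width]
        pos += width
        if kind == "a":
            values.append(text)  # alphanumeric field kept as text
        else:
            # Assumption for this sketch: a blank numeric field means "missing".
            values.append(float(text) if text.strip() else None)
    return values


# Hypothetical record: id=01, sex=F, exp=2, school=1, comp1-comp10, math1-math10,
# mathscor=45, compscor=38, manx=1, canx=27.
record = "01F21" + "3453423345" + "2213432122" + "4538127"
fields = parse_format(FORMAT)
values = read_record(record, fields)
print(len(fields), len(values))  # 28 28
```

The repeat count is what lets one short descriptor list cover all 28 columns. The 22 single-digit fields in 22f1.0 map to exp, school, comp1...comp10, and math1...math10.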

Alternatively, you can save the command lines above in a text file (an exec file) and execute that file during a MINITAB session. Suppose the following command lines are saved in a file, clas1.mtb, on your Desktop. Note that we have also named each variable, for clarity.

```
read c1-c28;
file "c:\temp\clas1.dat";
format(f2.0,a1,22f1.0,2f2.0,f1.0,f2.0).
name c1 'id' c2 'sex' c3 'exp' c4 'school' c5 'comp1' c6 'comp2' c7 'comp3' &
c8 'comp4'
name c9 'comp5' c10 'comp6' c11 'comp7' c12 'comp8' c13 'comp9' c14 'comp10'
name c15 'math1' c16 'math2' c17 'math3' c18 'math4' c19 'math5' c20 'math6'
name c21 'math7' c22 'math8' c23 'math9' c24 'math10' c25 'mathscor'
name c26 'compscor' c27 'manx' c28 'canx'.
```

To execute the file, clas1.mtb, do the following:

• Select File > Other Files > Run an Exec.... The Run an Exec dialog box appears. The number of times to execute defaults to 1; do not change it.
• Click the Select File button. Another dialog box with the same title appears.
• Click on the Desktop icon, then click on the file clas1.mtb.
• Click Open.

The data are read into the worksheet, and each column is named as specified. Now save the worksheet as follows.

• Select Save Current Worksheet As... from the File menu. A dialog box appears.
• Type c:\temp\clas1.mtw as filename (specify appropriate pathname if you are using an alternate location to store the file).
• Click Save.

A copy of the data is now saved in MINITAB worksheet format, and the data remain displayed in the worksheet. You are now ready for further analysis.

### Correlation analysis

A correlation analysis quantifies the strength and direction of the linear association between two numeric variables. In the following task, we will perform a Pearson correlation analysis on the variables mathscor, compscor, and canx.

• Select Stat > Basic Statistics > Correlation.... The Correlation dialog box opens, with the numeric variables in your worksheet listed in the box on its left side.
• Select mathscor, compscor, and canx from the list and click the Select button. The variables will be pasted into the selection box.
• Click OK.

A correlation matrix, as shown below, appears in the Session window.

MINITAB can also compute Spearman's rho (the rank correlation coefficient) between pairs of non-missing data. First rank both columns (Manip > Rank), then correlate the ranked columns using Stat > Basic Statistics > Correlation. Delete any rows that contain missing values before ranking the data, so that both columns are ranked over the same cases.
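The minimal Python sketch below shows both coefficients, using hypothetical data (it is not MINITAB's implementation). Pearson's r is the covariance scaled by both standard deviations. Spearman's rho follows the two-step procedure described above: drop incomplete pairs, rank each column, and compute Pearson's r on the ranks. Giving tied values the average of their ranks is the usual convention and is assumed here. Python's None stands in for a missing value.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: drop pairs with a missing value, rank, then correlate."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs, ys = zip(*pairs)
    return pearson(rank(list(xs)), rank(list(ys)))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y) < 1.0)  # True: the points do not lie on a straight line
print(spearman(x, y))       # 1.0: the relation is perfectly monotonic
```

The example shows why both coefficients are useful. A curved but steadily increasing relation gets a perfect rank correlation, while its Pearson correlation falls short of 1.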

### Simple linear regression

A correlation coefficient tells you that some sort of relation exists between the variables, but not much more. For example, a correlation of 1.0 means that all points fall exactly on a straight line, but it does not tell you the slope or intercept of that line. When the observations are not perfectly correlated, many different lines could be drawn through the data. To select the line that comes as close as possible to the points, you use regression analysis. Regression is based on the least-squares principle: it chooses the line that minimizes the sum of the squared vertical distances between the points and the line. In the following task, you will perform a simple linear regression analysis with canx as the dependent variable and compscor as the independent variable.
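The least-squares principle has a closed form for a single predictor. The slope is the sum of cross-products of deviations divided by the sum of squared x deviations, and the intercept makes the line pass through the two means. The minimal Python sketch below computes the slope, the intercept, and R-squared for hypothetical data. It is an illustration only, not the computation MINITAB performs internally.

```python
def least_squares(x, y):
    """Fit y = b0 + b1*x by ordinary least squares.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx  # the fitted line passes through (mx, my)
    # R-squared = 1 - SSE/SST: the share of the variation in y explained by the line.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return intercept, slope, 1 - sse / sst


# Hypothetical data that lie close to, but not exactly on, a straight line.
b0, b1, r2 = least_squares([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.0])
print(round(b0, 2), round(b1, 2), round(r2, 3))  # 0.09 1.97 0.998
```

With one predictor, R-squared equals the square of Pearson's r. This ties the regression output back to the correlation analysis.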

• Select Stat > Regression > Regression.... The Regression dialog box appears.
• Choose canx as the response (dependent) variable.
• Choose compscor as the predictor (independent) variable.
• Click OK.

The output, as shown below, is displayed in the Session window. It includes the regression equation (intercept and slope) and R-Sq, the squared multiple correlation. To store additional regression results in the worksheet, such as residuals or fitted values, click Storage in the Regression dialog box and choose the appropriate options.

### T-test

The t-test is a data analysis procedure for testing the hypothesis that two population means are equal. MINITAB can compute t-tests for independent (unrelated) samples with 2-Sample t, and for dependent (related) samples with Paired t. A 1-sample t-test, which compares one mean with a hypothesized value, is also available. For an independent t-test you need a grouping variable with exactly two values (e.g., male and female, pass and fail), and the response (dependent) variable must be numeric. If the grouping variable has more than two categories, copy the cases in the two categories you want into new columns and run the t-test on those columns.

In the following task, we will perform an independent t-test. The dependent variable is mathscor, and the independent (grouping) variable is manx.

• Select Stat > Basic Statistics > 2-Sample t.... The 2-Sample t dialog box appears. Samples in one column is selected by default; if it is not, select it.
• Select or type mathscor in the Samples: box.
• Select or type manx in the Subscripts: box.
• Click OK.

By default, MINITAB performs a two-tailed test. To change this, click Options in the 2-Sample t dialog box, click the arrow next to Alternative:, and choose another option. In our example we assume unequal variances.
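The task above has two parts: unstacking a "samples in one column" layout by its subscripts column, and computing the unequal-variance (Welch) t statistic with its approximate degrees of freedom. The minimal Python sketch below illustrates both. The function names and data are hypothetical, and this is not MINITAB's code. The sketch stops at t and df, because the p-value needs the t distribution, which MINITAB reports for you.

```python
import math
from statistics import mean, variance


def split_by_subscript(values, subscripts):
    """Unstack a 'samples in one column' layout into one list per group."""
    groups = {}
    for v, s in zip(values, subscripts):
        groups.setdefault(s, []).append(v)
    return groups


def welch_t(group1, group2):
    """Two-sample t statistic without assuming equal variances (Welch).

    Returns (t, df), where df is the Welch-Satterthwaite approximation.
    """
    n1, n2 = len(group1), len(group2)
    # Squared standard errors of each mean (variance() uses the n - 1 divisor).
    v1, v2 = variance(group1) / n1, variance(group2) / n2
    t = (mean(group1) - mean(group2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical scores with a 0/1 grouping column, in the spirit of mathscor and manx.
groups = split_by_subscript([52, 45, 61, 57, 70, 51], [0, 1, 0, 1, 0, 1])
t, df = welch_t(groups[0], groups[1])
print(round(t, 3), round(df, 2))  # 1.601 3.48
```

Because each group's variance is estimated separately, the degrees of freedom are usually not a whole number.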

The output given below will be displayed on the screen.

### Analysis of variance

The statistical technique used to test the null hypothesis that several population means are equal is called analysis of variance (ANOVA). It gets its name because it examines the variability in the sample and, based on that variability, determines whether there is reason to believe the population means are not equal. The test statistic, called the F statistic, is the ratio of the between-group variability estimate to the within-group variability estimate. A significant F value tells you only that the population means are probably not all equal; it does not tell you which pairs of groups differ. To pinpoint where the differences are, you can perform multiple comparisons.
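The F statistic described above can be sketched in a few lines of Python, using hypothetical groups. The sketch computes the between-group and within-group mean squares and their ratio. It does not compute the p-value, which needs the F distribution, or the Tukey multiple comparisons, which MINITAB provides.

```python
def one_way_anova(groups):
    """One-way ANOVA F statistic for a list of groups (lists of numbers).

    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of each observation around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical scores for three levels of a factor such as exp.
f, df1, df2 = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f, df1, df2)  # 27.0 2 6
```

A large F means the group means differ by much more than the scatter within groups would suggest. That is why a large F points toward unequal population means.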

In the following exercise, you will perform a one-way ANOVA with canx as the dependent variable and exp as the factor variable.

• Select Stat > ANOVA > One-way...
• Select canx as the response variable.
• Select exp as the factor variable.
• Click the Comparisons... button.
• Select Tukey's family error rate:
• Click OK.
• Click OK.

The output given below will be displayed on the screen. The F test shows a significant difference among the groups with different levels of computer experience at the .05 level or better. However, the Tukey multiple comparisons indicate that none of the pairwise differences is statistically significant at the .05 level. The three intervals in the output, (-0.033, 5.633), (-0.317, 5.735), and (-3.162, 2.981), all contain zero, so none of the pairwise differences is significant at the .05 level.

### Graphics

MINITAB provides a wide selection of graphics for every stage of your project: exploration, research, and presentation.

#### Plotting 2 variables using the Graph menu

Looking at plots is one of the best ways to examine relationships and patterns. For example, a scatterplot shows the relationship between two variables visually.

In the following task, you will plot canx (the response variable) against compscor (the predictor variable) and fit a least-squares line to the data points on the scatterplot.

• Select Graph > Scatterplot...

A Scatterplots dialog box appears.

• Click on the graph under With Regression.
• Click OK.

A Scatterplot - With Regression dialog box appears. In the box below the Graph Variables box:

• Type canx as the first Y variable.
• Type compscor as the first X variable.
• Click OK.

The plot with the best-fit line appears in a separate graph window as shown below.

The same plot can also be created with the regression procedures on the Stat menu, without going through the Graph menu. To plot a fitted regression line from the Stat menu, do the following:

• Select Stat > Regression > Fitted Line Plot...

The Fitted Line Plot dialog box appears.

• Select canx as response variable.
• Select compscor as predictor variable.
• Click OK.

The plot, showing the fitted regression line, the fitted model (regression equation), and R-Sq, appears on the screen. You can also use the %fitline macro to perform the same analysis; refer to the MINITAB Reference Manual for details.

To print the plot from the graph window:

• Select File > Print Graph...
• Click OK.

The plot will now be printed.