Data Analysis
Organizing your data for analysis
Suppose you have three test scores collected from a class of ten students (5 females and 5 males) during a semester. For each student you have an identification number, gender, and scores for test one, test two, and test three (the full data set is listed later in this section for you to enter into the Data window).
Your first task is to present the data in a form acceptable to SYSTAT for processing. Before showing you how to enter the data, let us look at what SYSTAT accepts as data.
SYSTAT uses data organized in rows and columns. The rows are called cases, and the columns are called variables.
| Name$ | Test1 | Test2 | Test3 |
|---|---|---|---|
| Tim | 20 | 23 | 24 |
| Hans | 21 | 26 | 28 |
A case contains information for one unit of analysis, e.g., a person, an animal, a product, or a business. Variables are the items of information collected for each case, such as name, score, or age. The above example has two cases and four variables. When data are arranged in rows and columns like this and stored in a file, the file is called a 'cases-by-variables' or 'rectangular' data file.
SYSTAT accepts numbers and characters as data. In the above example, Test1 is a numeric variable, and Name$ is a character variable. Numbers stored in numeric variables can have up to 15 digits; use a minus sign (-) for negative numbers. A value stored in a character variable can have up to 12 characters, including letters (a-z, A-Z) and other typewriter characters (e.g., ()$&*/). Digits included in a character value are treated as characters, not as numbers. Character values are case sensitive (e.g., JUNIOR is not the same as junior).
You must assign a unique name to each variable. Variable names may contain up to 12 letters or digits and must begin with a letter. The names of character variables must end with a dollar sign ($), which counts as one of the 12 characters. Variable names, unlike character values, are not case sensitive. Once you have entered a variable name you can change it, but you cannot change its type from character to numeric or vice versa. If you forget the $ at the end of a character variable name, you must type the correct name as a new variable in a new column and then delete the incorrect variable.
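The naming rules above are easy to check mechanically. The following Python sketch is not part of SYSTAT; the helper names (`is_valid_variable_name`, `is_character_variable`, `same_variable`) are hypothetical, and it assumes, as the text states, that names contain only letters and digits, may end in a single `$`, and are limited to 12 characters including the `$`.

```python
import re

# Hypothetical helpers, not part of SYSTAT: they encode the naming rules stated above.
MAX_NAME_LENGTH = 12  # the trailing $ of a character variable counts toward this

# A letter, then letters or digits, then an optional $ marking a character variable.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*\$?")


def is_valid_variable_name(name: str) -> bool:
    """Return True if name follows the variable-naming rules described in the text."""
    return len(name) <= MAX_NAME_LENGTH and NAME_PATTERN.fullmatch(name) is not None


def is_character_variable(name: str) -> bool:
    """Character (string) variables are marked by a trailing dollar sign."""
    return name.endswith("$")


def same_variable(a: str, b: str) -> bool:
    """Variable names are not case sensitive, so compare them case-insensitively."""
    return a.upper() == b.upper()


for candidate in ["TEST1", "sex$", "1TEST", "ABCDEFGHIJKL$"]:
    print(candidate, is_valid_variable_name(candidate))
```

For example, `1TEST` fails because it does not begin with a letter, and `ABCDEFGHIJKL$` fails because the `$` makes it 13 characters long.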
When a numeric value is missing, enter a period (.) in its position. When a character value is missing, enter a blank space surrounded by single or double quotation marks (' ' or " "); the quotation marks do not appear in the spreadsheet. Note that arithmetic involving a missing value produces a missing value.
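To see why a missing value propagates through arithmetic, here is a small Python sketch of the conventions described above. It is only an analogy: it assumes a missing numeric value can be modeled as a floating-point NaN and a missing character value as an empty string, which is not a description of how SYSTAT stores them internally. The function names are hypothetical.

```python
import math

# Illustrative model only: a "." entry becomes NaN and a quoted blank becomes "".
# This is an assumption for the sketch, not how SYSTAT stores missing values.


def parse_numeric(token: str) -> float:
    """Convert a numeric entry to float; a period (.) marks a missing value."""
    token = token.strip()
    return math.nan if token == "." else float(token)


def parse_character(token: str) -> str:
    """Convert a character entry; a quoted blank (' ' or " ") marks a missing value."""
    token = token.strip()
    if token in ("' '", '" "'):
        return ""  # the quotation marks themselves are not kept
    return token


scores = [parse_numeric(t) for t in ["83", ".", "91"]]
total = sum(scores)  # NaN propagates: any sum involving NaN is NaN
print(total)  # nan
print(math.isnan(total))  # True
```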
SYSTAT places no fixed limit on the number of variables or cases a worksheet can contain; the practical limit is the memory and disk space available on your computer.
With the above information in mind, let us assign names to the variables in our example: id, sex$, test1, test2, and test3. Preparing a codebook that records the details of your data (e.g., variable name, type, length, and label) is good practice before entering the data into the SYSTAT Worksheet. A codebook is not required for data analysis, but it becomes handy when you deal with data sets that contain many variables of different types and lengths. A sample codebook for our data is shown below for illustration.
| Variable name | Variable type | Variable length | Variable label |
|---|---|---|---|
| id | numeric | 2 | identification number |
| sex$ | character | 1 | student gender (f, m) |
| test1 | numeric | 2 | test one score |
| test2 | numeric | 2 | test two score |
| test3 | numeric | 2 | test three score |
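A codebook can also be kept in machine-readable form so that raw records can be checked before data entry. The Python sketch below encodes the sample codebook above; the `CodebookEntry` class and `check_record` function are hypothetical illustrations, not a SYSTAT file format, and the checks assume whole-number scores with `.` allowed for a missing numeric value.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CodebookEntry:
    """One row of a codebook (an illustrative structure, not a SYSTAT file format)."""
    name: str    # variable name; a trailing $ marks a character variable
    vtype: str   # "numeric" or "character"
    length: int  # maximum number of digits or characters expected
    label: str   # descriptive variable label


# The sample codebook, expressed as data.
CODEBOOK = [
    CodebookEntry("id", "numeric", 2, "identification number"),
    CodebookEntry("sex$", "character", 1, "student gender (f, m)"),
    CodebookEntry("test1", "numeric", 2, "test one score"),
    CodebookEntry("test2", "numeric", 2, "test two score"),
    CodebookEntry("test3", "numeric", 2, "test three score"),
]


def check_record(values: list[str]) -> list[str]:
    """Return the problems found when one raw record is checked against CODEBOOK.

    Numeric entries are assumed to be whole numbers (as in this data set),
    and "." is accepted as a missing numeric value.
    """
    if len(values) != len(CODEBOOK):
        return [f"expected {len(CODEBOOK)} values, got {len(values)}"]
    problems = []
    for entry, value in zip(CODEBOOK, values):
        if entry.vtype == "numeric" and value != "." and not value.lstrip("-").isdigit():
            problems.append(f"{entry.name}: {value!r} is not a whole number")
        if len(value) > entry.length:
            problems.append(f"{entry.name}: {value!r} is longer than {entry.length}")
    return problems


print(check_record(["01", "f", "83", "85", "91"]))  # []
print(check_record(["01", "female", "83", "85", "91"]))
```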
In our example, the only character variable is sex$ (note the $ sign at the end of the name); all the other variables are numeric. Character variables cannot be used in some statistical procedures (e.g., ANOVA, REGRESSION, MANOVA).
Entering Data into SYSTAT Data Window
Let us open a Worksheet and enter the variable names and the data provided.
- Select New/Data from the File menu. A blank Data window opens.
- Double-click the column header (VAR00001) to open the Variable Properties dialog box.
- Type ID for the variable name, select Numeric as the variable type, and set Decimal places to 0.
- Click OK.
Now, let us create the string variable, SEX$.
- Double-click the column header (VAR00002) to open the Variable Properties dialog box.
- Type SEX$ for the variable name and select String as the variable type.
- Click OK.
Define the remaining three numeric variables, TEST1, TEST2, and TEST3, the same way the variable id was defined.
The top row should now have five variable names. Let us key in the data.
- Click the cell immediately below the variable name ID.
- Type in the data given below, pressing [Enter] or [Tab] after each value. To move down a column, press the down arrow.
If Num Lock is on, you can use the numeric keypad to type numbers. Once you finish entering the data, your Worksheet will appear as shown below.

Creating a New Variable
Let us create a new variable, total, to hold each student's mean score on the three tests taken during the semester.
- From the Data menu (Data window) select Transform/Let.
- In the box on the left, under the heading Variable, type the new variable name total.
- Select Multivariable as the Function Type.
- Press Tab to move to the box on the right, under the heading Variable or expression, and type the expression avg(test1 test2 test3). (Alternatively, select the AVG( ) function in the function box and then highlight and select the three variables.)
- Click OK
The variable TOTAL appears as the last column on the Worksheet.
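Note that TOTAL holds a mean, despite its name. As a cross-check outside SYSTAT, the Python sketch below (not SYSTAT syntax) computes the same per-student mean of the three test scores from the data listed later in this section; it assumes there are no missing values.

```python
# id, SEX$, TEST1, TEST2, TEST3 for the ten students, as listed in this section.
DATA = """\
01 f 83 85 91
02 f 65 72 68
03 m 60 74 64
04 m 88 96 92
05 m 84 79 82
06 f 90 94 90
07 f 87 80 82
08 f 78 86 80
09 m 90 87 93
10 m 76 73 70"""

cases = [line.split() for line in DATA.splitlines()]

totals = []
for case_id, sex, *tests in cases:
    scores = [float(t) for t in tests]
    # Row mean of the three test scores (assumes no missing values).
    totals.append(sum(scores) / len(scores))

for (case_id, sex, *_), total in zip(cases, totals):
    print(case_id, sex, f"{total:.3f}")
```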
Saving the Data File
Now that you have entered the data into the Worksheet, save it onto a diskette in drive A (if you are working on your own PC, you may prefer to save it to the hard drive).
To save the data file:
- Select Save As... from the File menu.
- Type a:\grade1.SYD as the file name (specify the appropriate path if you store the file in another location).
- Click Save
Reading a Data File in Text Format
In some situations you may have a data file created using other software applications (e.g., Lotus, Excel), a data editor or a word processor. In such a situation you do not have to enter your data again into the Worksheet. You can import such a file into SYSTAT. SYSTAT can read ASCII data files in free-format (each value is separated by spaces, commas, or tabs) or fixed-format (each value appears in the same place in every case).
Suppose the data file we discussed above is stored on drive A as an ASCII (text) file, grade.txt, as follows:
id sex$ test1 test2 test3
01 f 83 85 91
02 f 65 72 68
03 m 60 74 64
04 m 88 96 92
05 m 84 79 82
06 f 90 94 90
07 f 87 80 82
08 f 78 86 80
09 m 90 87 93
10 m 76 73 70
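Free-format reading, as described above, amounts to splitting each line at spaces, commas, or tabs. The Python sketch below is a hypothetical reader (`read_free_format` is not a SYSTAT function). It assumes the first line holds the variable names, treats names ending in `$` as character variables, and maps `.` to NaN for missing numeric values. The sample deliberately mixes separators only to show that all three are accepted.

```python
from __future__ import annotations

import math
import re

# Values may be separated by spaces, tabs, or commas (any run of them counts as one break).
SEPARATORS = re.compile(r"[,\s]+")


def read_free_format(text: str) -> tuple[list[str], list[dict]]:
    """Parse free-format text whose first line holds the variable names."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    names = SEPARATORS.split(lines[0])
    cases = []
    for ln in lines[1:]:
        tokens = SEPARATORS.split(ln)
        case = {}
        for name, token in zip(names, tokens):
            if name.endswith("$"):
                case[name] = token  # character variable: keep the text as-is
            else:
                case[name] = math.nan if token == "." else float(token)
        cases.append(case)
    return names, cases


SAMPLE = """id sex$ test1 test2 test3
01 f 83 85 91
02,f,65,72,68
03\tm\t60\t74\t64
"""

names, cases = read_free_format(SAMPLE)
print(names)
print(cases[1])
```

To read the actual file, you could pass in its contents, e.g. `with open("grade.txt") as f: names, cases = read_free_format(f.read())`, adjusting the path to wherever the file is stored.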
To import the data into SYSTAT:
- From the File menu select Open/Data

- Select ASCII text from the list of file types located at the bottom corner of the dialog box.
- Type a:\grade.txt as the file name.
- Click Open. The contents of the data file are displayed in the Data window.
- Select Save As... from the File menu in the Data window and type a:\grade1.SYD as the file name to save the file in SYSTAT format.
- Click Save
To control the number of decimal places displayed for the imported file, double-click a variable's column header to open the Variable Properties dialog box and set Decimal places to 0. If the columns in the file have no names, SYSTAT automatically generates names such as VAR(1), VAR(2), etc. You can also rename, add, or delete variables, and add or delete cases, in an existing SYSTAT file. Earlier we showed how to create a new variable, total, as the mean of the three test scores; follow the same procedure to create the new variable and save your new data set.
Similarly, if you have files created with spreadsheet software, or SPSS system files, you can import them into SYSTAT by choosing the appropriate file type in the Open a File dialog box.
Descriptive Statistics
Now that we have created a data set, let us run a few descriptive statistics on the variables in the data set. (The worksheet will still be displayed in the Data window. If not, use the File menu to open the data set, grade1.SYD, you saved.)
Of the variables in our data set, SEX$ is a categorical variable, and TEST1, TEST2, TEST3, and TOTAL are continuous variables. We will use the One-way command (Statistics/Tables/Crosstabs/One-way) to obtain frequency counts for SEX$, and the Basic Statistics command (Statistics/Descriptive Statistics/Basic Statistics) to obtain descriptive statistics for the continuous variables.
To run the One-way procedure:
- Select Tables from the Statistics menu
- Select Crosstabs from the Tables menu
- Select One-way... from the Crosstabs menu
A dialog box titled One-way Frequency Tables appears.

- Highlight the variable SEX$
- Click the Add--> button.
The variable appears in the box on the right. Here we use a single variable to create a one-way table for illustration; you can also create a two-way table by selecting Two-way from the Crosstabs menu and adding a column variable. SYSTAT can also create multiway tables.
- Choose Frequency from the list of options at the bottom of the dialog box by clicking the box to its left. A check mark appears next to each selected item.
- To deselect an item, click its box again to clear the check mark. Deselect Pearson chi-square.
- Click OK
The output appears as shown below.
Frequencies
Values for SEX$
f m Total
+-------------+
| 5 5 | 10
+-------------+
For a standard one-way table layout, select List layout in the One-way Frequency Tables dialog box.
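A one-way frequency table is simply a count of each distinct value. The following Python sketch (not SYSTAT output) reproduces the counts for SEX$ using `collections.Counter`, with the values copied from the data set in this section.

```python
from collections import Counter

# SEX$ values for cases 01-10, in the order listed in the data set.
sex = ["f", "f", "m", "m", "m", "f", "f", "f", "m", "m"]

counts = Counter(sex)  # frequency of each distinct value
for value in sorted(counts):
    print(value, counts[value])
print("Total", sum(counts.values()))
```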
To run the Basic Statistics procedure:
- From the Statistics menu select Descriptive Statistics
- From the Descriptive Statistics menu select Basic Statistics...

- Choose Mean, Median, SD, and N
- Select the variables TEST1, TEST2, TEST3, and TOTAL one at a time, or all together by holding down the Ctrl key while clicking.
- Click Add--> after selecting each variable or variables. The variables appear in the box on the right.
- Click OK
The output from the commands you just executed appears on the screen as shown below. If necessary, use the scroll bar on the right side of the window to view all of the output. To control the number of decimal places in the output, select Options from the Edit menu (Main window), choose the number of decimal places under Data/Output Format, and click OK. From then on, results are displayed with the number of decimal places you selected.
TEST1 TEST2 TEST3 TOTAL
N of cases 10 10 10 10
Median 83.500 82.500 82.000 82.333
Mean 80.100 82.600 81.200 81.300
Standard Dev 10.450 8.462 10.685 9.371
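The statistics above can be reproduced with Python's standard `statistics` module as a check on your own output. This sketch assumes SYSTAT's Standard Dev is the sample standard deviation (divisor n - 1), which matches the values shown; the dictionary and function names are illustrative.

```python
import statistics

# Test scores for cases 01-10, in the order listed in the data set.
TESTS = {
    "TEST1": [83, 65, 60, 88, 84, 90, 87, 78, 90, 76],
    "TEST2": [85, 72, 74, 96, 79, 94, 80, 86, 87, 73],
    "TEST3": [91, 68, 64, 92, 82, 90, 82, 80, 93, 70],
}
# TOTAL is each student's mean of the three test scores.
TESTS["TOTAL"] = [sum(row) / 3 for row in zip(TESTS["TEST1"], TESTS["TEST2"], TESTS["TEST3"])]


def basic_stats(values):
    """N, median, mean, and sample standard deviation (n - 1 divisor)."""
    return {
        "N of cases": len(values),
        "Median": statistics.median(values),
        "Mean": statistics.mean(values),
        "Standard Dev": statistics.stdev(values),
    }


results = {name: basic_stats(values) for name, values in TESTS.items()}

# Print a table laid out like the output above.
print(" " * 13 + "".join(f"{name:>9}" for name in results))
for stat in ["N of cases", "Median", "Mean", "Standard Dev"]:
    fmt = "{:>9}" if stat == "N of cases" else "{:>9.3f}"
    print(f"{stat:<13}" + "".join(fmt.format(results[name][stat]) for name in results))
```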
You can select a grouping variable to obtain a separate analysis for each group. Suppose you want separate results of the above analysis for males and females. The first task is to sort the file by the grouping variable, SEX$, so that all the females are listed first, followed by the males (string variables are sorted in alphabetical order).
- From the Main window or Data window select the Data menu
- From the Data menu select Sort

- From the variable list select SEX$ and click Add-->
- Click OK
The sorted file is displayed on the screen. You can save it by selecting Save file in the Sort dialog box and typing a:\grade2.SYD as the file name. (To retrieve the original file, select Open from the File menu and type a:\grade1.SYD.)
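Sorting by a character variable brings all identical values together. In Python, the equivalent step is a sort keyed on SEX$, as sketched below. Python's `sorted` is stable, so cases within each group keep their original order; whether SYSTAT's Sort does the same is not stated here. Python also orders strings by character code, so uppercase letters sort before lowercase ones, which may differ from SYSTAT's alphabetical order.

```python
# (id, SEX$, TEST1) for the ten students, in data-entry order.
cases = [
    ("01", "f", 83), ("02", "f", 65), ("03", "m", 60), ("04", "m", 88), ("05", "m", 84),
    ("06", "f", 90), ("07", "f", 87), ("08", "f", 78), ("09", "m", 90), ("10", "m", 76),
]

# Sort on the grouping variable SEX$; sorted() is stable, so ties keep data-entry order.
sorted_cases = sorted(cases, key=lambda case: case[1])

for case in sorted_cases:
    print(*case)
```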
The next step is to select By groups... from the Data menu so that males and females are analyzed separately.
- Select By groups... from the Data menu in the Main or Data window
- Select SEX$ from the variable list and click Add -->
(If you have missing values and want to exclude them from the analysis, check the box in front of Exclude missing. Our example has no missing values.)
- Click OK
Now run the Basic Statistics procedure.
- From the Statistics menu select Descriptive Statistics/Basic Statistics...
- Choose Mean, Median, SD, and N
- Highlight the variables TEST1, TEST2, TEST3, and TOTAL and click Add -->
- Click OK
The output from the commands you just executed will appear on the screen as shown below.
The following results are for:
SEX$ = f
TEST1 TEST2 TEST3 TOTAL
N of cases 5 5 5 5
Median 83.000 85.000 82.000 83.000
Mean 80.600 83.400 82.200 82.067
Standard Dev 9.813 8.112 9.284 8.575
The following results are for:
SEX$ = m
TEST1 TEST2 TEST3 TOTAL
N of cases 5 5 5 5
Median 84.000 79.000 82.000 81.667
Mean 79.600 81.800 80.200 80.533
Standard Dev 12.198 9.680 12.969 11.072
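The By groups analysis can be mirrored in Python with `itertools.groupby`, which, like the SYSTAT procedure described above, expects the cases to be sorted by the grouping variable first. This sketch is not SYSTAT syntax; it again assumes the sample standard deviation (n - 1 divisor), which agrees with the output shown.

```python
import statistics
from itertools import groupby

# id, SEX$, TEST1, TEST2, TEST3 for the ten students, as listed in this section.
DATA = """\
01 f 83 85 91
02 f 65 72 68
03 m 60 74 64
04 m 88 96 92
05 m 84 79 82
06 f 90 94 90
07 f 87 80 82
08 f 78 86 80
09 m 90 87 93
10 m 76 73 70"""

cases = []
for line in DATA.splitlines():
    case_id, sex, *tests = line.split()
    scores = [int(t) for t in tests]
    cases.append({
        "SEX$": sex,
        "TEST1": scores[0],
        "TEST2": scores[1],
        "TEST3": scores[2],
        "TOTAL": sum(scores) / 3,  # mean of the three tests
    })

# groupby only merges adjacent cases, so sort by the grouping variable first.
cases.sort(key=lambda case: case["SEX$"])

group_results = {}
for sex, members in groupby(cases, key=lambda case: case["SEX$"]):
    members = list(members)
    group_results[sex] = {}
    print(f"SEX$ = {sex}")
    for var in ["TEST1", "TEST2", "TEST3", "TOTAL"]:
        values = [case[var] for case in members]
        # (N, median, mean, sample standard deviation)
        stats = (len(values), statistics.median(values),
                 statistics.mean(values), statistics.stdev(values))
        group_results[sex][var] = stats
        print(f"  {var}: N={stats[0]} median={stats[1]:.3f} "
              f"mean={stats[2]:.3f} sd={stats[3]:.3f}")
```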
Printing Output
Once you are satisfied with your analysis, you may want a hard copy of the output. You can print the entire output; delete the unwanted portions using the Edit menu and print what remains; or highlight the part you want and print only that. You can also save the output (File/Save as...) to a diskette and print it later. To print the output:
- From the File menu select Print...
- Click OK in the Print dialog box.
The contents of the window will now be printed.