Overview of Sample Data
Suppose a researcher collected the following data in a study of computer anxiety among middle school children. The data come from 40 ninth graders in three school systems. For each student the researcher recorded an identification number, gender, school system, previous computer experience, scores on a 10-item Likert-type computer anxiety scale, scores on a 10-item Likert-type mathematics anxiety scale, a math score for a given testing period, and a computer test score for the same period. With this information in hand, the researcher wants to write a SAS program to carry out both descriptive and inferential analyses.
Let's look at the steps involved in creating a SAS program for this analysis. The first task is to arrange the data in an orderly form so that SAS can read and analyze them. The study involves several variables, and each one needs a name. In SAS Version 8, a variable name can be up to 32 characters long; it must begin with a letter or an underscore, and the remaining characters may be letters, digits, or underscores. Following these conventions, we name the variables as follows:
- ID student identification number
- SEX gender of the student
- EXP previous computer experience (coded in years)
- SCHOOL school system (rural, city, or suburban)
- C1 through C10 the 10 item scores on the computer anxiety scale
- M1 through M10 the 10 item scores on the math anxiety scale
- COMPSCOR computer test score for a given testing period
- MATHSCOR math score for the same testing period
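The naming rule and the numbered lists above can be checked mechanically. The following is a minimal Python sketch, not SAS code. It assumes the standard SAS naming rule (the VALIDVARNAME=V7 convention: 1 to 32 characters, a letter or underscore first, then letters, digits, or underscores), and the helper names `is_valid_sas_name` and `expand_range` are hypothetical.

```python
import re
from typing import List

# Assumed SAS naming rule (VALIDVARNAME=V7): 1-32 characters, the first a letter
# or underscore, the rest letters, digits, or underscores.
SAS_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def is_valid_sas_name(name: str) -> bool:
    """Return True if name follows the assumed SAS naming rule."""
    return SAS_NAME.fullmatch(name) is not None


def expand_range(prefix: str, first: int, last: int) -> List[str]:
    """Expand a numbered list such as 'C1 through C10' into individual names."""
    return [f"{prefix}{i}" for i in range(first, last + 1)]


# The variables named in the list above, in the order they are listed.
VARIABLES = (
    ["ID", "SEX", "EXP", "SCHOOL"]
    + expand_range("C", 1, 10)
    + expand_range("M", 1, 10)
    + ["COMPSCOR", "MATHSCOR"]
)

if __name__ == "__main__":
    print(len(VARIABLES))                                # 26
    print(all(is_valid_sas_name(v) for v in VARIABLES))  # True
    print(is_valid_sas_name("9THGRADE"))                 # False: begins with a digit
```

Writing C1 through C10 as a numbered list keeps 20 item names manageable; SAS itself accepts the same idea as the abbreviated list C1-C10 in most statements.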
Once the variables are named, the next task is to prepare a code book that documents the data layout. The code book for this study follows.
VARIABLE NAME  WIDTH  COLUMNS  VALUE LABELS
ID             2      1-2      none
SEX            1      3        M=male, F=female
EXP            1      4        1=1 yr or less, 2=2 yrs, 3=3 yrs
SCHOOL         1      5        1=rural, 2=city, 3=suburban
C1             1      6        1=strongly agree, 2=agree, 3=undecided, 4=disagree, 5=strongly disagree
C2             1      7        same as C1
C3             1      8        same as C1
C4             1      9        same as C1
C5             1      10       same as C1
C6             1      11       same as C1
C7             1      12       same as C1
C8             1      13       same as C1
C9             1      14       same as C1
C10            1      15       same as C1
M1             1      16       same as C1
M2             1      17       same as C1
M3             1      18       same as C1
M4             1      19       same as C1
M5             1      20       same as C1
M6             1      21       same as C1
M7             1      22       same as C1
M8             1      23       same as C1
M9             1      24       same as C1
M10            1      25       same as C1
MATHSCOR       2      26-27
COMPSCOR       2      28-29
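A code book is, in effect, a table of positions. As an illustration only (in SAS this job is done by the INPUT statement in the DATA step), here is a minimal Python sketch that encodes the layout as (name, first column, last column) triples and slices one record accordingly. It takes SEX from column 3, where the data lines place it, and assumes SEX is the only character variable; `CODE_BOOK` and `parse_fixed` are hypothetical names.

```python
# The code book as (variable, first column, last column); columns are 1-based
# and inclusive, as written in the code book.
CODE_BOOK = (
    [("ID", 1, 2), ("SEX", 3, 3), ("EXP", 4, 4), ("SCHOOL", 5, 5)]
    + [(f"C{i}", 5 + i, 5 + i) for i in range(1, 11)]    # C1-C10 in columns 6-15
    + [(f"M{i}", 15 + i, 15 + i) for i in range(1, 11)]  # M1-M10 in columns 16-25
    + [("MATHSCOR", 26, 27), ("COMPSCOR", 28, 29)]
)

CHARACTER_VARIABLES = {"SEX"}  # assumption: every other variable is numeric


def parse_fixed(line: str) -> dict:
    """Read one fixed-column record using the positions in CODE_BOOK."""
    record = {}
    for name, first, last in CODE_BOOK:
        text = line[first - 1:last]  # 1-based inclusive columns -> Python slice
        record[name] = text if name in CHARACTER_VARIABLES else int(text)
    return record


if __name__ == "__main__":
    rec = parse_fixed("01M12123112245222113541213944")
    print(rec["ID"], rec["SEX"], rec["SCHOOL"], rec["MATHSCOR"], rec["COMPSCOR"])
    # 1 M 2 39 44
```

Because every position comes from the code book, a single wrong column number there (for example, SEX listed in column 1 instead of 3) would silently misread every record, which is why the code book is worth checking against a few data lines.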
In the code book above, VARIABLE NAME is the name of the variable, and WIDTH is the number of columns its values occupy. For example, ID needs at most two columns because the highest ID number is 40, while EXP needs only one. COLUMNS gives the column number(s) on each line where SAS will find the variable's value. VALUE LABELS describes what each coded value of a variable represents: within SEX, M stands for male and F for female students; within SCHOOL, 1, 2, and 3 stand for rural, city, and suburban schools, respectively.
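Value labels and widths can also be written down as data. The Python sketch below is illustrative only (in SAS, labels like these are usually attached with PROC FORMAT); it assumes that code 5 on the Likert items means "strongly disagree", the end of the scale opposite "strongly agree", and the names `VALUE_LABELS`, `label`, and `field_width` are hypothetical.

```python
# Value labels from the code book. Code 5 on the Likert items is assumed to mean
# "strongly disagree", the end of the scale opposite "strongly agree".
LIKERT = {1: "strongly agree", 2: "agree", 3: "undecided",
          4: "disagree", 5: "strongly disagree"}

VALUE_LABELS = {
    "SEX": {"M": "male", "F": "female"},
    "EXP": {1: "1 yr or less", 2: "2 yrs", 3: "3 yrs"},
    "SCHOOL": {1: "rural", 2: "city", 3: "suburban"},
    **{f"C{i}": LIKERT for i in range(1, 11)},
    **{f"M{i}": LIKERT for i in range(1, 11)},
}


def label(variable: str, code):
    """Return the label for a coded value, or the code itself when no label is defined."""
    return VALUE_LABELS.get(variable, {}).get(code, code)


def field_width(largest_value: int) -> int:
    """Columns needed to hold the largest value of a variable (e.g. 40 needs 2)."""
    return len(str(largest_value))


if __name__ == "__main__":
    print(label("SEX", "F"), label("SCHOOL", 3), label("C4", 5))
    # female suburban strongly disagree
    print(field_width(40))  # 2
```

Sharing one `LIKERT` dictionary across all 20 items mirrors the code book's "same as C1" entries: the labels are defined once and reused.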
Now let us examine how the data look on a coding sheet or a computer terminal. The values are copied from the questionnaires the students filled in, and each variable is placed in the columns the code book assigns to it.
01M12123112245222113541213944
02F22325445211233445422212526
03F11211551141121122155114845
Note that a given variable occupies the same column(s) on every line; for example, SEX appears in column 3 of every line. In the data above, no blank space separates the variables. You may instead choose to leave a blank space after each variable, as follows:
01 M 1 2 1 2 3 1 1 2 2 4 5 2 2 2 1 1 3 5 4 1 2 1 39 44
02 F 2 2 3 2 5 4 4 5 2 1 1 2 3 3 4 4 5 4 2 2 2 1 25 26
03 F 1 1 2 1 1 5 5 1 1 4 1 1 2 1 1 2 2 1 5 5 1 1 48 45
Whichever layout you choose will not affect the analysis, as long as you describe it to SAS correctly. The layout above has only three lines of data, and each line is one observation (the information about one person). Here each subject has exactly one line (record) of data; in other situations you may have more than one record per subject.
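The claim that the layout does not change the data can be checked with a small Python sketch (an illustration, not SAS; in SAS the packed lines would be read with column or formatted input and the spaced lines with list input). It reads the three sample records in both styles and confirms they yield identical values. `read_packed` and `read_spaced` are hypothetical names, and SEX is assumed to be the only character variable.

```python
NAMES = (["ID", "SEX", "EXP", "SCHOOL"]
         + [f"C{i}" for i in range(1, 11)]
         + [f"M{i}" for i in range(1, 11)]
         + ["MATHSCOR", "COMPSCOR"])
WIDTHS = [2, 1, 1, 1] + [1] * 20 + [2, 2]  # field widths from the code book


def convert(name: str, text: str):
    """SEX stays a character value; every other field is read as a number."""
    return text if name == "SEX" else int(text)


def read_packed(line: str) -> dict:
    """No separators: each field is found by its position in the line."""
    record, pos = {}, 0
    for name, width in zip(NAMES, WIDTHS):
        record[name] = convert(name, line[pos:pos + width])
        pos += width
    return record


def read_spaced(line: str) -> dict:
    """Blank-separated: each field is found by its order in the line."""
    tokens = line.split()
    if len(tokens) != len(NAMES):
        raise ValueError(f"expected {len(NAMES)} values, found {len(tokens)}")
    return {name: convert(name, token) for name, token in zip(NAMES, tokens)}


PACKED = [
    "01M12123112245222113541213944",
    "02F22325445211233445422212526",
    "03F11211551141121122155114845",
]
SPACED = [
    "01 M 1 2 1 2 3 1 1 2 2 4 5 2 2 2 1 1 3 5 4 1 2 1 39 44",
    "02 F 2 2 3 2 5 4 4 5 2 1 1 2 3 3 4 4 5 4 2 2 2 1 25 26",
    "03 F 1 1 2 1 1 5 5 1 1 4 1 1 2 1 1 2 2 1 5 5 1 1 48 45",
]

if __name__ == "__main__":
    same = all(read_packed(p) == read_spaced(s) for p, s in zip(PACKED, SPACED))
    print(same)  # True
```

The trade-off shows in the two readers: the packed reader depends on exact widths, while the spaced reader depends on every value being present and in order, so in this sketch a missing value shifts every later field unless it is marked by a placeholder.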
Suppose these data are stored in a file named clas.dat in your directory. The data can be entered directly into a file in the Unix environment with an editor (e.g., vi, emacs, pico), or typed on a microcomputer, saved to a floppy diskette, and then uploaded to the Unix environment using FTP (File Transfer Protocol) or another suitable communications package.
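Before handing a file like clas.dat to SAS, it can help to confirm that every line has the full 29 columns the code book describes, especially after an upload from a microcomputer. The Python sketch below is a hypothetical pre-check, not part of the SAS workflow; `read_records` is an illustrative name, and it assumes records never end in blank (missing) fields.

```python
from pathlib import Path

RECORD_LENGTH = 29  # the code book places the last field in columns 28-29


def read_records(path) -> list:
    """Return the data lines of a fixed-column file, checking that each is complete."""
    records = []
    # Text mode translates DOS (CR LF) line endings, which a file typed on a
    # microcomputer may carry, so each line arrives without its line terminator.
    for lineno, line in enumerate(Path(path).read_text().splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        if len(line) != RECORD_LENGTH:
            raise ValueError(
                f"line {lineno}: expected {RECORD_LENGTH} columns, found {len(line)}")
        records.append(line)
    return records
```

Calling `read_records("clas.dat")` on the sample file would return its three lines as strings, or stop at the first line whose length does not match the code book.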
Downloading Sample Data
If you would like a copy of this data file, you can download it from the Stat/Math website (http://www.indiana.edu/~statmath).
To obtain a copy of the sample files:
- Click Sample program file (http://www.indiana.edu/~statmath/stat/sas/CLAS.SAS) and follow the instructions in the pop-up window.
- Then click Sample data file (http://www.indiana.edu/~statmath/stat/sas/CLAS.DAT).
- Transfer these files to your Unix account.
Contact a UITS consultant if you need assistance.