J. Scott Long - Indiana University
Department of Sociology | Department of Statistics
Soc651 Multivariate Analysis of Social Science Data

Spring 2013


This class deals with techniques referred to broadly as multivariate methods. We focus on how these methods can be used to transform a set of related variables into a smaller number of more fundamental measures. This is sometimes referred to as "scaling". Examples of how these methods might be used include: using multiple test scores to create a measure of ability; using variables for exposure to cultural events to create a scale of cultural capital; using questions about interactions with people having a mental illness to create a measure of social distance. Creating scales is often a critical first step in data analysis. Too often a simple summated scale, presented along with Cronbach's alpha, is all that is done, possibly obscuring as much as it reveals. Alternative methods that we examine include multidimensional scaling, principal components, cluster analysis, and especially variations of the factor model such as exploratory factor analysis, confirmatory factor analysis, latent class analysis, item response models, and SEM. The primary assignment is to apply these methods to data in your area of substantive interest. Ideally, you should have a dataset in mind before the class begins. Feel free to email me at jslong at indiana dot edu with questions.

Prerequisites: Students need a prior course on the linear regression model and a course on models for categorical outcomes.

This web page serves as the syllabus for the class.

· Policies and logistics · Assignments · Workflow · Computing and data · Getting help ·

· Books · Resources · FTP downloads · Getting ready ·

News

  • 18Oct2012: preliminary web page for Spring 2013

Policies and logistics

  • Enrollment: Often there are more students who want to take S651 than there are seats in the class. First priority is given to students for whom this is required for their degree. Otherwise, authorization is given on a first-come, first-served basis. If you are interested in taking the class, contact Scott Long (jslong at indiana dot edu). If you are given an authorization, you need to sign up for the class during the normal enrollment period; if you do not, your authorization will be given to the next student on the wait list.
  • Time conflicts: If you have another class that overlaps with the lecture or lab time, you will need to take the class another semester. Even though you will not need to be in lab every week, you need to be able to attend class at that time when there is a lecture or presentation.
  • Logistics: Class meets Tu 1:25PM - 3:20PM in the Schuessler Institute Room 201 (location may change) and Fr 1:25PM - 3:20PM in Ballantine Hall (BH) 107. Please arrive on time. Friday is normally an open lab but will sometimes be used for lectures and presentations. My office is Ballantine 842B, which is directly across from the elevator. Enter 842 (no need to knock) and my office is at the end of the hall. If I am talking with someone, let me know you are waiting. Office hours are tentatively Friday 1:25-3:20 during lab. Other times are available by appointment. Feel free to contact me by e-mail. During the week, if you don't hear back within 12 hours, try again. 18Oct2012
  • Turning in assignments: Assignments are due in my mailbox in Ballantine 744 (or handed in during class) by 5PM on the day they are due. Pedagogically it is critical to complete assignments on time. Late assignments are penalized 25%. If there are special circumstances, let me know and we'll figure out something. 18Oct2012
  • CLASSPAK: There might be a ClassPak at the bookstore; if not, PDF files will be available with the lecture materials. 18Oct2012
  • LAN access: Students in the registrar's enrollment list will be given R/W permission to the course LAN. If you enrolled late, contact sochelp at indiana dot edu (cc jslong at indiana dot edu) and request R/W access to the LAN for Soc 651. The batch file to access the LAN is here. Copies of all your work should be saved on the LAN, but be sure you have it backed up elsewhere. 18Oct2012
  • SugarSync and Dropbox: Dropbox is a handy cloud utility for saving and sharing files. Indiana University now provides 50 gigabytes of free storage with Box. We might try using Box for sharing files. 18Oct2012
  • IU students can run Stata for free over the internet. For details go here. Opinions vary on how well this works. 18Oct2012
  • Mistakes and inconsistencies: If a mistake is made in grading, I apologize. Unfortunately, it sometimes happens. Return the assignment to me along with a cover page explaining the error. If I do not return the assignment documenting the change within two class periods, please remind me by e-mail.

Assignments - assignments are due by 5pm

Assignments in the form of Word files will be available here. Add your answers to the assignment file and place the renamed, completed file in your LAN directory.

Exercises

  1. Create a course directory where all of your work is located and which uses WF principles for organizing and posting files.
  2. Keep a research log that records your work and facilitates replication. It should be understandable to someone else (like Scott) and refer to the files in your directory.
  3. Exercises exploring each method. These exercises are not based on your own data but are designed to help you understand each method. Results from these exercises are placed on the LAN and noted in your research log. If you prefer, you can have one log for your project and another for your explorations.

Class Project: Your project can involve the application of methods from the class to your data or a methodological study.

  1. An approved proposal/contract for your project.
  2. A time line for completing the project.
  3. Verification that you have permission to use the data and IRB approval as necessary.
  4. Presentation in class with related materials placed in your class directory.
  5. Final paper and/or project description turned in the last day of class and placed in your class directory.
  6. Research log that describes work for the course project.
  7. Meeting with Scott to discuss your project.

Workflow

An essential part of being an effective researcher is a workflow that allows you to organize your efforts and replicate your results. Since this is a class in applied data analysis, a portion of your grade is based on the workflow you use in completing your assignments and project. More general information and a detailed treatment of workflow are available at Long’s workflow page and in his book The Workflow of Data Analysis Using Stata. For this class you are not required to implement the full workflow from the book, but you are encouraged to improve your workflow as your time and interest allow.

Requirements for Soc 651: Multivariate Analysis

  • Keep a research log: A research log is a record of your progress similar to a journal. In your research log you should record your work on each assignment. An example is posted to the course LAN to show you what your log might look like. (Note that the research log is distinct from the Stata log file that you create with log using dofilename, text replace.)
  • Use an organized file structure: Organizing files so you can find your work and know what has been finished is essential. The file structure you are to use in this class can be created using the batch file mv_workflow.bat. Copy this file to your course folder or directory and run it to create the directories for the course. Three types of directories are created: work directory, support directory, and assignments directory. The work directory is where your current work goes (e.g., assignments you have not finished). The support directory contains examples or other files that are not critical to completing your assignment but support your work. You are not graded on the organization of your support directory. Your assignments directory holds completed assignments. Your work on an assignment should be moved to this directory when it is completed (see the next item).
  • “Post” files that are done and never change them: A fundamental task in workflow is keeping track of which work is finished and never changing that work. You can do new work, but you cannot change work that is completed. Posting files makes this simple. When work is completed it is moved from your work directory and posted to the appropriate assignment directory. Posted files should never be changed. If you need to re-do something that has been posted, create a new do-file that creates a new log file. Before you turn in an assignment, all associated files must be posted.
  • Follow file naming conventions: Cheap storage makes accumulating massive numbers of files too easy. To keep track of these files you need standardized names. File names should use the following form, where you replace the bracketed values with the appropriate information:
            [your userid]-651-a[assignment #]-[step #]-[task].[extension]
    For example, my work might include:
            jslong-651-a2-01-create-variables.log
    This is the log file from Scott Long for Soc 651's assignment 2, from the first program for the assignment (01), in which variables are created (create-variables). Then,
            jslong-651-a4-02-lrm-analysis.do
    is the second do-file for assignment 4, which runs an LRM. No two files should have the same name. If you post a file and need to make a change, the revised file should add a version number (e.g., jslong-651-a4-02v2-lrm-analysis.do). This way you have a full record of what you have done and never have to guess which file was the last one you used.
  • Use legible and robust do-files and script files: You should be able to run do-files (or any other script file) on another computer without making any changes. At a minimum this means that you should never hard code directory/folder locations. Your do-files need to be clear and easy to understand. Look at the examples on the LAN and model your do-files after those. See the Workflow book for further details.
  • Showing and printing Stata output: When you reproduce the results from a Stata log file, you must show them in a fixed-width font (e.g., Courier, not Times Roman) and in a font size small enough that the lines do not wrap. Courier 9 often works well.
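
The naming, logging, and portability requirements above can be sketched as a do-file skeleton. This is a minimal sketch, not the course's official template: the file names, the version number, and the use of Stata's built-in auto dataset are assumptions chosen for illustration. The examples on the LAN remain the model to follow.

```stata
// jslong-651-a2-01-create-variables.do
// Hypothetical template: the userid, assignment number, and task name are
// placeholders, and the auto dataset stands in for course data.

capture log close                   // close any log left open by an earlier run
log using "jslong-651-a2-01-create-variables", text replace

version 12                          // assumed version; use the one you run
clear all
set more off

// #1: load data without a hard-coded drive or folder
sysuse auto, clear                  // placeholder for: use your-data, clear

// #2: create and label a variable
generate wt1000 = weight / 1000
label variable wt1000 "Weight in 1,000 lbs"
summarize wt1000

// #3: save under a new name so the source data are never overwritten;
//     replace only lets the do-file be rerun before it is posted
save "jslong-651-a2-01-auto-v1", replace

log close
exit
```

Because the do-file refers only to files in the Stata working directory, it should run unchanged on another computer once the working directory has been set (for example, with cd) before the do-file is run.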

Computing and data

  • Downloading files: To download files, start Stata while connected to the internet and type: findit mviu. Click the link for what you need. Most examples from lectures will be located here, even if they are not Stata files. They will also be on the ftp site.
  • Data sets for class use
  • Getting Started with Stata: This document is great if you are unfamiliar with Stata or a bit rusty.
  • Buying software: Personal copies of Stata can be purchased from the IU Stat/Math Center. Mplus has a student version that can be purchased here.
  • Data sets: You will need to find data to use for your class project. ICPSR (google ICPSR) has many datasets that would work. If you are using data that you have not collected yourself, make sure that you have explicit permission to use the data for the class.

Getting help

If you need help debugging a program, the best thing is to place relevant files in your directory on the LAN in a subdirectory called \helpme (e.g., \jslong\helpme). Include the do-file, the dataset, and log file in text format, not smcl. Please follow the guidelines below or it is much less likely that you will get a quick and helpful answer. For further details on getting help, check here.

  • The script-file must be self-contained. It must load the data, create needed variables (if any), generate the problem, and save a log file in text format. The do-file must have comments explaining what you are doing and what the problem is.
  • If a command is causing a problem, include the line which command-name in your do-file, where command-name is the command causing the problem (e.g., which prchange). This tells me which version of the command you are using.
  • Do not refer to specific directories (e.g., do not: use d:\mydata\science3.dta). Assume that your data is located in your Stata working directory.
  • Here is an example of what the do-file might look like:

capture log close
log using jslong_assgn1_problem, text replace

// Scott Long - 2011-08-31
// Assignment 4: binary regression
// ERROR: see #2 below.

// #1: load data and check data
spex science2, clear
tab y
sum x1 x2

// #2: estimate logit
// ERROR: variable xl not found
logit y xl x2, nolog

// #3: compute discrete change
which prchange
prchange, x(x1=1 x2=3)

log close
exit

Books and lecture notes

ClassPak - be sure to bring this the first day of class. It includes lecture notes and reprints. Required.

Instead of ordering books, I have included links to books at amazon.com or stata.com. I think this will give you the best price and let you pick the books you need for your work.

AMSSD: Analysis of Multivariate Social Science Data, Second Edition by Bartholomew, Steele, Moustaki, and Galbraith is the primary book for the class. If you have the first edition, that should be fine (as long as you don't mind searching a bit when I refer to a specific page number). The 2nd edition adds multiple regression, a brief discussion of the SEM (structural equation model), and an introduction to multilevel models. (AMSSD at amazon.com $52)

CFA: Confirmatory Factor Analysis: A Preface to LISREL by Scott Long is an old but remarkable book on the confirmatory factor model. It has a great deal of detail on identification. (CFA at amazon.com $17 or $10 on Kindle)

SELV: Structural Equations with Latent Variables by Ken Bollen is the most comprehensive book available on the structural equation model treated in Chapter 11 of AMSSD. (SELV at amazon.com $125)

WF: The Workflow of Data Analysis Using Stata by Scott Long. (WF at www.stata.com $52 plus shipping; amazon.com for $61). If several people order together from Stata, shipping is much more reasonable.

Resources

The AMSSD site has datasets and sample chapters that you can download.

The Mplus site has a lot of extremely valuable information. The examples there, along with the discussion forum, are often more helpful than the official manual.

The Workflow site has supplementary information for the Workflow book.

Getting ready for Soc651

There are several things you can do to get ready for the class.

  1. Review a book on the linear regression model.
  2. Learn the Greek alphabet, upper and lower case.
  3. Review basic matrix algebra.
  4. Skim the entire AMSSD to get an overview of the models to be considered.
  5. Start cleaning the data you plan to use for the class.
  6. Begin writing your research log and plan for the class; see WF for details.
© 2013 J. Scott Long