This class deals with techniques referred to broadly as multivariate methods. We focus on how these methods can be used to transform a set of related variables into a smaller number of more fundamental measures. This is sometimes referred to as "scaling". Examples of how these methods might be used include: using multiple test scores to create a measure of ability; using variables for exposure to cultural events to create a scale of cultural capital; using questions about interactions with people having a mental illness to create a measure of social distance. Creating scales is often a critical first step in data analysis. Too often a simple summated scale, presented along with Cronbach's alpha, is all that is done, possibly obscuring as much as it reveals. After reviewing methods such as multidimensional scaling, principal components, and cluster analysis, we focus on latent structure analysis (LSA). LSA includes exploratory factor analysis, confirmatory factor analysis, latent class analysis, item response models, and structural equation modeling. Assignments will involve exercises applying these models to real data. Feel free to email me at jslong at indiana dot edu with questions.
This web page serves as the syllabus for the class.
- Class meets Monday/Wednesday in Ballantine 205 from 9:30 to 10:45. January 12, 2014
- Class is full. November 12, 2013
- Enrollment begins this week. October 14, 2013
Policies and logistics
- Enrollment: Often there are more students who want to take S651 than there are seats in the class. First priority is given to students for whom this class is required for their degree. Otherwise, authorizations are given on a first-come, first-served basis. If you are interested in taking the class, contact Scott Long (jslong at indiana dot edu). If you are given an authorization, you need to sign up for the class during the normal enrollment period; if you do not, your authorization will be given to the next student on the wait list.
- Logistics: Lectures are Monday and Wednesday 9:30 - 10:45AM (Ballantine 205), with lab on Monday 3:30 - 5:30PM (BH 107). Please arrive on time. My office is Ballantine 842B, which is directly across from the elevator. Enter 842 (no need to knock) and my office is at the end of the hall. If I am talking with someone, let me know you are waiting. Office hours are tentatively Monday after class and during lab. Other times are available by appointment. Feel free to contact me by e-mail. During the week, if you don't hear back within 12 hours, try again. 2014-01-12
- Time conflicts: If you have another class that overlaps with the lecture or lab time, you will need to take the class another semester. Even though you will not need to be in lab every week, some weeks that time will be used for presentations that you should attend.
- Lecture notes: PDFs of lectures will be made available on the class LAN.
- Turning in assignments: Assignments are due at the start of class on the due date. Pedagogically it is critical to complete assignments on time. Late assignments are penalized 25%. If there are special circumstances, let me know and we'll figure out something.
- LAN access: Students in the registrar's enrollment list will be given R/W permission to the course LAN. If you enrolled late, contact sochelp at indiana dot edu (cc jslong at indiana dot edu) and request R/W access to the LAN for Soc 651. The batch file to access the LAN is here. Copies of all your work should be saved on the LAN, but be sure you have it backed up elsewhere.
- Box and Dropbox: Dropbox is a handy cloud utility for saving and sharing files. Indiana University now provides 50 gigabytes of free storage with Box. We might try using Box for sharing files.
- IU students can run Stata for free over the internet. For details go here. Opinions vary on how well this works.
- Mistakes and inconsistencies: If a mistake is made in grading, I apologize. Unfortunately, it sometimes happens. Return the assignment to me along with a cover page explaining the error. If I do not return the assignment documenting the change within two class periods, please remind me by e-mail.
Workflow
An essential part of being an effective researcher is having a workflow to organize your efforts and allow others to reproduce your results. Since this is a class in applied data analysis, a portion of your grade is based on the workflow you use in completing your assignments. More general information and a detailed treatment of workflow are available at Long’s workflow page and in his book The Workflow of Data Analysis Using Stata. For this class you are not required to implement the full workflow from the book, but you must follow the guidelines below.
Requirements for Soc 651: Multivariate Analysis
- Keep a research log: A research log is a record of your progress similar to a journal. In your research log you should record your work on each assignment. An example is posted to the course LAN in "/jslong scott long" to show you what your log might look like. (Note that the research log is distinct from the Stata log file that you create with log using dofilename, text replace.)
- Use an organized file structure: Organizing files so you can find your work and know what has been finished is essential. The file structure you are to use in this class for your files on the LAN is illustrated in "/jslong scott long".
- “Post” files that are done and never change them: A fundamental task in workflow is keeping track of which work is finished and never changing that work. You can do new work, but you cannot change work that is completed. Posting files makes this simple. When work is completed it is moved from your work directory and posted to the appropriate assignment directory. Posted files should never be changed. If you need to re-do something that has been posted, create a new do-file that creates a new log file. Before you turn in an assignment, all associated files must be posted.
- Follow file naming conventions: Cheap storage makes accumulating massive numbers of files too easy. To keep track of these files you need standardized names. File names should use the following form, where you replace the bracketed values with the appropriate information:
[your userid]-651-a[assignment #]-[step #]-[task].[extension]
For example, my work might include:
jslong-651-a2-01-create-variables.log
This is the log file from Scott Long for assignment 2 of Soc 651, produced by the first program for the assignment (01), in which variables are created (create-variables). Then,
jslong-651-a4-02-lrm-analysis.do
is the second do-file for assignment 4, which runs an LRM.
No two files should have the same name. If you post a file and need to make a change, the revised file should add a version number (e.g., jslong-651-a4-02v2-lrm-analysis.do). This way you have a full record of what you have done and never have to guess which file was the last one you used.
- Use legible and robust do-files and script files: You should be able to run do-files (or any other script file) on another computer without making any changes. At a minimum this means that you should never hard code directory/folder locations. Your do-files need to be clear and easy to understand. Look at the examples on the LAN and model your do-files after those. See the Workflow book for further details.
- Showing and printing Stata output: When you reproduce results from a Stata log file, you must show them in a fixed-width font (e.g., Courier, not Times Roman) at a font size small enough that the lines do not wrap. Courier 9 often works well.
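The naming and do-file guidelines above can be combined in one template. What follows is a minimal sketch rather than a required format: the user ID, the task name, and the use of the auto dataset that ships with Stata are assumptions made for illustration, and the version number should be changed to the release you actually run.

```stata
capture log close

// Build the file stem from the naming convention:
// [userid]-651-a[assignment #]-[step #]-[task]
local userid "jslong"            // hypothetical user ID
local assign 1                   // assignment number
local step   "01"                // program number within the assignment
local task   "create-variables"  // short description of the task
local stem   "`userid'-651-a`assign'-`step'-`task'"

// The log is written to the working directory; no drive or folder is named
log using "`stem'", text replace

version 13          // assumption: change to the release you are running
set linesize 80     // keeps log lines short enough to print without wrapping

// #1: load data (auto ships with Stata, so this runs on any machine)
sysuse auto, clear

// #2: create a variable and check it
generate byte heavy = weight > 3000 if !missing(weight)
label variable heavy "Weighs more than 3,000 lbs?"
tabulate heavy, missing

log close
```

Run from your work directory, this writes jslong-651-a1-01-create-variables.log next to the do-file. Because nothing in the file names a drive or folder, the same do-file should run unchanged on another computer.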
Computing and data
- Downloading files: Files for the class will be on the LAN. Some will also be on the ftp site.
- Getting Started with Stata: This document can help if you are unfamiliar with Stata or a bit rusty. Also, check the Stata YouTube channel.
- Buying software: Personal copies of Stata can be purchased from the IU Stat/Math Center. Mplus has a student version that can be purchased here.
- Data sets: You will need to find data to use for your class project. ICPSR (google ICPSR) has many datasets that would work. If you are using data that you have not collected yourself, make sure that you have explicit permission to use the data for the class.
Getting help
If you need help debugging a program, the best approach is to place the relevant files in your directory on the LAN in a subdirectory called \helpme (e.g., \jslong\helpme). Include the do-file, the dataset, and the log file in text format, not smcl. Please follow the guidelines below or it is much less likely that you will get a quick and helpful answer. For further details on getting help, check here.
- The script-file must be self-contained. It must load the data, create needed variables (if any), generate the problem, and save a log file in text format. The do-file must have comments explaining what you are doing and what the problem is.
- If a command is causing a problem, include the command which command-name for the command causing the problem. This tells me which version of the command you are using.
- Do not refer to specific directories (e.g., do not: use d:\mydata\science3.dta). Assume that your data is located in your Stata working directory.
- Here is an example of what the do-file might look like:
capture log close
log using jslong_assgn1_problem, text replace
// Scott Long - 2011-08-31
// Assignment 4: binary regression
// ERROR: see #3 below.
// #1: load data and check data
spex science2, clear
sum x1 x2
// #2: estimate logit
logit y xl x2, nolog
// #3: compute discrete change
// ERROR: variable xl not found
prchange, x(x1=1 x2=3)
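The guideline about reporting command versions can be met with a few lines near the top of a help-request do-file. This is only a sketch: prchange is assumed to be a user-written command that may not be installed on every machine, so that call is wrapped in capture noisily to let the do-file continue either way.

```stata
// Record which versions are in use so a helper can reproduce the problem
about                             // Stata release and license details
which logit                       // reports where the official command comes from
capture noisily which prchange    // user-written: shows its version, or an error if absent
```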
Books and lecture notes
Lecture notes: PDFs will be put on the ftp site. Bring paper or electronic copies to class. Required.
Books: Instead of ordering books, I have included links to books at amazon.com or stata.com. I think this will give you the best price and let you pick the books you need for your work.
Required: Analysis of Multivariate Social Science Data, Second Edition by Bartholomew, Steele, Moustaki, and Galbraith. If you have the first edition, that should be fine (as long as you don't mind searching a bit when I refer to a specific page number). The 2nd edition adds multiple regression, a brief discussion of the SEM (structural equation model), and an introduction to multilevel models. (amazon.com)
Required: Latent Class and Latent Transition Analysis by Collins and Lanza. We will be using their software for LCA as well. (amazon.com)
Recommended: The Workflow of Data Analysis Using Stata. (WF at www.stata.com $52 plus shipping; amazon.com for $61). If several people order from Stata, shipping is much more reasonable. Highly recommended for your work in graduate school.
Optional: Confirmatory Factor Analysis: A Preface to LISREL by Scott Long is an old but remarkable book on the confirmatory factor model. It has a great deal of detail on identification. (CFA at amazon.com $17 or $10 on Kindle)
Optional: Structural Equations with Latent Variables by Ken Bollen is the most comprehensive book available on the structural equation model treated in Chapter 11 of AMSSD. (SELV at amazon.com $125)