Stat503 / Soc650 is a second course in applied
statistics. The prerequisite is a course in regression models for continuous dependent
variables such as Soc 554 or Stat 501. Categorical Data Analysis deals with
regression models in which the dependent variable is categorical: binary, nominal, ordinal, and count.
Models that are discussed include probit and logit for binary outcomes, ordered logit and ordered probit for ordinal outcomes, multinomial logit for nominal outcomes, and Poisson regression and related models for count outcomes. This web page serves as the syllabus for the class.
Assignment due dates · Workflow requirements · Grades and getting an A+ · Books
Computing and datasets · Getting Help · Enrolling, getting ready & time conflicts · FTP ·
News and policies
- Methods articles and exemplars: These have been placed on the class LAN (not the FTP site). 2013-09-17
- Logistics: Class meets 1:00 - 2:15 TR in BH219. Please arrive on time and be ready to start at 1:00.
- Teaching Assistants: The AIs are Tom VanHeuvelen (tvanheuv at indiana dot edu), Trent Mize (tdmize at indiana dot edu), and Ian Anson (iganson at indiana dot edu). Tom and Trent will be in the computing labs; Ian will hold office hours Wednesdays from 9:00 am to noon in Woodburn Hall 408. 2013-09-04
- Computing labs: You have signed up for one of two lab sections: TuTh 2:30PM - 4:30PM in Lindley Hall (LH) 025 or TuTh 5:30PM - 7:30PM in Lindley Hall (LH) 023. Each section meets twice a week for two hours. The lab instructors might give a short presentation or discuss the assignments at the start of each lab. Teaching assistants will be available for the first 90 minutes of each lab, but might not be available for the last 30 minutes. 2013-08-22
- Office hours: Office hours are TR 11:00-12:45 in Ballantine 842B, which is directly across from the elevator. Enter 842 (no need to knock); my office is at the end of the hall. If I'm talking with someone, please let me know you are waiting. Feel free to contact me by e-mail; during the week, if you don't hear back within 12 hours, try again. 2013-08-22
- Course materials: Course materials are located at http://www.indiana.edu/~jslsoc/ftp/CDAiu2013/. Lecture notes are posted here. 2013-08-22
- Turning in assignments: Assignments are due 30 minutes before the end of your lab on the due date. Pedagogically it is critical to complete assignments on time. The penalty for late assignments is 25%. Exceptions are made for special circumstances. Get e-mail approval from Professor Long; turn in the e-mail approval along with your late assignment. 2013-08-22
- LAN: You will be given R/W permission to the course LAN. If you enrolled late, contact sochelp at indiana dot edu (cc jslong at indiana dot edu) and request R/W access to the LAN for Stat 503/Soc 650. The ftp site has information on connecting to the LAN. All of your work must be placed in your folder before assignments are turned in. 2013-08-22
- SPost13: This is new work by Jeremy Freese and Scott Long. It is much more powerful than SPost9, but it is still under development. After you install the commands, you can run the command spost13update to update the programs. You should do this each day you use Stata!
- IU students can run Stata for free over the internet. For details go here. Opinions vary on how well this works. 2012-08-17
Due dates - assignments are due 30 minutes before lab ends
Assignments in the form of Word files will be available at the FTP site. Add your answers to the assignment file and place the renamed, completed file in your LAN directory. Details are provided in lab.
- Assignment 1: Math review. Due 3rd day of class, Sep 3.
- Assignment 2: Picking your variables. Due 4th day of class, Sep 5.
- Assignment 3: Data cleaning. Due sixth day of class, Sep 12.
- Assignment 4: LRM. Due tentatively ninth day of class, Sep. 24.
- Assignment 5: BRM-1. Due tentatively 12th day of class, Oct. 3.
- Assignment 6: BRM-2. Due tentatively 16th day of class, Oct. 17.
- Assignment 7: Testing and Fit. Due tentatively 18th day of class, Oct. 24.
- Assignment 8: MNLM-1. Due tentatively 21st day of class, Nov 5.
- Assignment 9: MNLM-2. Due tentatively 24th day of class, Nov 14.
- Assignment 9: ORM. Due tentatively 28th day of class, Nov.
- Assignment 10: Count models. Due 5pm on T of exam week, Dec 17.
Workflow requirements for assignments
An essential part of being an effective researcher is a workflow that allows you to organize your efforts and later replicate the results you have already completed. Since this class is an applied course in data analysis, a portion of your grade is based on the workflow you use in completing your assignments. More general information and a detailed treatment of workflow are available at Long’s workflow page and in his book The Workflow of Data Analysis Using Stata. For this class you are not required to implement the full workflow from the book, but you are encouraged to improve your workflow as your time and interest allow. The workflow folder on the FTP site contains critical information that you must read and apply to your work in the class.
Grades and getting an A+
- Overview: Grades are based on computer assignments, your research log, non-computer assignments, attendance, and in-class assignments. Each of these is given a number of points, for a total of about 900 points. Your grade is determined by adding up your points and dividing by the total possible, then applying this scale: A=100%-94%; A-=93%-91%; B+=90%-88%; B=87%-84%; B-=83%-81%; etc.
- Mistakes and inconsistencies: If a mistake is made in grading, I apologize. Return the assignment to me along with a cover page explaining the error. If I do not return the assignment documenting the change within two class periods, please remind me by e-mail. Multiple people are doing the grading and we try to be consistent, but there is bound to be some variation (as in submitting a paper to a journal for review!).
- Getting an A+: To get an A+ you must do a project as well as receive an A on the required assignments (a really great project might lead to an A+ if your other work is an A-). You must get Professor Long's approval for your project and meet with him periodically. The final project is due the first day of finals. A careful write-up of your results, the supporting Stata files, and your research log are required. The A+ project requires you to replicate a published article that uses multinomial logit or ordinal logit. Using the original data obtained from the author or ICPSR, reproduce all or part of the analysis. Show how the author obtained the results. Show what you would do to improve the analysis.
Computing and datasets
- Getting Started with Stata and CDA: The Lab Guide for Stata is available on the FTP site. I strongly recommend that you work through the section of the guide corresponding to the current assignment before you start the assignment, even if you are sure you don't need to!
- Datasets: Datasets and codebooks you can use for the class are located on the ftp site. These datasets have not been "cleaned" and sometimes the codebook doesn't clearly describe what a variable is, just like real-world data.
- Stata: While you may freely use my ado files, they require Stata to run. Stata is installed in campus computing labs. Personal copies can be purchased from the IU Stat/Math Center.
Getting Help
If you need help debugging a program, the best thing to do is to place the relevant files in your directory on the LAN in a subdirectory called \helpme (e.g., \jslong\helpme). Include the do-file, the dataset, and the log file in text format, not SMCL. Please follow the guidelines below; otherwise it is much less likely that you will get a quick and helpful answer. For further details on getting help, check here.
1) The do-file must be self-contained. It must load the data, create needed variables (if any), reproduce the problem, and save a log file in text format. The do-file must have comments explaining what you are doing and what the problem is.
2) If an SPost command is causing a problem, include the command which command-name for the command causing the problem. This tells me which version of the command you are using.
3) Do not refer to specific directories (e.g., do not: use d:\mydata\science3.dta). Assume that your data is located in your Stata working directory.
Here is an example of what the do-file might look like:
capture log close
log using jslong_assgn1_problem, text replace
// Scott Long - 2011-08-31
// Assignment 4: binary regression
// ERROR: see #2 below.
// #1: load data and check data
spex science2, clear
sum x1 x2
// #2: estimate logit
// ERROR: variable xl not found
logit y xl x2, nolog
// #3: compute discrete change
mchange, at(x1=1 x2=3)
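The example above does not show guideline 2. A minimal sketch of how the version check might be added, assuming the problem involves the SPost command mchange used in the example. The log name is hypothetical, and the placement near the top is a suggestion rather than a course requirement: a do-file stops at the first error, so a line placed after the failing command would never run.

```stata
capture log close
log using jslong_helpme_problem, text replace   // hypothetical log name

// Guideline 2: report the location and version of the installed command.
// This comes before the commands that reproduce the problem, because
// execution of a do-file halts at the first error.
which mchange

// ... load the data and reproduce the problem here ...

log close
```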
Books
- The lecture notes are available as PDF files on the FTP site.
- Long, J. Scott and Freese, Jeremy. 2005. Regression Models for Categorical Dependent Variables Using Stata, 2nd Edition. Stata Press: College Station, TX. If you have the “revised” edition, you do not need to buy the 2nd edition. Recommended but not required.
- Long, J. Scott. 2008. The Workflow of Data Analysis Using Stata. Stata Press: College Station, TX. If you plan to do a lot of data analysis, this book will save you a lot of time and make your work replicable. Recommended but not required.
- Long, J. Scott. 1997. Regression Models for
Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Required and especially useful for those who are interested in mathematical details. Recommended but not required.
Enrolling in Soc 650/Stat 503 and Time Conflicts
Often there are more students who want to take S650 than there are seats in the class. First priority is given to graduate students for whom the course is required by their degree program. Otherwise, authorizations are given on a first-come, first-served basis. If you are interested in taking the class, contact the graduate secretary in sociology to get on the list. The graduate secretary (email@example.com) will contact you regarding authorization for the class. If you are given an authorization, you need to sign up for the class during the normal enrollment period; if you do not, your authorization will be given to the next student on the wait list.
Time conflicts: If you have another class that overlaps with the lecture time, you will need to take the class another semester. If you have a time conflict with all of the lab times, you should also take the class some other semester. If you can attend some of the labs each week and you are already familiar with Stata (or can learn it on your own), you will probably do fine, but you might have to work harder. While most of the lab time is used for students doing independent work, the teaching assistants provide short lectures related to the assignments. For example, they might provide additional information about keeping a research log or how to format tables using Word.
Getting ready for Soc650/Stat503
There are several things you can do to get ready for the class.
- Review a book on the linear regression model.
- If you are rusty on mathematics, you can review the materials in this file.
- Feel free to start reading the books listed above.