Stat503 / Soc650 is a second course in applied
statistics that assumes you have had a course in regression models for continuous dependent
variables, such as Soc 554 or Stat 501. Categorical Data Analysis deals with
regression models in which the dependent variable is categorical: binary, nominal, ordinal, and count.
Models that are discussed include probit and logit for binary outcomes, ordered logit and ordered probit for ordinal outcomes, multinomial logit for nominal outcomes, and Poisson regression and related models for count outcomes.
This web page serves as the syllabus for the class.
News and policies
- Logistics: Class meets 1:00 - 2:15 TR in BH242. Please arrive on time and be ready to start at 1:00.
- Teaching Assistants: The AIs are Tom VanHeuvelen (tvanheuv at indiana dot edu) and Trent Mize (tdmize at indiana dot edu).
- Computing labs: You have signed up for one of two lab sections, both held in Student Building SB 230: TuTh 2:30-4:30 PM and TuTh 5:30-7:30 PM. Each section meets twice a week for two hours. The lab instructors might give a short presentation or discuss the assignments at the start of each lab. Teaching assistants will be available for 90 minutes of each lab but might not be available during the last 30 minutes.
- Office hours: Office hours are Tu 2:30-4:00 and Th 9:00-10:30 in Ballantine 842B. Enter 842 (no need to knock); my office is at the end of the hall. If I'm talking with someone, please let me know you are waiting. Feel free to contact me by e-mail; during the week, if you don't hear back within 12 hours, try again.
- Course materials: Course materials, including PDFs of the lecture notes, are located on the class LAN. To connect, you must be on campus or connect via VPN. For Windows, use "\\bl-soc-theseus.ads.iu.edu\s503$" (some have found that they need to use "\\bl-soc-theseus\s503$"); for Mac, use "smb://bl-soc-theseus.ads.iu.edu/s503$". Let your TA know if you are having problems connecting.
- Turning in assignments: Assignments are due 30 minutes before the end of your lab on the due date. Pedagogically, it is critical to complete assignments on time. The penalty for late assignments is 25%. Exceptions are made for special circumstances: you must get approval for a late assignment from Professor Long by e-mail (often confirming a conversation). Turn in a copy of the e-mail along with your late assignment.
- All of your work must be placed in your folder on the class LAN before assignments are turned in.
- IU students can run Stata for free over the internet. For details go here. Opinions vary on how well this works.
Due dates
Assignments are due 30 minutes before the end of the lab you are enrolled in.
Assignments in the form of Word files will be available at the FTP site. Add your answers to the assignment file and place the renamed, completed file in your LAN directory. Details are provided in lab.
- Assignment 1: Math review. Due tentatively day 3, Sep 2.
- Assignment 2: Data cleaning. Due tentatively day 6, Sep 11.
- Assignment 3: Picking your variables. Due tentatively day 7, Sep 16.
- Assignment 4: LRM. Due tentatively day 9, Sep 23.
- Assignment 5: BRM-1. Due tentatively day 12, Oct 2.
- Assignment 6: BRM-2. Due tentatively day 16, Oct 16.
- Assignment 7: Testing and Fit. Due tentatively day 19, Oct 28.
- Assignment 8: MNLM-1. Due tentatively day 21, Nov 4.
- Assignment 9: MNLM-2. Due tentatively day 24, Nov 13.
- Assignment 10: ORM. Due tentatively day 28, Dec 4.
- Assignment 11: Count models. Due Tuesday of exam week at 5PM in Scott Long's BH 744 mailbox.
Workflow requirements for assignments
An essential part of being an effective researcher is a workflow that allows you to organize your efforts and replicate your findings. Since this class is an applied course in data analysis, a portion of your grade is based on the workflow you use in completing your assignments. More general information and a detailed treatment of workflow are available at Long’s workflow page and in his book The Workflow of Data Analysis Using Stata. For this class you are not required to implement the full workflow from the book, but you are encouraged to improve your workflow as your time and interest allow. In lab, the AIs will present the workflow requirements for your assignments.
Grades and getting an A+
- Overview: Grades are based on assignments, your research diary, and attendance. Each of these components is given a number of points, with a total of about 900 points. Your grade is based on your percentage of the total points: A = 100-94%; A- = 93-91%; B+ = 90-88%; B = 87-84%; B- = 83-81%; and so on.
- Mistakes and inconsistencies: If a mistake is made in grading, I apologize. Return the assignment to me along with a cover page explaining the error. If I do not return the assignment documenting the change within two class periods, please remind me by e-mail. Multiple people are doing the grading and we try to be consistent, but there is bound to be some variation (as when submitting a paper to a journal for review!).
- Getting an A+: To get an A+ you must do a project as well as receive an A on the assignments. You must get Professor Long's approval for your project and meet with him periodically. The final project is due the first day of finals. A careful write-up of your results along with the supporting Stata files and research log are required.
Computing and datasets
- Getting Started with Stata and CDA: The Lab Guide for Stata is available on the LAN. I recommend that you work through the section of the guide corresponding to the current assignment before you start the assignment, even if you are sure you don't need to!
- Datasets: Datasets for the class are located on the LAN and can also be accessed with the SPost spex command. Codebooks are in the Lab Guide. Just like real-world data, these datasets are not fully "cleaned," and sometimes the codebook doesn't clearly describe what a variable is.
- Stata: While you may freely use my ado files, they require Stata to run. Stata is installed in campus computing labs. Personal copies can be purchased from the IU Stat/Math Center.
Getting help
If you need help debugging a program, the best approach is to place the relevant files in a subdirectory called \helpme within your directory on the LAN (e.g., \jslong\helpme). Include the do-file, the dataset, and the log file in text format, not smcl. Follow the guidelines below; for further details on getting help, check here.
1) The do-file must be self-contained. It must load the data, create needed variables (if any), generate the
problem, and save a log file in text format. The do-file must have comments explaining what you are
doing and what the problem is.
2) If a SPost command is causing a problem, include the command which command-name (e.g., which mchange) for the
command causing the problem. This tells me which version of the command you are using.
3) Do not refer to specific directories (e.g., do not write use d:\mydata\science3.dta).
Assume that your data are located in your Stata working directory.
Here is an example of what the do-file might look like:
capture log close
log using jslong_assgn4_problem, text replace
// Scott Long - 2011-08-31
// Assignment 4: binary regression
// ERROR: see #3 below.
// #1: load data and check data
spex science2, clear
sum x1 x2
// #2: estimate logit
logit y x1 x2, nolog
// #3: compute discrete change
// report which version of mchange is installed (guideline 2)
which mchange
// ERROR: variable xl not found
mchange, at(xl=1 x2=3)
log close
Books
- The lecture notes are available as PDF files on the FTP site.
- Long, J. Scott and Freese, Jeremy. 2014. Regression Models for Categorical Dependent Variables Using Stata, 3rd Edition. College Station, TX: Stata Press. Required. (Try code ASA14 if you buy it from the StataCorp site.)
- Long, J. Scott. 2008. The Workflow of Data Analysis Using Stata. College Station, TX: Stata Press. If you plan to do a lot of data analysis, this book will save you a lot of time and make your work replicable. Recommended but not required.
- Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Especially useful for those who are interested in mathematical details. Recommended but not required.
Enrolling in Soc 650/Stat 503 and Time Conflicts
Often there are more students who want to take S650 than there are
seats in the class. First priority is given to graduate students for whom this class is required by their degree program. Otherwise, authorizations are given on a first-come, first-served basis. If you are interested in
taking the class, contact the graduate secretary in sociology to get on the list. The graduate secretary
(firstname.lastname@example.org) will contact you regarding
authorization for the class. If you are given an authorization, you need to sign
up for the class during the normal enrollment period; if you do not, your authorization will be given to the next student on the wait list.
Time conflicts: If you have another class that overlaps with the lecture time, you will need to take the class another semester. The same applies if you have a time conflict with all of the lab times. If you can attend some of the labs each week and you are already familiar with Stata (or can learn it on your own), you will probably do fine but might have to work harder. While most of the lab time is used for independent work, the teaching assistants give short lectures related to the assignments. For example, they might provide additional information about keeping a research log or how to format tables using Word.
Getting ready for Soc650/Stat503
There are several things you can do to get ready for the class.
- Review a book on the linear regression model.
- If you are rusty on mathematics, you can review the materials in this file.
- Feel free to start reading the books listed above.