J. Scott Long - Indiana University
Department of Sociology :: Department of Statistics :: Interuniversity Consortium for Political and Social Research
Bureau of Social Science Research :: Schuessler Institute for Social Research :: The Kinsey Institute
ICPSR Summer Workshop on Categorical Data Analysis

June 9-13, 2008

To enroll, contact the ICPSR Summer Program.

This workshop examines the most important regression models for binary, ordinal, nominal, and count outcomes. While advances in software have made it simple to estimate these models, interpreting the results remains difficult because the models are nonlinear. Learning how to interpret complex, nonlinear models is the primary objective of this class.

The first two days are devoted to understanding fundamental issues of estimation, testing, and assessing fit of nonlinear models. Basic concepts and notation are introduced by reviewing the linear regression model. Within this familiar context, the method of maximum likelihood estimation is presented. These ideas are then used to develop the logit and probit models for binary outcomes. A variety of practical methods for interpreting nonlinear models are presented. Statistical testing and assessing fit are also illustrated with a series of real-world examples.

The last three days focus on models for nominal, ordinal, and count outcomes. The ordinal model is presented as a series of binary models that are simultaneously estimated with constraints. The methods of testing and interpretation presented for the binary model are extended to ordinal models. Next, the multinomial logit model is presented. While conceptually this model is a simple extension of binary logit, the large number of comparisons involved makes it difficult to interpret. Graphical methods are introduced to address this difficulty, along with a series of particularly useful statistical tests. The last day deals with models for count outcomes, including Poisson regression, negative binomial regression, and zero-modified models.

Labs will show you how to apply each of the methods presented in lecture. While the labs use Stata, some time will be spent discussing how to apply these methods using other software. Knowledge of Stata is not assumed.

This class covers a lot of material in a short time. At a minimum, you should have a strong background in linear regression.
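
To give a feel for why nonlinearity makes interpretation harder, here is a minimal sketch in Python (not the Stata used in the labs, and not part of the course materials). The coefficients B0 and B1 are made-up values chosen for illustration, not estimates from any workshop dataset. The sketch suggests that in a binary logit model the change in the predicted probability for a one-unit increase in x depends on where x starts, while the odds ratio stays constant at exp(B1).

```python
import math

def logit_prob(x, b0, b1):
    """Predicted probability Pr(y = 1 | x) in a binary logit model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def discrete_change(x, b0, b1, delta=1.0):
    """Change in Pr(y = 1) when x increases from x to x + delta."""
    return logit_prob(x + delta, b0, b1) - logit_prob(x, b0, b1)

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

# Hypothetical coefficients, assumed purely for illustration.
B0, B1 = -2.0, 0.5

for x in (0.0, 4.0, 8.0):
    p = logit_prob(x, B0, B1)
    change = discrete_change(x, B0, B1)
    ratio = odds(logit_prob(x + 1.0, B0, B1)) / odds(p)
    print(f"x={x:.0f}  Pr(y=1)={p:.3f}  change for +1 in x={change:.3f}  odds ratio={ratio:.3f}")
```

The same one-unit increase in x moves the probability by different amounts at different starting values, which is why a single coefficient cannot summarize the effect on the probability scale.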

FAQs

  1. Can I bring my own data? Yes, and I will be glad to meet with you individually to discuss your research. However, it is really important that you bring data that is already in Stata format and that has been "cleaned". There will be little time during the workshop to convert your data to Stata format or to do initial explorations of your data. Software will be available to convert data from one format to another, but, to be safe, try to do this before arriving.
  2. Do I have to use Stata? If you absolutely refuse to use Stata, you probably won't like the course. If you've never used Stata but are willing to give it a try, you'll have no trouble doing the exercises. In lab you will be given handouts that walk you through each step, and the TA and I will be there to help. Will you be able to apply what is done using other software? Absolutely, but it is likely to require a great deal more work. Why? Jeremy Freese and I have written the SPOST commands, which make complex computations trivially simple. For those interested, I will demonstrate how to use spreadsheets I wrote with Simon Cheng to do these computations. I am also glad to talk with people about how they might approach these computations in other software.
  3. Do I need to know everything on the math review? It helps, but the most important thing is to be sure you are comfortable with the log transformation and the exponential. If you are confused by these, be sure to ask when you get here.
  4. What should I bring (to wear)? A sweater or light jacket for class! While the thermometer in the classroom is a continuous scale, it tends to produce a binary outcome of being either too hot or too cold. This might have changed, but to be safe, bring something extra to wear. (Those of you who have been reading ahead will notice that I've just given an example of the latent variable approach to deriving the binary logit and probit models.)
  5. And, what else should I bring?  I recommend that you have some sort of USB storage device to save your work. If you bring a laptop that has Stata installed, we can help you get all the special software installed that is used in class.
  6. Will I have fun? I certainly will and I think you will too. ICPSR one-week workshops are a great way to learn new methods.
  7. Do I need to buy the books? You will be given extensive handouts with my lecture notes. While at ICPSR you might find these to be all you need. The "with Stata" book has less technical detail than the Sage book, but has a lot of information on using Stata. The books will be for sale in Ann Arbor; links to the books are here.
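
For those reading ahead, the latent variable idea mentioned in FAQ 4 can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions, not course material: the coefficients B0 and B1 are hypothetical, and the threshold is fixed at zero. A continuous latent variable y* = B0 + B1*x + e is never observed directly; we only see y = 1 when y* crosses the threshold. If e follows a standard logistic distribution, the share of cases with y = 1 should approximate the binary logit probability (a standard normal error would give the probit model instead).

```python
import math
import random

def logistic_draw(rng):
    """Draw from the standard logistic distribution via the inverse CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0) in the extremely unlikely case u is exactly 0
        u = rng.random()
    return math.log(u / (1.0 - u))

def simulate_share(x, b0, b1, n, rng):
    """Share of n simulated cases with y = 1, where y = 1 if latent y* > 0."""
    hits = 0
    for _ in range(n):
        y_star = b0 + b1 * x + logistic_draw(rng)  # unobserved continuous scale
        if y_star > 0:                             # observed binary outcome
            hits += 1
    return hits / n

def logit_prob(x, b0, b1):
    """Binary logit probability Pr(y = 1 | x)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical coefficients, assumed purely for illustration.
B0, B1 = -1.0, 0.8
rng = random.Random(2008)

for x in (0.0, 1.0, 2.0):
    share = simulate_share(x, B0, B1, 50000, rng)
    print(f"x={x:.0f}  simulated share={share:.3f}  logit probability={logit_prob(x, B0, B1):.3f}")
```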

Datasets and sample do files

In Stata you should be able to get to the datasets over the web using spex dataset-name. The do files and data can be downloaded from within Stata by typing findit icpsrcda2008 and following the instructions it displays.

© 2007 J. Scott Long