P533/P534 Intro. to Bayesian Data Analysis I & II, Prof. Kruschke

P533/P534
Introduction to Bayesian Data Analysis I & II.

Prof. John K. Kruschke

P533 Fall 2009: Tu, Th, 2:30-3:45pm, room Psych 128 (not 115).
(Registrar class number 10164)

P534 Spring 2010: Will be offered, time and place TBA.

P533/P534 is a tutorial introduction to doing Bayesian statistics for data analysis. In P533, we start from the basics of probabilities and Bayes' theorem, and gradually work our way through contemporary Monte Carlo methods in the context of simple analyses, building up to simple examples of hierarchical models (see list of topics below). In P534, we do a variety of realistic applications, covering the Bayesian versions of linear regression, logistic regression, t-tests, analysis of variance, etc., including repeated measures designs. More details about topic coverage are provided below. The course is intended to make advanced Bayesian methods genuinely accessible to real graduate students, and even unreal undergraduates (see prerequisites below). The course is "hands on": We will build many computer-based analyses so that you can actually get in the kitchen and make a meal, rather than just consume fast food at the drive-through. This way you can adapt the methods to your own research scenarios.

Why should we do Bayesian analysis instead of 20th century null hypothesis significance testing? Read THIS.

How does this course (P533/P534) differ from S626? The Dept. of Statistics offers S626, Bayesian theory and data analysis. Fall 2009 is the first time it will be taught. S626 has a prerequisite of "two statistics courses at the graduate level". Students are encouraged to consider S626 after taking P533/P534.

Topics covered, in a little more detail: P533 is the first semester of a two-semester sequence. This first semester emphasizes the simplest data situation: two-valued measurements such as yes/no, agree/disagree, remember/forget, detect/miss, male/female, heads/tails, and so on. The main goal is to use this simple situation to develop all the methods of contemporary Bayesian analysis, including hierarchical models and even the impress-your-friends-with-this "transdimensional Markov chain Monte Carlo" method for model comparison! The second semester, P534, applies the methods to more complex data designs, corresponding to classical methods of linear regression, logistic regression, t-tests, analysis of variance, etc.

P533 Topics:

  1. Models, parameters, beliefs. Intro to the R programming language.
  2. Probability: Inside and outside the head. Mass and density. Conditional probabilities.
  3. Bayes' theorem. Three goals of statistical inference.
  4. Inferring a binomial proportion via exact mathematical analysis (easy, honest!).
  5. Inferring a binomial proportion via grid approximation.
  6. Inferring a binomial proportion via Monte Carlo approximation. The Metropolis algorithm.
  7. Inferences regarding two binomial proportions. This motivates our first look at Gibbs sampling.
  8. Binomial likelihood with hierarchical priors. Intro to "BUGS" software.
  9. Hierarchical modeling and model comparison.
  10. Goals, power, and sample size, i.e., research design from a Bayesian perspective.
  11. Comparison of Bayesian inference with null hypothesis significance testing.
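
To give a rough flavor of topics 4 through 6, here is a minimal sketch of inferring a binomial proportion three ways: exact analysis with a beta prior, grid approximation, and the Metropolis algorithm. It is written in Python for illustration only (the course itself uses R), it is not taken from the course notes, and the function names, the uniform Beta(1, 1) prior, and the data (7 successes in 10 trials) are all made-up assumptions.

```python
import math
import random


def log_unnormalized_posterior(theta, a, b, successes, failures):
    """Log of Beta(a, b) prior times binomial likelihood, up to a constant."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return ((a - 1 + successes) * math.log(theta)
            + (b - 1 + failures) * math.log(1.0 - theta))


def exact_posterior_mean(a, b, successes, failures):
    """Exact analysis: a beta prior yields a Beta(a + z, b + N - z) posterior."""
    post_a, post_b = a + successes, b + failures
    return post_a / (post_a + post_b)


def grid_posterior_mean(a, b, successes, failures, n_grid=1000):
    """Grid approximation: evaluate the posterior at grid midpoints and normalize."""
    thetas = [(i + 0.5) / n_grid for i in range(n_grid)]
    log_post = [log_unnormalized_posterior(t, a, b, successes, failures)
                for t in thetas]
    peak = max(log_post)  # subtract the peak before exponentiating, for stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return sum(t * w for t, w in zip(thetas, weights)) / total


def metropolis_posterior_mean(a, b, successes, failures,
                              n_steps=20000, burn_in=2000, step_sd=0.2, seed=1):
    """Metropolis algorithm with a symmetric Gaussian random-walk proposal."""
    rng = random.Random(seed)
    theta = 0.5  # arbitrary starting value inside (0, 1)
    current = log_unnormalized_posterior(theta, a, b, successes, failures)
    kept = []
    for step in range(n_steps):
        proposal = theta + rng.gauss(0.0, step_sd)
        candidate = log_unnormalized_posterior(proposal, a, b, successes, failures)
        # Accept with probability min(1, posterior ratio); otherwise stay put.
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            theta, current = proposal, candidate
        if step >= burn_in:
            kept.append(theta)
    return sum(kept) / len(kept)


# Hypothetical data: 7 successes and 3 failures, with a uniform Beta(1, 1) prior.
print("exact     :", exact_posterior_mean(1, 1, 7, 3))
print("grid      :", grid_posterior_mean(1, 1, 7, 3))
print("metropolis:", metropolis_posterior_mean(1, 1, 7, 3))
```

All three estimates of the posterior mean should roughly agree. The exact and grid answers are deterministic, while the Metropolis answer varies a little with the random seed, the proposal width, and the number of steps.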

P534 Topics:

  1. Generalized Linear Model
  2. Estimating a mean and variance (like a classical t-test)
  3. Simple linear regression
  4. One-way analysis of variance
  5. Multiple predictors and their interaction (in regression and ANOVA)
  6. Logistic regression (dichotomous dependent variable)
  7. Ordinal dependent-variable regression
  8. Two metric dependent variables: Correlation
  9. Two or more categorical dependent variables: Contingency table analysis.

Prerequisites:

  • This is not a mathematical statistics course, but a fair amount of mathematics is unavoidable. If you know what someone means when she says, "The integral of x squared is one-third x cubed," then you should be okay. You will not have to generate a lot of mathematical derivations, but you will have to understand some, and these will involve nothing more than basic summation notation (e.g., Σ_i x_i) and first-year calculus.
  • We will be doing a lot of computer programming in a language called R. R is free and can be installed on any computer, but we will be using an add-on package called BRugs that only works with Windows (you can run it on a dual-processor Macintosh, too). The road to understanding will be much smoother if you have already had some programming experience, in any language. It's easy to learn basic programming, but it can be time-consuming, so if you don't have any previous experience, just anticipate spending more time. Learning to program can have huge payoffs in multiple situations later in your career, so it's worth the effort.
  • A previous course in traditional statistics (such as K300) or probability can be helpful as background. Although this course proceeds completely independently of traditional ("null hypothesis significance testing") statistical methods, you might find the concepts of probability easier to understand if you have already had some exposure to them.

Materials:

  • The primary materials will be info delivered at lecture and in extensive course notes being written by Prof. Kruschke. Readings will be posted on Oncourse under the "Resources" link.
  • Please also discuss the assignments and lectures on Oncourse under the "Forums" link. If you are attending the class but cannot get access to the Oncourse page, please e-mail Prof. Kruschke.

The following textbooks are recommended (not required) as additional sources:

  • Albert, J. H., & Rossman, A. J. (2001). Workshop Statistics: Discovery with Data, a Bayesian Approach. Emeryville, CA: Key College Publishing. The last few chapters of this book give a wonderfully "hands on" introduction to the basics of Bayesian statistics --- but only the basics.
  • Bolstad, W. M. (2007). Introduction to Bayesian Statistics, 2nd Ed. Hoboken, NJ: Wiley. Terrific tutorial that uses only basic calculus; highly recommended. The only downside is that it does not cover hierarchical models or computer implementation of any numerical approximation methods.
  • Lynch, S. M. (2007). Introduction to applied Bayesian statistics and estimation for social scientists. New York: Springer. Uses lots of large-scale real-world examples, but all the software is "home grown" instead of using BUGS.
  • Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press. Has a lot about traditional (NHST) methods, with Bayesian approaches sprinkled in. Good resource!
  • Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2004). Bayesian Data Analysis, 2nd Ed. Boca Raton, FL: Chapman and Hall/CRC Press. Offers more advanced examples of Bayesian methods, but requires more "connecting the dots" by the beginner.

Grading; Homework; Exams: There are homework exercises assigned every week or two. No exams or projects. Grades will be determined by performance on the homework assignments. All assignments are mandatory. There will be penalties for late homework unless you have a cogent excuse. These penalties are designed as an incentive to you because the material is cumulative; the penalties also help keep things fair to all students. If you must be late with an assignment, please notify the professor immediately.

Disclaimer: All the information here is subject to change. Changes will be announced in class.

This web page is at URL = http://www.indiana.edu/~jkkteach/P533/