P533 Bayesian Data Analysis, Prof. John K. Kruschke

Spring 2018: Tu,Th 9:30am-10:45am, Room 111 Psych.

[Figure: Success increasing with knowledge of Bayesian data analysis.]

Overview: P533 is a tutorial introduction to doing Bayesian data analysis. The course is intended to make advanced Bayesian methods genuinely accessible to graduate students in the social sciences. The course covers all the fundamental concepts of Bayesian methods, and works from the simplest models up through hierarchical models (a.k.a. multilevel models) applied to various types of data. More details about content are provided below in the Schedule of Topics. Students from all fields are welcome and encouraged to enroll (see figure at right). The course uses examples from a variety of disciplines.

Prerequisites: This is not a mathematical statistics course, but some math is unavoidable. If you understand basic summation notation like Σi xi and integral notation like ∫ x dx, then you're in good shape. We will be doing a lot of computer programming in a language called R. R is free and can be installed on any computer. The textbook includes an introductory chapter on R. A previous course in traditional statistics or probability can be helpful as background, but is not essential. P533 proceeds independently of traditional ("null hypothesis significance testing") statistical methods.

Credit toward I.U. Statistics Department requirements: P533 counts toward the Ph.D. minor in STAT and toward the 12 hour "area relevant to statistics" section of the MSAS (Masters in Applied Statistics).

Homework: There will be weekly homework assignments. You are encouraged to use whatever resources help you understand the homework and complete it with full comprehension, but ultimately you must write your own answers on your own and in your own words. Each homework assignment begins with an honor statement indicating that you are writing your answers on your own in your own words. In your answers that you submit, please provide explanations and thoroughly show all your computations, with annotation that explains what you are doing. An unannotated succession of computations will not get full credit, even if it is numerically correct.

Course Grading Method: Grading is based on your total homework score, as a percentile relative to the class. There are no exams and no projects. N.B.: Scores tend to be very high, so a raw score of, say, 96% does not guarantee a grade of A; it could end up as an A- if, say, two thirds of the class scores higher than 96%. Typically the late penalties turn out to be a bigger deduction than points missed due to errors, so don't fall behind. As this is a graduate course, grades are typically in the A to high-B range, and only rarely is a C or lower assigned.

All assignments are mandatory. Late homework is penalized exponentially with a half-life of one week, meaning that one week after the due date 50% is the maximum possible score. (The R program for the exponential decay is in the Canvas files; see LatePenalty.R.) No homework may be turned in more than three weeks after its due date, and no homework may be turned in after 12:00 noon on Wednesday of finals week. There are two reasons for this policy: First, the course moves quickly and the material is largely cumulative, so the late penalty acts as an extra incentive to keep up. Second, the assistant, who will be grading the homework, must not be given a flood of late homework papers at the end of the semester. In recognition of the fact that "life happens" (e.g., short-term illness, personal turmoil, an overwhelming confluence of deadlines), your two worst late penalties will be dropped. In other words, for every homework we will record the score both with and without the late penalty, and the two homeworks with the largest difference between those two scores will have their late penalties dropped. Note, therefore, that any homework not turned in at all will count as zero.
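To illustrate how the half-life works, here is a minimal sketch of the decay curve in R. This is illustrative only; the official computation is LatePenalty.R in the Canvas files, and that program governs if the two ever disagree. The function name latePenaltyFactor is invented here for the example.

```r
# Illustrative sketch of the late-penalty curve (NOT the official
# LatePenalty.R from Canvas). A half-life of 7 days multiplies the
# maximum possible score by 0.5^(daysLate/7).
latePenaltyFactor <- function( daysLate ) {
  0.5 ^ ( daysLate / 7 )
}

# Maximum-score multiplier at 0, 1, 2, and 3 weeks late:
round( latePenaltyFactor( c( 0 , 7 , 14 , 21 ) ) , 3 )
# -> 1.000 0.500 0.250 0.125
```

So an assignment turned in two weeks late can earn at most 25% of its points, which is why keeping up is far cheaper than catching up.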

Required textbook: Doing Bayesian Data Analysis, 2nd Edition: A Tutorial with R, JAGS, and Stan. See https://sites.google.com/site/doingbayesiandataanalysis/purchase for links to purchase the book. The book is also available online through the IU Library.

Instructor: John K. Kruschke, johnkruschke@gmail.com. Office hours by appointment; please do ask.

 

Assistant: Brad Celestin, bcelesti@umail.iu.edu. Office hours to be posted on Canvas.

Discussion: Please discuss the assignments and lectures on Canvas. If you are attending the class but cannot get access to the Canvas page, please email Prof. Kruschke.

Disclaimer: All information in this document is subject to change. Changes will be announced in class.

 

Schedule of Topics

The exact day of each topic may shift as the course progresses.

Week 1 (Tu): Ch. 2. Introduction: Credibility, models, and parameters.
    Strongly recommended article: “Bayesian data analysis for newcomers” at https://link.springer.com/article/10.3758/s13423-017-1272-1 or https://psyarxiv.com/nqfr5/

Week 1 (Th): Ch. 3. The R programming language. Instructions for installing the software are here: https://sites.google.com/site/doingbayesiandataanalysis/software-installation

Week 2 (Tu): Ch. 4. Probability.

Week 2 (Th): Ch. 5. Bayes’ rule.

Week 3 (Tu): Ch. 6. Inferring a probability via mathematical analysis.

Week 3 (Th): Ch. 7. Markov chain Monte Carlo (MCMC).

Week 4 (Tu): Ch. 8. JAGS.

Week 4 (Th): Ch. 8, continued.

Week 5 (Tu): Ch. 9. Hierarchical models.

Week 5 (Th): Ch. 9, continued. Ch. 10. Model comparison.

Week 6 (Tu): Ch. 10, continued. Ch. 11. Null hypothesis significance testing (NHST).
    Strongly recommended article: “The Bayesian New Statistics” at https://link.springer.com/article/10.3758/s13423-016-1221-4 or https://osf.io/ksfyr/

Week 6 (Th): Ch. 11. NHST, continued.

Week 7 (Tu): Ch. 12. Bayesian null assessment.
    See also the article “Bayesian assessment of null values via parameter estimation and model comparison” at http://www.indiana.edu/~kruschke/articles/Kruschke2011PoPScorrected.pdf

Week 7 (Th): Ch. 12, continued.

Week 8 (Tu): Ch. 13. Goals, power, and sample size. See also the video playlist at http://www.youtube.com/playlist?list=PL_mlm7M63Y7j641Y7QJG3TfSxeZMGOsQ4

Week 8 (Th): Ch. 13, continued.

Week 9 (Tu): Ch. 15. The generalized linear model. Ch. 16. Metric predicted variable, 1- or 2-group predictor variable.

Week 9 (Th): Ch. 16, continued. Also, power analysis applied to two groups. See the article “Bayesian estimation supersedes the t test” at http://www.indiana.edu/~kruschke/BEST/

Week 10 (Tu): Ch. 17. Metric predicted variable, single metric predictor variable.

Week 10 (Th): Ch. 17, continued. Ch. 18. Metric predicted variable, multiple metric predictor variables.
    See also the article “The time has come: Bayesian methods for data analysis in the organizational sciences” at http://www.indiana.edu/~kruschke/BMLR/

Week 11 (Tu): Ch. 18, continued.

Week 11 (Th): Ch. 19. Metric predicted variable, single nominal predictor variable.

Week 12 (Tu): Ch. 19, continued. Ch. 20. Metric predicted variable, multiple nominal predictor variables.

Week 12 (Th): Ch. 20, continued.

Week 13 (Tu): Ch. 21. Dichotomous predicted variable (logistic regression).

Week 13 (Th): Ch. 22. Nominal predicted variable (softmax regression).
    For an applied example of hierarchical conditional logistic regression, see the article “Ostracism and fines in a public goods game with accidental contributions: The importance of punishment type” at http://journal.sjdm.org/14/14721a/jdm14721a.pdf

Week 14 (Tu): Ch. 22, continued.

Week 14 (Th): Ch. 23. Ordinal predicted variable (ordinal probit regression).
    For another example of ordinal regression, see the manuscript “Moral Foundation Sensitivity and Perceived Humor” at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2519218
    For more about the perils of applying metric models to ordinal data, see this manuscript: https://osf.io/9h3et/ and this blog post: http://doingbayesiandataanalysis.blogspot.com/2017/12/which-movie-is-rated-better-dont-treat.html

Week 15 (Tu): Ch. 23, continued.

Week 15 (Th): Ch. 24. Count predicted variable.

Finals: No final exam, but the final homework is due during finals week at a date TBA.