P533/P534 Intro. to Bayesian Data Analysis I & II, Prof. Kruschke
P533 Fall 2009:
Tu, Th, 2:30-3:45pm, room Psych 115.
(Registrar class number 10164)
P534 Spring 2010:
Will be offered, time and place TBA.
P533/P534 is a tutorial introduction to doing Bayesian statistics
for data analysis. In P533, we start from the basics of probabilities
and Bayes' theorem, and gradually work our way through contemporary
Monte Carlo methods in the context of simple analyses, building up to
simple examples of hierarchical models (see list of topics below). In
P534, we do a variety of realistic applications, covering the Bayesian
versions of linear regression, logistic regression, t-tests, analysis
of variance, etc., including repeated measures designs. More details
about topic coverage are provided below. The course is intended to make
advanced Bayesian methods genuinely accessible to real graduate
students, and even unreal undergraduates (see prerequisites below). The
course is "hands-on": We will build many computer-based analyses so
that you can actually get in the kitchen and make a meal, rather than
just consume fast food at the drive-through. This way you can adapt
the methods to your own research scenarios.
Why should we do Bayesian analysis instead of
20th-century null hypothesis significance testing? Read THIS.
How does this course (P533/P534) differ from S626? The
Dept. of Statistics offers S626, Bayesian theory and data analysis. Fall 2009
is the first time it will be taught. S626 has a prerequisite of "two
statistics courses at the graduate level". Students are encouraged
to consider S626 after taking P533/P534.
Topics covered, in a little more detail: P533 is the first
semester of a two-semester sequence. This first semester emphasizes
the simplest data situation: two-valued measurements such as yes/no,
agree/disagree, remember/forget, detect/miss, male/female,
heads/tails, and so on. The main goal is to use this simple situation
to develop all the methods of contemporary Bayesian analysis,
including hierarchical models and even the
impress-your-friends-with-this "transdimensional Markov chain Monte
Carlo" method for model comparison! The second semester, P534, applies the methods to more complex data
designs, corresponding to classical methods of linear regression,
logistic regression, t-tests, analysis of variance, etc.
P533 Topics:
- Models, parameters, beliefs. Intro to the R programming language.
- Probability: Inside and outside the head. Mass and
density. Conditional probabilities.
- Bayes' theorem. Three goals of statistical inference.
- Inferring a binomial proportion via exact mathematical analysis (easy, honest!).
- Inferring a binomial proportion via grid approximation.
- Inferring a binomial proportion via Monte Carlo
approximation. The Metropolis algorithm.
- Inferences regarding two binomial proportions. This
motivates our first look at Gibbs sampling.
- Binomial likelihood with hierarchical priors. Intro to
"BUGS" software.
- Hierarchical modeling and model comparison.
- Goals, power, and sample size, i.e., research design from a
Bayesian perspective.
- Comparison of Bayesian inference with null hypothesis
significance testing.
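Three of the routes to a binomial proportion named in the list above (exact mathematical analysis, grid approximation, and the Metropolis algorithm) can be compared side by side in a few lines. The course itself works in R and BUGS; what follows is only a minimal Python sketch for orientation, assuming hypothetical data (7 successes in 10 trials) and a uniform Beta(1, 1) prior. All names and numbers in it are illustrative, not taken from the course materials.

```python
import math
import random

# Hypothetical data (not from the course): z successes in n trials.
z, n = 7, 10
# Assumed prior: Beta(a, b); a = b = 1 is uniform on [0, 1].
a, b = 1.0, 1.0

def unnormalized_posterior(theta):
    """Binomial likelihood times Beta(a, b) prior density, up to a constant."""
    if theta <= 0.0 or theta >= 1.0:
        return 0.0
    return theta ** (z + a - 1) * (1 - theta) ** (n - z + b - 1)

# 1) Exact analysis: the Beta prior is conjugate to the binomial likelihood,
#    so the posterior is Beta(z + a, n - z + b), whose mean is known in closed form.
exact_mean = (z + a) / (n + a + b)

# 2) Grid approximation: evaluate the posterior at evenly spaced midpoints
#    of [0, 1], normalize the weights, and take the weighted mean.
grid = [(i + 0.5) / 1000 for i in range(1000)]
weights = [unnormalized_posterior(t) for t in grid]
total = sum(weights)
grid_mean = sum(t * w for t, w in zip(grid, weights)) / total

# 3) Metropolis algorithm: propose a random-walk step and accept it with
#    probability min(1, posterior ratio); otherwise stay put.
def metropolis(n_steps, proposal_sd=0.1, seed=0):
    rng = random.Random(seed)
    theta = 0.5  # arbitrary starting value inside (0, 1)
    samples = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, proposal_sd)
        ratio = unnormalized_posterior(proposal) / unnormalized_posterior(theta)
        if rng.random() < ratio:
            theta = proposal
        samples.append(theta)
    return samples

samples = metropolis(50_000)
burned = samples[5_000:]  # discard early "burn-in" steps
mcmc_mean = sum(burned) / len(burned)

print(f"exact: {exact_mean:.4f}  grid: {grid_mean:.4f}  Metropolis: {mcmc_mean:.4f}")
```

The exact and grid estimates should agree closely; the Metropolis estimate will wobble slightly depending on the random seed and on the proposal width `proposal_sd`, which is an arbitrary choice in this sketch.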
P534 Topics:
- Generalized Linear Model
- Estimating a mean and variance (like a classical t-test)
- Simple linear regression
- One-way analysis of variance
- Multiple predictors and their interaction (in regression and ANOVA)
- Logistic regression (dichotomous dependent variable)
- Ordinal dependent-variable regression
- Two metric dependent variables: Correlation
- Two or more categorical dependent variables: Contingency table analysis.
Prerequisites:
This is not a mathematical
statistics course, but a fair amount of mathematics is unavoidable. If
you know what someone means when she says, "The integral of x squared
is one-third x cubed," then you should be okay. You will not have to
generate a lot of mathematical derivations, but you will have
to understand some, and these will involve nothing more than
basic summation notation (e.g., $\sum_i x_i$) and
first-year calculus.
We will be doing a lot of computer programming in a language
called R. R is free and can be installed on any computer, but we will
be using an add-on package called BRugs that only works with Windows
(you can run it on a dual-processor Macintosh, too). The road to
understanding will be much smoother if you have already had some
programming experience, in any language. It's easy to learn basic
programming, but it can be time-consuming, so if you don't have any
previous experience, just anticipate spending more time. Learning to
program can have huge payoffs in multiple situations later in your
career, so it's worth the effort.
A previous course in traditional statistics (such as K300) or
probability can be helpful as background. Although this course
proceeds completely independently of traditional ("null hypothesis
significance testing") statistical methods, you might find the
concepts of probability easier to understand if you have already had
some exposure to them.
Materials:
The primary materials will be information delivered
in lecture and in extensive course notes being written by
Prof. Kruschke. Readings will be posted on Oncourse under the
"Resources" link.
Please also discuss the assignments and lectures on Oncourse under the
"Forums" link. If you are attending the class but cannot get access
to the Oncourse page, please e-mail Prof. Kruschke.
The following textbooks are recommended (but not required) as additional sources:
Albert, J. H., & Rossman, A. J. (2001). Workshop Statistics:
Discovery with Data, a Bayesian Approach. Emeryville, CA: Key College
Publishing. The last few chapters of this book give a wonderfully
"hands-on" introduction to the basics of Bayesian statistics, but
only the basics.
Bolstad, W. M. (2007). Introduction to Bayesian
Statistics, 2nd Ed. Hoboken, NJ: Wiley. A terrific tutorial that uses only
basic calculus; highly recommended. Its only downside is that it does not
cover hierarchical models or computer implementations of numerical
approximation methods.
Lynch, S. M. (2007). Introduction to applied Bayesian statistics and
estimation for social scientists. New York: Springer. Uses lots of
large-scale real-world examples, but all the software is "home grown"
instead of using BUGS.
Gelman, A., & Hill, J. (2007). Data analysis using regression and
multilevel/hierarchical models. New York: Cambridge University
Press. Has a lot about traditional (NHST) methods, with Bayesian
approaches sprinkled in. Good resource!
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2004).
Bayesian Data Analysis, 2nd Ed. Boca Raton, FL: Chapman and Hall/CRC
Press. Offers more advanced examples of Bayesian methods, but requires
more "connecting the dots" by the beginner.
Grading; Homework; Exams: There are homework exercises
assigned every week or two. No exams or projects. Grades will be
determined by performance on the homework assignments. All assignments
are mandatory. There will be penalties for late homework unless you
have a cogent excuse. These penalties are designed as an incentive to
you because the material is cumulative; the penalties also help keep
things fair to all students. If you must be late with an assignment,
please notify the professor immediately.
Disclaimer: All the information here is subject to
change. Changes will be announced in class.
This web page is at http://www.indiana.edu/~jkkteach/P533/