Statistics | Exploratory Data Analysis
S470 | 27369 | Karen Kafadar


How do you analyze data?  When faced with data from various sources,
of various types, what questions should one ask, and what clues can we
find in the data to further our understanding?

Statistics, broadly defined, is the science and art of analyzing
data.  Many statistical procedures require formal probability model
structures with parameters, and statistical methods offer tools for
estimating those model parameters.  Sometimes the assumptions
governing those models hold, but often they do not.  What analyses can
provide insight into the data and the underlying mechanisms while
being insensitive to model assumptions?  Nonparametric methods are
distribution-free, but some prior analysis is needed to understand the
data.

Exploratory data analysis is a philosophy of analyzing data.  The
ubiquity of data and the emergence of "data mining" make this course
essential for anyone who wants to analyze data.  In this course, we
will learn many different tools for data analysis as well as the
commands and programs in R (free statistical software) for conducting
these analyses. Some prior familiarity with statistical methods is
assumed. Those who have had formal statistics courses can take the
course at a higher level, where connections between EDA tools and
mathematical statistical methods will be developed.  This course is
valuable to anyone who has data to analyze.  It is also a lot of fun;
students learn a lot.

Prerequisite:  At least one prior course in statistics is expected.

Course objectives: Introduce the philosophy of exploratory data analysis;
Teach tools for the analysis of data; Provide opportunities for
analyzing data (R/S-Plus); Demonstrate the value of oral/written
communication skills; Offer experience in preparing oral and written
reports of data analyses

Time: Tuesday and Thursday, 11:15-12:30

Texts:
D.C. Hoaglin, F. Mosteller and J.W. Tukey,
Understanding Robust & Exploratory Data Analysis
F. Mosteller and J.W. Tukey,
Data Analysis and Regression: A Second Course in Statistics

Topics:

The philosophy of exploratory versus confirmatory data analysis
Summarizing batches of data: Stem-and-leaf diagrams, boxplots, Q-Q plots
Data transformations (ladder of re-expressions)
Jackknife and bootstrap
Two-way and three-way analyses (median polish)
Standardization
Fitting robust-resistant lines (least absolute deviations)
Analyzing count data

Instructor:
Professor Karen Kafadar, 812-856-7825, kkafadar@indiana.edu