Statistics | Statistical Model Selection
S482 | 27061 | Guilherme Rocha

M-estimates are a broad class of statistical estimates obtained as the
solution to an empirical optimization problem. Typically, the
population parameter is defined as the minimizer of a population risk
function and its estimate is defined as the minimizer of the
empirical risk. While M-estimates are known to enjoy many desirable
properties, goodness of fit alone is not an adequate criterion for
selecting the best among models of different "complexity." On the one
hand, "simpler" models can be more revealing of the structure in the
data. On the other hand, simpler models are often restricted versions
of more "complex" models; a restricted model can never fit the data
better than the model that contains it, and hence will never be
preferred based on goodness of fit alone. In this course, we cover
model selection techniques with
an emphasis on variable selection in generalized linear models. We
review classical variable selection methods such as AIC, BIC, and
Mallows' Cp and discuss some of the computational issues involved. In
addition, we introduce some alternative measures of the complexity of
a model and review how they can be used for model selection purposes.
Finally, we briefly review some of the issues specific to
high-dimensional data sets and how they can be addressed.
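
As an illustrative aside, the point about goodness of fit and the
classical criteria can be sketched numerically. The following is a
minimal sketch, not course material. It assumes ordinary least squares
on simulated Gaussian data, a nested sequence of models that use the
first k columns of the design, and AIC and BIC written for the Gaussian
linear model up to additive constants. The function name
nested_fit_scores and the simulated design are hypothetical.

```python
import numpy as np


def nested_fit_scores(X, y):
    """Fit nested OLS models on the first k columns of X (plus intercept).

    Returns a list of (k, rss, aic, bic) tuples. AIC and BIC use the
    Gaussian linear model form n*log(RSS/n) + penalty, dropping constants
    that are the same for every model.
    """
    n, p = X.shape
    rows = []
    for k in range(1, p + 1):
        # Design matrix: intercept plus the first k predictors.
        Xk = np.column_stack([np.ones(n), X[:, :k]])
        beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        resid = y - Xk @ beta
        rss = float(resid @ resid)  # empirical risk (goodness of fit)
        d = k + 1  # number of fitted coefficients
        aic = n * np.log(rss / n) + 2 * d
        bic = n * np.log(rss / n) + d * np.log(n)
        rows.append((k, rss, aic, bic))
    return rows


# Simulated data (an assumption for illustration): only the first two
# of six predictors enter the true regression function.
rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

scores = nested_fit_scores(X, y)
for k, rss, aic, bic in scores:
    print(f"k={k}  RSS={rss:9.3f}  AIC={aic:9.3f}  BIC={bic:9.3f}")
```

Because each model in the sequence is a restricted version of the next,
the residual sum of squares can only decrease as columns are added, so
goodness of fit alone always points to the largest model. The AIC and
BIC penalties grow with the number of coefficients, so the penalized
criteria need not do so.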