S482 | 27061 | Guilherme Rocha

M-estimates are a broad class of statistical estimates obtained as the solution to an empirical optimization problem. Typically, the population parameter is defined as the minimizer of a population risk function, and its estimate is defined as the minimizer of the corresponding empirical risk. While M-estimates are known to enjoy many desirable properties, goodness of fit alone is not an adequate criterion for selecting the best among models of different "complexity." On the one hand, "simpler" models can be more revealing of the structure in the data. On the other hand, they are often restricted versions of more "complex" models, so their in-sample fit can never be strictly better and they will never be preferred based on goodness of fit alone. In this course, we cover model selection techniques with an emphasis on variable selection in generalized linear models. We review classical variable selection methods such as AIC, BIC, and Mallows' Cp and discuss some of the computational issues involved. In addition, we introduce some alternative measures of the complexity of a model and review how they can be used for model selection purposes. Finally, we briefly review some of the issues specific to high-dimensional data sets and how they can be addressed.
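
As a rough illustration of why goodness of fit alone cannot select among nested models, the sketch below fits ordinary least squares on the first k columns of a simulated design matrix and compares the residual sum of squares with AIC and BIC. It is not course material: the function name `nested_fit_criteria`, the simulated data, and the Gaussian forms of AIC and BIC (written up to additive constants) are all assumptions made for this example.

```python
import numpy as np

def nested_fit_criteria(X, y):
    """Fit least squares on the first k columns of X, for k = 1..p.

    Returns arrays of RSS, AIC and BIC, one entry per model size k.
    AIC and BIC use the Gaussian log-likelihood up to additive constants
    (an assumption of this sketch).
    """
    n, p = X.shape
    rss, aic, bic = [], [], []
    for k in range(1, p + 1):
        Xk = X[:, :k]
        beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        resid = y - Xk @ beta
        r = float(resid @ resid)
        rss.append(r)
        aic.append(n * np.log(r / n) + 2 * k)          # penalty of 2 per parameter
        bic.append(n * np.log(r / n) + np.log(n) * k)  # penalty of log(n) per parameter
    return np.array(rss), np.array(aic), np.array(bic)

# Simulated data (hypothetical): only the first 3 of 8 predictors matter.
rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

rss, aic, bic = nested_fit_criteria(X, y)
print("RSS by model size:", np.round(rss, 1))
print("size minimizing RSS:", int(np.argmin(rss)) + 1)
print("size minimizing AIC:", int(np.argmin(aic)) + 1)
print("size minimizing BIC:", int(np.argmin(bic)) + 1)
```

Because each smaller model is a restriction of the next larger one, the RSS cannot increase as columns are added, so minimizing RSS favors the largest model. The penalty terms in AIC and BIC are what allow a smaller model to be chosen.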