This entry is to facilitate discussion on what we might want to see in, say, a year-long sequence of Applied Statistics coursework. I am thinking of either an MS degree program, where this sequence would be all of Applied Statistics, or part of a PhD program, where these courses would account for much of Applied Statistics.
Clearly, some core topics, like linear regression, need to be covered. Also, since applied Statistics these days relies heavily on computational techniques, we need to address how to use at least one standard statistical software package, and how to interpret the results. Just as importantly, a good applied statistician should be able to tell, at least in some cases, when not to trust the output of a software package, or when to do something beyond what the software routinely produces.
A second issue is: What exactly is Applied Statistics? In many places all over the world, ``applied Statistics'' has traditionally meant some variety of linear regression, design of experiments, and survey sampling, with perhaps some multivariate Normal distribution-based methodology. These ``classical'' topics have grown in strength and scope. For example, a full ``wishlist'' on regression now ought to include linear, nonlinear, and generalized linear model-based regression; quantile and nonparametric regression; high dimensional parameters; diagnostics, visualization, and model selection; classical frequentist, Bayesian, resampling-based, empirical likelihood-based and other modern approaches to estimation, inference and prediction; and perhaps much more.
Let's start by making a list of topics we would like to see in an MS-level, or first-year PhD-level, applied stat course.
Here is my list:
- A module on survey sampling: Survey sampling is too important to be left as a completely optional topic. Students should have some idea of sampling techniques, recognize issues with nonresponse and biased sampling, and have some idea of estimation and inference when the data are not collected with a simple random sampling scheme.
- Multivariate applications: Again, topics like principal components, dimension reduction, and missing data and imputation are core to modern statistical applications in essentially any discipline. A small module on the fundamentals of multivariate analysis ought to be part of any applied statistics program.
- Bayesian data analysis: Whether it is linear regression, survey sampling, experimental design, or anything under the sun, Bayesian approaches differ from classical frequentist ones in various (and sometimes major) ways, and our students ought to have at least some exposure to how Bayesians analyze data, even if only as a small component of the course.
- Resampling-based inference: Resampling is a phenomenally powerful technique for statistical inference. It often requires fewer and less restrictive assumptions than classical inference, is applicable to a wider collection of problems, is often more accurate than CLT-driven inference, and is embarrassingly parallel computationally. Students should learn it: it saves many headaches if you are worried about whether your data adhere to textbook assumptions, and saves you from embarrassment if they don't.
- Modern issues with linear regression: Issues like model selection, high dimensional parameters, and heteroscedasticity arise in many modern applications. We also have algorithms to handle many of these issues, and we ought to expose the students to these fresh ideas.
- Modern applications: Students should have some exposure to modern applications of statistics, like bioinformatics, financial applications, network data analysis, image data analysis, and so on. This is at the outer fringes of a ``wishlist'': should these be ``core'' topics?
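To make the resampling item in the list above a little more concrete, here is a minimal sketch of the kind of exercise I have in mind: a percentile bootstrap confidence interval for a median, a quantity for which textbook CLT-based formulas are awkward. The function name `bootstrap_ci`, its defaults, and the small data set are all made up for illustration; this is one simple variant of the bootstrap, not a recommended default for every problem.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.median, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Hypothetical helper for illustration; assumes the observations are
    independent draws from a common distribution.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Each replicate resamples n points with replacement and recomputes the statistic.
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the replicates.
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Made-up, right-skewed sample (not real data).
sample = [1.2, 0.8, 3.5, 2.1, 0.9, 7.4, 1.6, 2.8, 1.1, 4.3, 0.7, 2.2]
lo, hi = bootstrap_ci(sample)
print(f"sample median = {statistics.median(sample):.2f}")
print(f"approximate 95% interval: ({lo:.2f}, {hi:.2f})")
```

Each of the `n_boot` replicates is computed independently of the others, which is why the method parallelizes so easily. Swapping in a different `stat` (a trimmed mean, say) requires no new theory from the student, only a different function.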
What would you like to see as part of the ``applied core''? Put in your suggestions below.