Applied Statistics topics: A wishlist


This entry is to facilitate discussion on what we might want to see in, say, a year-long sequence of Applied Statistics coursework. I am thinking of either an MS degree program, where this sequence would be All of Applied Statistics, or part of a PhD program, where these courses would qualify as much of Applied Statistics.

Clearly, some core topics, like linear regression, need to be covered. Also, since applied Statistics these days requires using computational techniques in a major way, we also need to address how to use at least one standard statistical software package, and how to interpret the results. Just as importantly, a good applied statistician should be able to tell, at least in some cases, when not to trust the output of the software, or when to do something beyond what the software routinely produces.

A second issue is: What exactly is Applied Statistics? In many places all over the world, ``applied Statistics'' meant some variety of linear regression, design of experiments, survey sampling, with perhaps some multivariate Normal distribution-based methodology. These ``classical'' topics have grown in strength and scope. For example, a full ``wishlist'' on regression now ought to include linear, nonlinear, generalized linear model-based regression, quantile and nonparametric regression, high dimensional parameters, diagnostics, visualization, model selection, classical frequentist, Bayesian, resampling-based, empirical likelihood-based and other modern approaches to estimation, inference and prediction, and perhaps much more.

Let's start by making a list of topics we would like to see in an MS-level, or first-year PhD-level, applied stat course.

Here is my list:

  1. A module on survey sampling: Survey sampling is too important to be left as a completely optional topic. Students should have some idea of sampling techniques, recognize issues with nonresponse and biased sampling, and have some idea of estimation and inference when the data is not collected with a simple random sampling scheme.

  2. Multivariate applications: Again, topics like principal components, dimension reduction, missing data and imputation are core to modern statistical applications, in essentially any discipline. A small module on the fundamentals of multivariate analysis ought to be part of any applied statistics program.

  3. Bayesian data analysis: Whether it is linear regression, survey sampling, experimental design, or anything under the sun, Bayesian approaches differ from classical frequentist ones in various (and sometimes major) ways, and our students ought to have some exposure to how Bayesians analyze data. At least, a small component of it.

  4. Resampling-based inference: Resampling is a phenomenally powerful technique for statistical inference. It often requires fewer and less restrictive assumptions than classical inference, is applicable to a wider collection of problems, is often more accurate than CLT-driven inference, and is embarrassingly parallel computationally. Students should learn it: it saves many headaches if you are worried about whether your data adheres to textbook assumptions, and saves you from embarrassment if it doesn't.

  5. Modern issues with linear regression: Issues like model selection, high dimensional parameters, and heteroscedasticity arise in many modern applications. We also have algorithms to handle many of these issues, and we ought to expose the students to these fresh ideas.

  6. Modern applications: Students should have some exposure to modern applications of statistics, like bioinformatics, financial applications, network data analysis, image data analysis, and so on. This is at the outer fringes of a ``wishlist'': should these be ``core'' topics?
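
To make item 4 a little more concrete, here is a minimal sketch of a percentile bootstrap confidence interval for a median, in Python using only the standard library. The function name `bootstrap_percentile_ci`, the toy sample, and the defaults (2000 resamples, 95% level, fixed seed) are illustrative assumptions only, not a recommended implementation or anything tied to a particular course or package.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat=statistics.median,
                            n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(data).

    Hypothetical helper for illustration: resample the data with
    replacement, recompute the statistic each time, and read off
    empirical quantiles of the recomputed values.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Each replicate is the statistic on a same-size resample with replacement.
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return reps[lo_idx], reps[hi_idx]


# Made-up, skewed toy data (not from any real study).
sample = [2.1, 3.4, 1.9, 8.7, 2.8, 3.0, 15.2, 2.5, 4.1, 3.3]
lo, hi = bootstrap_percentile_ci(sample)
print(f"sample median = {statistics.median(sample)}")
print(f"approx. 95% percentile bootstrap CI = ({lo}, {hi})")
```

Each resample is independent of the others, which is why the computation parallelizes so easily. The plain percentile interval is only the simplest variant; a course would presumably also discuss when it behaves poorly and what the refinements are.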

What would you like to see as part of the ``applied core''? Put in your suggestions below.


More info on multivariate response problems
Intro to Time-Series models in 8051

1. Bayesian stuff. 2. Nonlinear regression

I found The Elements of Statistical Learning extremely useful, at least for me...

And the exercises for each chapter are awesome because most of them come directly from recent papers...

So I hope that for applied statistics, we can use that book as one of the reference textbooks.

State space models, hidden Markov models, dynamic linear models, Kalman filtering: those are really interesting topics in engineering.

So, talking about applied statistics, it is better if one can integrate statistics with a concrete background context. It is not such a bad idea to consider topics like statistics in engineering, statistics in finance, statistics in climate, statistics in health care...

A short introduction to time series

I agree with Sen. The Elements of Statistical Learning, or more broadly, machine learning, is highly sought after here. I would suggest having a course like this and possibly assigning some projects.

There are two basic, extreme positions on teaching applied statistics topics as well as many intermediate positions. The extreme positions are narrow/deep and wide/shallow. The U of M has traditionally followed the narrow/deep prescription. That is, each of our courses tends to cover a fairly narrow topic, but cover it in great depth. We see this in the applied sequence where we get two "classical" topics in great depth (regression and design), only a little coverage of a few other subjects, and vast areas with no coverage at all.

In contrast, as a graduate student I took a semester course called "Data Analysis" that spent at most a week or two on each of 10 to 12 topics. We hit regression, ANOVA, multivariate, graphics, and others I can't recall off the top of my head (although I may have 35+ year old notes in a box somewhere!). This is the prototypical wide/shallow course. It was a great course for introducing me to applied statistics, but I sometimes only knew enough to be dangerous.

I think that last phrase is key. Our culture has been that we don't want to produce students who only know enough about something to be dangerous. We have either taught them a lot, or not taught the topic at all.

I think that we need to address the basic cultural issue of how shallow we are willing to go before going into great detail on which topics should be included.


Discuss the "art" of applied statistics, don't just provide the tools. What does doing an analysis actually look like? I remember after I took 8051 I realized I had no idea how to actually analyze a data set. I knew how to use all the tools, but not how to put them together.

Use of plots and graphs to explore data, to look for patterns, to make inferences, and to decide what to do next.

Include resampling-based inference (as Ansu suggests).

Consider revamping not only the material but the teaching methods. I would love to see these classes become less lecture-based and include more active learning. The six GAISE recommendations for introductory courses should perhaps all be considered for how they could improve our graduate curriculum.

To Ansu's list of important "classical topics", please add categorical data and power calculations...

I agree with Aaron that our goal should be to teach both the "art" and the science (methods) of applied statistics. This is one of the reasons I like Gary's Exp Design textbook so much -- a lot of problems have real-life messy data.

The problem with "less lecture based and ... more active learning" is that it takes more time. A lot more time. So, we would need to cut material if we divert lecture time for that. My own solution over the years has been to put "active learning" activities into the labs and homeworks. Not ideal, but there is only so much time. At the end of the day, we are teaching graduate students.

It is OK to have one broad/shallow course where a lot of topics are covered, but I would prefer the majority to be narrow/deep.

Report and proposal writing should be emphasized throughout. Reproducibility should also be emphasized -- knitr could be covered early in the first applied course. Oral communication skills are also important.

There should be a lot of active learning in the applied courses, including working in teams on projects.

At least some Bayesian methods should be incorporated.

We should have some material on dependent data. For example, spatial statistics are completely ignored in our current curriculum.

Machine learning concepts and methods should be incorporated along the way, but I don't really see a need for a separate core course on them.

About this Entry

This page contains a single entry by published on September 30, 2013 1:21 PM.
