So two questions come up.

- Does the page count vary by season?
- Is the proposed relationship between the NASDAQ and the magazine's page count affected by seasonality?

First, variation by season. December does have an empirically higher mean page count than all other months. Taking a sample of the last 10-11 years as representative of all years, several months are significantly different from December. The plot below shows how much higher December is than each month on average, with 95% Tukey confidence intervals.
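In SAS, those Tukey-adjusted comparisons of monthly means could be produced along these lines. This is only a sketch: the dataset `PAGES` and the variables `MONTH` and `PAGECOUNT` are hypothetical stand-ins for the actual data.

```sas
/* Sketch only -- PAGES, MONTH, and PAGECOUNT are hypothetical names */
proc glm data=pages;
  class month;
  model pagecount = month;
  /* All pairwise differences of monthly means, with
     Tukey-adjusted 95% confidence limits */
  lsmeans month / pdiff adjust=tukey cl;
run;
```

The intervals for the December-versus-each-month differences are the ones plotted above.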

I think the second question is more interesting, but the resulting analysis is one step removed from the data. It examines the residuals from a linear regression of the NASDAQ on the page count. See the scatter plot with regression line below.

I've indicated which data points are from November and December. 8 of 10 Novembers and 9 of 11 Decembers have page counts that would suggest higher NASDAQ values than actually occurred. That is, they have negative residuals.
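The regression and its residuals could be obtained with something like the following, where `MAG`, `NASDAQ`, and `PAGECOUNT` are hypothetical names for the actual dataset and variables.

```sas
/* Sketch: regress the NASDAQ on the page count and keep
   the residuals for the month-by-month analysis below */
proc reg data=mag;
  model nasdaq = pagecount;
  output out=resids r=resid;
run;
quit;
```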

Do November and December have residuals that are significantly lower than zero? If we compare the mean residual to zero for each month, November's mean is significantly different from zero (p=.046) and December's is nearly significantly different (p=.052). Each month's mean with corresponding 95% confidence interval is shown below.
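Assuming a residuals dataset like the hypothetical `RESIDS` above, with a `MONTH` variable carried along, the per-month test of a zero mean residual is a one-sample t-test by month:

```sas
/* Sketch: one-sample t-test of mean residual = 0, per month */
proc sort data=resids; by month; run;
proc ttest data=resids h0=0;
  by month;
  var resid;
run;
```

The confidence limits PROC TTEST reports for each month's mean are the intervals plotted above.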

Looking back at the scatter plot, my conclusion is that high page counts (> 250) are all attributable either to a surging tech market or to the approach of Christmas (Nov/Dec).

Deciding to fit a model with the same fixed-effects structure but with four different variance parameters, one for each level of the time factor, was the easy part. Figuring out how to make SAS fit that model is a separate issue entirely.

I was pretty sure that PROC MIXED had the answer, but wasn't sure exactly what REPEATED or RANDOM statement would be required. It turns out that the line "REPEATED / GROUP=TIME;" does the trick. Notice that there's no factor that's actually repeated; the GROUP= option is the critical piece.
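Putting it together, the model looks roughly like this. The dataset `MYDATA` and the response `Y` are hypothetical placeholders; `E2`, `NAL`, and `TIME` are the factors from the analysis.

```sas
/* Same fixed-effects structure, but a separate residual
   variance for each level of TIME.  MYDATA and Y are
   hypothetical names. */
proc mixed data=mydata;
  class e2 nal time;
  model y = e2|nal|time;
  /* No effect is actually repeated -- GROUP= is what requests
     one variance parameter per level of TIME */
  repeated / group=time;
  /* All pairwise comparisons, including across times */
  lsmeans e2*nal*time / adjust=tukey;
run;
```

In practice one might also add DDFM=SATTERTH to the MODEL statement, so the denominator degrees of freedom reflect the unequal variances.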

If there's an interaction of TIME with the E2 and NAL effects, this is structurally the same as a stratified analysis in which each time's data is analyzed separately. The reason for trying so hard to get it all into one model is to be able to make all pairwise comparisons (including across times) using Tukey's method.

I have an additional question about this analysis. While there's no overall R-squared to report from the mixed model, it still seems to make sense that each time's R-squared could be calculated (since the four variance parameters have the same estimates as a stratified analysis) and appropriately reported to give an idea of how good the model fit is. Any arguments or alternatives to this approach?
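One way to get those per-time R-squared values is simply to refit the same fixed-effects model stratified by TIME and read R-Square off each ANOVA table; since the GROUP=TIME model gives each stratum its own variance, the fits agree. Names are hypothetical, as above.

```sas
/* Sketch: stratified refit; each BY group's output
   includes its own R-Square */
proc sort data=mydata; by time; run;
proc glm data=mydata;
  by time;
  class e2 nal;
  model y = e2|nal;
run;
```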

It's got me thinking about what that app should look like. He mentioned Excel as probably the most universal app, where everyone would feel at home with the interface. The problem for data analysis, though, is that Excel doesn't really do it.

That leads me to think of SPSS (which he mentioned, but without much detail) and Minitab. These both have a bit of an Excel feel, in that you can see the data in a nice spreadsheet at all times. Their menu systems keep infrequent users from having to recall the names of functions and analysis routines. Maybe some sort of a decision tree to help direct the analysis would be useful. What features are these programs lacking to fill that niche better?

For one, they should have interactive graphics. They need better extensibility. What else?
