<?xml version="1.0" encoding="utf-8"?>
<feed version="0.3" xmlns="http://purl.org/atom/ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xml:lang="en">
<title>Tyson&apos;s Biostat Blog</title>
<link rel="alternate" type="text/html" href="http://blog.lib.umn.edu/roge0285/biostat/" />
<modified>2005-12-05T17:07:10Z</modified>
<tagline>Proving biostatistics is bloggable</tagline>
<id>tag:blog.lib.umn.edu,2009:/roge0285/biostat//926</id>
<generator url="http://www.movabletype.org/" version="4.25">Movable Type</generator>
<copyright>Copyright (c) 2005, roge0285</copyright>

<entry>
<title>Wired, NASDAQ, and the Christmas Question</title>
<link rel="alternate" type="text/html" href="http://blog.lib.umn.edu/roge0285/biostat/033511.html" />
<modified>2005-12-05T17:07:10Z</modified>
<issued>2005-12-05T15:53:33Z</issued>
<id>tag:blog.lib.umn.edu,2005:/roge0285/biostat//926.33511</id>
<created>2005-12-05T15:53:33Z</created>
<summary type="text/plain">At least two recent blog entries suggest a possible connection between the page count of Wired magazine and the NASDAQ stock index. One commenter suggested that this month&apos;s large page count was due to what he termed the &quot;&apos;Wired is...</summary>
<author>
<name>roge0285</name>


</author>

<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://blog.lib.umn.edu/roge0285/biostat/">
<![CDATA[<p>At least <a href="http://www.thepodcastnetwork.com/gadget/2005/12/03/wired-magazine-as-a-technology-market-barometer/">two</a> <a href="http://evans.blogware.com/blog/_archives/2005/11/29/1426823.html">recent</a> blog entries suggest a possible connection between the page count of <a href="http://www.wiredmag.com">Wired magazine</a> and the NASDAQ stock index. One commenter suggested that this month's large page count was due to what he termed the "'Wired is a christmas whore' phenomenon."</p>

<p>So two questions come up:</p>
<ul>
<li>Does the page count vary by season?</li>
<li>Is the proposed relationship between the NASDAQ and the magazine's page count affected by seasonality?</li>
</ul>

<p>First, variation by season. In the sample, December has a higher mean page count than every other month. If the last 10-11 years are treated as a representative sample of all years, several months differ significantly from December. The plot below shows how much higher December's page count is than each other month's, on average, with 95% Tukey confidence intervals.<br />
<img alt="Wired-Magazine-Page-Count-Seasonality.jpg" src="http://blog.lib.umn.edu/roge0285/biostat/images/Wired-Magazine-Page-Count-Seasonality.jpg" width="448" height="336" /><br />
I think the second question is more interesting, but the resulting analysis is one step removed from the data: it examines the residuals from a linear regression of the NASDAQ on the page count. The scatter plot below shows the data with the fitted regression line.<br />
<img alt="Wired-Page-Count-Predicting-NASDAQ-Scatter-Plot.jpg" src="http://blog.lib.umn.edu/roge0285/biostat/images/Wired-Page-Count-Predicting-NASDAQ-Scatter-Plot.jpg" width="448" height="336" /><br />
I've marked the data points from November and December. Eight of the 10 Novembers and 9 of the 11 Decembers have page counts that would predict a higher NASDAQ than actually occurred; that is, they have negative residuals.</p>

<p>Do November and December have residuals that are significantly below zero? Comparing each month's mean residual to zero, November's mean differs significantly from zero (p = .046), and December's just misses significance (p = .052). The plot below shows each month's mean residual with its 95% confidence interval.<br />
<img alt="Wired-Page-Count-Predicting-NASDAQ-Seasonal-Bias.jpg" src="http://blog.lib.umn.edu/roge0285/biostat/images/Wired-Page-Count-Predicting-NASDAQ-Seasonal-Bias.jpg" width="448" height="336" /><br />
Looking back at the scatter plot, my conclusion is that all high page counts (> 250) are attributable either to a surging tech market or to the approach of Christmas (Nov/Dec).</p>]]>

</content>
</entry>

<entry>
<title>Variance Components</title>
<link rel="alternate" type="text/html" href="http://blog.lib.umn.edu/roge0285/biostat/010111.html" />
<modified>2005-11-28T18:52:43Z</modified>
<issued>2004-11-10T19:23:19Z</issued>
<id>tag:blog.lib.umn.edu,2004:/roge0285/biostat//926.10111</id>
<created>2004-11-10T19:23:19Z</created>
<summary type="text/plain">I was fitting a three-way ANOVA model (with two drugs, E2 and NAL, either present or absent and TIME having 4 levels) and found that the assumption of constant variance in each cell was questionable. Bartlett&apos;s test (performed with &quot;means...</summary>
<author>
<name>roge0285</name>


</author>
<dc:subject></dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://blog.lib.umn.edu/roge0285/biostat/">
<![CDATA[<p>I was fitting a three-way ANOVA model (two drugs, E2 and NAL, each either present or absent, and TIME with 4 levels) and found that the assumption of constant variance across cells was questionable. Bartlett's test (performed with "means factor / hovtest=Bartlett", where factor is a one-way ANOVA factor created here to index each of the 16 unique combinations of E2, NAL, and TIME) wasn't quite significant, but the residuals showed a definite fan shape, with the variance dropping sharply as time increased.</p>
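<p>As a rough illustration of what Bartlett's test computes, here is a standard-library Python sketch; in SAS the <code>hovtest=Bartlett</code> option does all of this for you, and the function names below are hypothetical. The chi-square tail probability uses the recurrence for integer degrees of freedom, which is enough here because Bartlett's statistic has k - 1 degrees of freedom (15 for the 16 cells).</p>
<pre>
```python
# Sketch of Bartlett's test for equal variances across groups (here, the
# 16 E2 x NAL x TIME cells), using only the standard library.
import math
import statistics


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df."""
    if x <= 0:
        return 1.0
    # Start from df = 1 (odd) or df = 2 (even) and step up by 2 using
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def bartlett(*groups):
    """Return (statistic, p-value) for Bartlett's test; each group needs n >= 2."""
    k = len(groups)
    n = [len(g) for g in groups]
    s2 = [statistics.variance(g) for g in groups]  # sample variances, n - 1 divisor
    big_n = sum(n)
    pooled = sum((ni - 1) * si for ni, si in zip(n, s2)) / (big_n - k)
    numerator = (big_n - k) * math.log(pooled) - sum(
        (ni - 1) * math.log(si) for ni, si in zip(n, s2)
    )
    # Bartlett's small-sample correction factor.
    correction = 1 + (sum(1 / (ni - 1) for ni in n) - 1 / (big_n - k)) / (3 * (k - 1))
    stat = numerator / correction
    return stat, chi2_sf(stat, k - 1)
```
</pre>
<p>Bartlett's test is known to be sensitive to non-normality, so a visible fan shape in the residuals can be worth acting on even when the test is not quite significant.</p>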

<p>It was easy to decide to fit a model with the same fixed-effects structure but with four different variance parameters, one for each level of TIME. Getting SAS to fit that model, however, was a separate issue entirely.</p>

<p>I was pretty sure that PROC MIXED had the answer, but wasn't sure exactly what REPEATED or RANDOM statement it would require. It turns out that the line "REPEATED / GROUP=TIME" does the trick. Notice that no factor is actually repeated; the GROUP= option is the critical piece.</p>

<p>If TIME interacts with the E2 and NAL effects, this is structurally the same as a stratified analysis in which each time's data are analyzed separately. The reason for working so hard to get everything into one model is to be able to make all pairwise comparisons (including comparisons across times) using Tukey's method.</p>
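<p>A minimal sketch of the stratified view, assuming the fixed effects are the full E2 x NAL x TIME factorial, so that each fitted value is simply its cell mean. Under that assumption, each TIME's variance estimate is that stratum's residual sum of squares divided by n minus the number of cells (the usual REML-type estimate for a linear model), and the stratum's R-squared is 1 - RSS/TSS. The input layout of (time, cell, y) tuples and the function name are hypothetical.</p>
<pre>
```python
# Stratified fit per TIME level, assuming a full-factorial (cell-means) model:
# within each TIME, the fitted value is the mean of its E2 x NAL cell.
import statistics
from collections import defaultdict


def per_time_fit(rows):
    """rows: iterable of (time, cell, y), where cell labels an E2 x NAL
    combination (hypothetical layout). Returns {time: (variance, r_squared)}."""
    by_time = defaultdict(lambda: defaultdict(list))
    for time, cell, y in rows:
        by_time[time][cell].append(y)

    results = {}
    for time, cells in by_time.items():
        ys = [y for obs in cells.values() for y in obs]
        grand = statistics.fmean(ys)
        # Residual sum of squares around each cell mean.
        rss = 0.0
        for obs in cells.values():
            cell_mean = statistics.fmean(obs)
            rss += sum((y - cell_mean) ** 2 for y in obs)
        # Total sum of squares around this time's grand mean.
        tss = sum((y - grand) ** 2 for y in ys)
        # Divide by n minus the number of cell means estimated at this time.
        variance = rss / (len(ys) - len(cells))
        results[time] = (variance, 1 - rss / tss)
    return results
```
</pre>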

<p>I have one more question about this analysis. The mixed model has no overall R-squared to report, but it still seems sensible to calculate an R-squared for each time (since the four variance parameters have the same estimates as in a stratified analysis) and report those to give an idea of how well the model fits. Any arguments against this approach, or alternatives to it?</p>]]>

</content>
</entry>

<entry>
<title>Sandy&apos;s seminar</title>
<link rel="alternate" type="text/html" href="http://blog.lib.umn.edu/roge0285/biostat/008782.html" />
<modified>2005-11-28T18:50:04Z</modified>
<issued>2004-10-29T21:03:21Z</issued>
<id>tag:blog.lib.umn.edu,2004:/roge0285/biostat//926.8782</id>
<created>2004-10-29T21:03:21Z</created>
<summary type="text/plain">I attended the School of Statistics seminar yesterday where Sandy Weisberg talked about why none of the existing statistical packages is quite right for a target audience of non-professional statisticians who still need to analyze data on an irregular basis....</summary>
<author>
<name>roge0285</name>


</author>
<dc:subject></dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://blog.lib.umn.edu/roge0285/biostat/">
<![CDATA[<p>I attended the School of Statistics seminar yesterday, where Sandy Weisberg talked about why none of the existing statistical packages is quite right for a target audience of non-professional statisticians who still need to analyze data on an irregular basis.</p>

<p>It's got me thinking about what that app should look like. He mentioned Excel as probably the most universal app, one whose interface everyone would feel at home with. However, when it comes to data analysis, Excel falls short.</p>

<p>That leads me to think of SPSS (which he mentioned, but without much detail) and Minitab. Both have a bit of an Excel feel, in that you can see the data in a nice spreadsheet at all times. Their menu systems keep infrequent users from having to recall the names of functions and analysis routines. Maybe some sort of decision tree to help direct the analysis would be useful. What features are these programs lacking to fill that niche better?</p>

<p>For one, they should have interactive graphics and better extensibility. What else?</p>]]>

</content>
</entry>

</feed>
