Oct. 9, 2008
> ###NOTE: Check Out Fred Mosteller & another: Federalist Papers analysis, statistical analysis of text.###
> ###
Confidence intervals for d
###-You MUST meet the normality assumption! If not, the interval will be wrong.
> -This limits your use of d.
> ###
Bootstrapping
###> ###-another means of getting confidence intervals.###
> ###resample from your sample, drawing the values with replacement each time you resample.###
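The resampling idea in the note above can be sketched as a percentile bootstrap interval for a one-sample d. This is only an illustration: the data are simulated placeholders (not the osteoporosis sample), and `d_hat`, `mu0`, and `B` are names made up for the example.

```r
# Percentile bootstrap CI for a one-sample d = (mean - mu0) / sd.
# Simulated placeholder data; NOT the class data set.
set.seed(1)
x   <- rnorm(33, mean = 1.05, sd = 0.65)  # hypothetical sample
mu0 <- 1                                   # hypothetical test value

# Standardized mean difference for one sample (illustrative helper)
d_hat <- function(v, mu0) (mean(v) - mu0) / sd(v)

B <- 2000  # number of bootstrap resamples (an arbitrary choice)
boot_d <- replicate(B, {
  resample <- sample(x, size = length(x), replace = TRUE)  # with replacement
  d_hat(resample, mu0)
})

# The middle 95% of the bootstrap distribution is the percentile interval
boot_ci <- quantile(boot_d, probs = c(0.025, 0.975))
d_hat(x, mu0)
boot_ci
```

The percentile interval is the simplest variant; other bootstrap intervals exist and may behave better in small samples.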
> ###
Another means of getting an accurate representation
###> ###-use trimmed mean and winsorized SD of sample in the d equation.###
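A rough sketch of that substitution, assuming 20% trimming and a one-sample d. The helper names `winsorize` and `robust_d` and the toy data are invented for illustration. Some authors also rescale the result so it is comparable to the ordinary d under normality; that step is left out here.

```r
# Winsorize: replace the g smallest and g largest values with their
# nearest retained neighbours (g = floor(trim * n)).
winsorize <- function(v, trim = 0.2) {
  v <- sort(v[!is.na(v)])
  n <- length(v)
  g <- floor(trim * n)
  if (g > 0) {
    v[1:g] <- v[g + 1]
    v[(n - g + 1):n] <- v[n - g]
  }
  v
}

# d computed from the trimmed mean and the winsorized SD
robust_d <- function(v, mu0, trim = 0.2) {
  v <- v[!is.na(v)]
  (mean(v, trim = trim) - mu0) / sd(winsorize(v, trim))
}

# Toy data with one wild value, to show how the two versions differ
x <- c(0.2, 0.5, 0.6, 0.8, 0.9, 1.0, 1.1, 1.3, 1.6, 9.0)
robust_d(x, mu0 = 1)     # trimmed mean / winsorized SD
(mean(x) - 1) / sd(x)    # ordinary d, pulled around by the outlier
```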
> ###
Power
###> ###Research question: Did these women get over one hour of sun a week?
> ###
read.table and read.csv have an argument called na.strings
###> ###-read.csv("...",header=T,na.strings="[whatever non-data string placeholder was used]")###
> elderly<-read.csv("Osteoporosis.csv",header=T,na.strings="NA")
> attach(elderly)
> class(elderly$avg_sun)
[1] "numeric"
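To see what na.strings does, here is a small self-contained sketch that reads from an inline string instead of a file. The placeholder codes "missing" and "-99" and the data frame name `demo` are made up for the example.

```r
# A tiny CSV held in a string; two rows use non-data placeholders
csv_text <- "id,avg_sun
1,0.5
2,missing
3,-99
4,1.2"

# Every string listed in na.strings is read in as NA
demo <- read.csv(text = csv_text, header = TRUE,
                 na.strings = c("missing", "-99"))

class(demo$avg_sun)       # the column stays numeric because the placeholders became NA
sum(is.na(demo$avg_sun))  # number of values treated as missing
```

Without na.strings, the word "missing" would force the whole column to be read as text rather than numbers.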
> plot(density(avg_sun,kernel="e"))
Error in density.default(avg_sun, kernel = "e") :
'x' contains missing values
> ###NOTE: density() has an na.rm argument that defaults to FALSE, so it stops with an error when the data contain missing values.###
> plot(density(avg_sun,kernel="e",na.rm=T))
> summary(avg_sun)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.1120 0.5729 0.8889 1.0570 1.3960 2.9610 24.0000
> lenght(avg_sun)
Error: could not find function "lenght"
> length(avg_sun)
[1] 57
> 57-24
[1] 33
> ###So we have 33 useable entries###
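The same count can be obtained directly with is.na() rather than by subtracting by hand. A minimal sketch on a made-up vector:

```r
# Toy vector with missing values (not the class data)
x <- c(0.4, NA, 1.2, NA, 0.9)

n_total   <- length(x)       # length() counts the NAs too
n_missing <- sum(is.na(x))   # how many are missing
n_usable  <- sum(!is.na(x))  # how many are usable

c(total = n_total, missing = n_missing, usable = n_usable)
```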
> ###We have lost a lot of data to missingness, and we may have introduced bias into our estimates.###
> ###Is there something inherently different concerning sun exposure about the people that don't answer that question?###
> ###If you cannot make the argument that the missing data is random you have to make caveats about the missing data.###
> ###---probably the best way to deal with it is to analyze the cases you have and write about the missing data in the limitations.###
> library(car)
> qq.plot(avg_sun)
> ###-looks OK, no severe problems###
> mean(avg_sun,na.rm=T)
[1] 1.057165
> sun<-avg_sun
> sd(sun,na.rm=T)
[1] 0.6590523
> t.test(sun,mu=1)
One Sample t-test
data: sun
t = 0.4983, df = 32, p-value = 0.6217
alternative hypothesis: true mean is not equal to 1
95 percent confidence interval:
0.8234756 1.2908552
sample estimates:
mean of x
1.057165
> ###---we use mu=1 in the t test because the null value is one hour of sun a week (the default alternative is two-sided).###
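As a check, the t statistic can be rebuilt from the summary numbers printed above (mean 1.057165, SD 0.6590523, n = 57 - 24 = 33). The sketch also computes the one-sided p-value that corresponds to the research question of getting more than one hour. With the raw data that would be t.test(sun, mu=1, alternative="greater"). The variable names are illustrative.

```r
# Summary statistics taken from the session output above
m   <- 1.057165    # sample mean
s   <- 0.6590523   # sample SD
n   <- 57 - 24     # usable observations
mu0 <- 1           # null value: one hour of sun a week

t_stat <- (m - mu0) / (s / sqrt(n))  # one-sample t statistic
df     <- n - 1

p_two     <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)  # two-sided
p_greater <- pt(t_stat, df, lower.tail = FALSE)           # H1: mean > 1

round(c(t = t_stat, p_two_sided = p_two, p_greater = p_greater), 4)
```

Because the sample mean is above 1, the one-sided p-value is half the two-sided one, which still gives no grounds to reject the null here.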
> ###
We clearly fail to reject the null based on both the p value and the confidence interval.
###> ###---What do we do now? Why did we not find a significant effect?###
> ###---1. the null is actually true: these women do not get more than one hour of sun a week.###
> ###---or 2. our sample was too small.###
> library(MBESS)
> smd(Mean.1=1.057165,Mean.2=1,s=.6590523)
[1] 0.08673818
> ###Mean.1 = sample mean, Mean.2=test mean, s=sample standard deviation###
> ###We have a very small effect###
> ###Probability of making a type two error is very high with small sample sizes.###
> ###d-hat above was only .09, which translates to a very small effect###
> ###Ways of decreasing the chances of making a type two error###
> ###1. increase alpha level (a smaller alpha makes a type two error more likely)###
> ###2. larger population effect###
> ###3. increase sample size###
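To put numbers on the sample-size point, here is a sketch using power.t.test() from base R's stats package. It treats the observed mean difference and SD as if they were the population values, which is an assumption made only for illustration. The object names `pw` and `need` are arbitrary.

```r
# Power of the one-sample t test we actually ran (n = 33),
# assuming the true difference and SD equal the observed ones
pw <- power.t.test(n = 33, delta = 1.057165 - 1, sd = 0.6590523,
                   sig.level = 0.05, type = "one.sample",
                   alternative = "two.sided")
pw$power

# Sample size that would be needed for 80% power with the same effect
need <- power.t.test(delta = 1.057165 - 1, sd = 0.6590523,
                     sig.level = 0.05, power = 0.80,
                     type = "one.sample")
ceiling(need$n)
```

With an effect this small, the required n runs far beyond the 33 usable cases available.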
>
> ###NOTE: Non-central Z or t means that the center of the distribution is not at zero.###
> ###Probability of making a type two error is the area of the actual non-central distribution that falls inside the region of the central distribution where we would fail to reject.###
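That area can be computed directly with pt() and its ncp argument. A sketch for the two-sided one-sample test above, assuming the true effect equals the observed d-hat and using a noncentrality parameter of d * sqrt(n). The variable names are illustrative.

```r
n     <- 33
df    <- n - 1
d     <- 0.08673818   # d-hat from smd() above, treated as the true effect
alpha <- 0.05

ncp    <- d * sqrt(n)            # noncentrality parameter
t_crit <- qt(1 - alpha / 2, df)  # cutoffs from the CENTRAL t distribution

# Area of the NON-central t between the two cutoffs = P(fail to reject)
beta  <- pt(t_crit, df, ncp = ncp) - pt(-t_crit, df, ncp = ncp)
power <- 1 - beta

c(ncp = ncp, t_crit = t_crit, beta = beta, power = power)
```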
>