Dec. 9, 2008
> library(faraway)
> data(infmort)
> attach(infmort)
> interaction.plot(region,oil,mortality,fun=function(x) mean(x,na.rm=T),ylab="World Region",trace.label="Mortality", type = "b", pch = "19")
Warning message:
In legend(xleg, yleg, legend = ylabs, col = col, pch = if (type %in% :
not using pch[2..] since pch[1] has multiple chars
> ###Changing line and plot type in an interaction plot###
> interaction.plot(region,oil,mortality,fun=function(x) mean(x,na.rm=T),ylab="World Region",trace.label="Mortality", type = "b", pch = 19)
> interaction.plot(region,oil,mortality,fun=function(x) mean(x,na.rm=T),ylab="Mortality",trace.label="Oil Exports", type = "b", pch = 19)
> detach(infmort)
> vl<-read.table("VLSSperCapita.txt", header = T)
> attach(vl)
> head(vl)
household dong region urban dollar regionName
1 12225 2764.8999 1 0 184.32666 Northern Uplands
2 13314 940.9336 1 0 62.72891 Northern Uplands
3 12101 1786.9258 1 0 119.12839 Northern Uplands
4 7505 2130.5046 1 1 142.03364 Northern Uplands
5 11709 1149.1510 1 0 76.61007 Northern Uplands
6 15309 1461.8467 1 0 97.45645 Northern Uplands
> tapply(household,as.factor(region):as.factor(urban), mean,na.rm=T)
1:0 1:1 2:0 2:1 3:0 3:1 4:0 4:1
13760.817 6538.326 18388.126 3570.832 22780.238 6069.667 26260.526 8298.266
5:0 5:1 6:0 6:1 7:0 7:1
29076.707 NA 31996.681 3391.250 36262.457 10112.734
> 39262.457+10112.734
[1] 49375.19
> tapply(household,as.factor(region), mean,na.rm=T)
1 2 3 4 5 6 7
12188.52 13444.82 20231.17 20257.22 29076.71 17763.87 29630.96
> 36262.457+10112.734
[1] 46375.19
> (36262.457+10112.734)/2
[1] 23187.60
> tapply(household,as.factor(region):as.factor(urban), length)
1:0 1:1 2:0 2:1 3:0 3:1 4:0 4:1 5:0 5:1 6:0 6:1 7:0 7:1
672 187 783 392 600 108 502 252 368 NA 514 509 830 282
> (36262.457*830+10112.734*282)/1112
[1] 29630.96
> model<-aov(household~region+urban+region:urban)
> anova(model)
Analysis of Variance Table
Response: household
Df Sum Sq Mean Sq F value Pr(>F)
region 1 1.5928e+11 1.5928e+11 35772 < 2.2e-16 ***
urban 1 5.0572e+11 5.0572e+11 113580 < 2.2e-16 ***
region:urban 1 6.0949e+10 6.0949e+10 13688 < 2.2e-16 ***
Residuals 5995 2.6693e+10 4.4526e+06
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> model<-aov(household~urban+region+region:urban)
> anova(model)
Analysis of Variance Table
Response: household
Df Sum Sq Mean Sq F value Pr(>F)
urban 1 4.6756e+11 4.6756e+11 105008 < 2.2e-16 ***
region 1 1.9745e+11 1.9745e+11 44345 < 2.2e-16 ***
urban:region 1 6.0949e+10 6.0949e+10 13688 < 2.2e-16 ***
Residuals 5995 2.6693e+10 4.4526e+06
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> model3<-aov(mortality~region+oil+region:oil,data=infmort)
> model4<-aov(mortality~oil+region+region:oil,data=infmort)
> anova(model3)
Analysis of Variance Table
Response: mortality
Df Sum Sq Mean Sq F value Pr(>F)
region 3 210752 70251 12.8446 4.188e-07 ***
oil 1 42197 42197 7.7153 0.006610 **
region:oil 2 57433 28717 5.2505 0.006892 **
Residuals 94 514112 5469
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> anova(model4)
Analysis of Variance Table
Response: mortality
Df Sum Sq Mean Sq F value Pr(>F)
oil 1 60073 60073 10.9837 0.001306 **
region 3 192877 64292 11.7551 1.315e-06 ***
oil:region 2 57433 28717 5.2505 0.006892 **
Residuals 94 514112 5469
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> ###This happens because R uses sequential sums of squares (type I), so the first factor takes up as much expanation as it can.
> ###So, if you vary which one you put in the model first, it will vary which factor gets figured first.
> library(car)
Attaching package: 'car'
The following object(s) are masked from package:faraway :
logit,
vif
> anova(model)
Analysis of Variance Table
Response: household
Df Sum Sq Mean Sq F value Pr(>F)
urban 1 4.6756e+11 4.6756e+11 105008 < 2.2e-16 ***
region 1 1.9745e+11 1.9745e+11 44345 < 2.2e-16 ***
urban:region 1 6.0949e+10 6.0949e+10 13688 < 2.2e-16 ***
Residuals 5995 2.6693e+10 4.4526e+06
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> model<-aov(household~region+urban+region:urban)
> model2<-aov(household~urban+region+region:urban)
> anova(model)
Analysis of Variance Table
Response: household
Df Sum Sq Mean Sq F value Pr(>F)
region 1 1.5928e+11 1.5928e+11 35772 < 2.2e-16 ***
urban 1 5.0572e+11 5.0572e+11 113580 < 2.2e-16 ***
region:urban 1 6.0949e+10 6.0949e+10 13688 < 2.2e-16 ***
Residuals 5995 2.6693e+10 4.4526e+06
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> anova(model2)
Analysis of Variance Table
Response: household
Df Sum Sq Mean Sq F value Pr(>F)
urban 1 4.6756e+11 4.6756e+11 105008 < 2.2e-16 ***
region 1 1.9745e+11 1.9745e+11 44345 < 2.2e-16 ***
urban:region 1 6.0949e+10 6.0949e+10 13688 < 2.2e-16 ***
Residuals 5995 2.6693e+10 4.4526e+06
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> Anova(model, type="III")
Anova Table (Type III tests)
Response: household
Sum Sq Df F value Pr(>F)
(Intercept) 1.1875e+11 1 26670.3 < 2.2e-16 ***
region 2.5759e+11 1 57850.8 < 2.2e-16 ***
urban 1.1271e+10 1 2531.4 < 2.2e-16 ***
region:urban 6.0949e+10 1 13688.4 < 2.2e-16 ***
Residuals 2.6693e+10 5995
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> Anova(model2, type="III")
Anova Table (Type III tests)
Response: household
Sum Sq Df F value Pr(>F)
(Intercept) 1.1875e+11 1 26670.3 < 2.2e-16 ***
urban 1.1271e+10 1 2531.4 < 2.2e-16 ***
region 2.5759e+11 1 57850.8 < 2.2e-16 ***
urban:region 6.0949e+10 1 13688.4 < 2.2e-16 ***
Residuals 2.6693e+10 5995
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> Anova(model2, type="II")
Anova Table (Type II tests)
Response: household
Sum Sq Df F value Pr(>F)
urban 5.0572e+11 1 113580 < 2.2e-16 ***
region 1.9745e+11 1 44345 < 2.2e-16 ***
urban:region 6.0949e+10 1 13688 < 2.2e-16 ***
Residuals 2.6693e+10 5995
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> Anova(model2, type="III")
Anova Table (Type III tests)
Response: household
Sum Sq Df F value Pr(>F)
(Intercept) 1.1875e+11 1 26670.3 < 2.2e-16 ***
urban 1.1271e+10 1 2531.4 < 2.2e-16 ***
region 2.5759e+11 1 57850.8 < 2.2e-16 ***
urban:region 6.0949e+10 1 13688.4 < 2.2e-16 ***
Residuals 2.6693e+10 5995
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> ###NOTE: Problem with type III sums of squares does not obey marginality. It asks how much does the interaction between the factors explain apart from the factors that make up the interaction, and that doesn't make sense.
> ###
Analysis of Variance is NOT a good way to analyze an unbalanced design. Regression is a much better way to analyze.
> ###When determining the order in which to include factors in the model, put your controls into the model first and add in the things you are interested in (focal predictors)last.
> model2<-aov(household~region:urban+urban+region)
> summary(model2)
Df Sum Sq Mean Sq F value Pr(>F)
urban 1 4.6756e+11 4.6756e+11 105008 < 2.2e-16 ***
region 1 1.9745e+11 1.9745e+11 44345 < 2.2e-16 ***
region:urban 1 6.0949e+10 6.0949e+10 13688 < 2.2e-16 ***
Residuals 5995 2.6693e+10 4.4526e+06
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1