<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>UMN 8262</title>
    <link rel="alternate" type="text/html" href="http://blog.lib.umn.edu/dillo109/jons8262/" />
    <link rel="self" type="application/atom+xml" href="http://blog.lib.umn.edu/dillo109/jons8262/atom.xml" />
   <id>tag:blog.lib.umn.edu,2009:/dillo109/jons8262//9711</id>
    <link rel="service.post" type="application/atom+xml" href="http://blog.lib.umn.edu/cgi-bin/mt-atom.cgi/weblog/blog_id=9711" title="UMN 8262" />
    <updated>2009-02-10T00:06:45Z</updated>
    <subtitle></subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.25</generator>
 

<entry>
    <title>Feb. 9, 2009</title>
    <link rel="alternate" type="text/html" href="http://blog.lib.umn.edu/dillo109/jons8262/2009/02/feb_9_2009.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://blog.lib.umn.edu/cgi-bin/mt-atom.cgi/weblog/blog_id=9711/entry_id=165747" title="Feb. 9, 2009" />
    <id>tag:blog.lib.umn.edu,2009:/dillo109/jons8262//9711.165747</id>
    
    <published>2009-02-10T00:05:53Z</published>
    <updated>2009-02-10T00:06:45Z</updated>
    
    <summary> &gt; nels &gt; head(nels) HW.Hours StdMathScore 345 -0.3329931 42.432 759 -0.2136822 53.698 95 -1.0077991 49.205 325 0.2059000 53.698 355 -0.1177185 55.980 377 0.1413540 65.331 &gt; attach(nels) &gt; model &gt; summary(model) Call: lm(formula = StdMathScore ~ HW.Hours, data = nels)...</summary>
    <author>
        <name>dillo109</name>
        <uri></uri>
    </author>
    
    <content type="html" xml:lang="en" xml:base="http://blog.lib.umn.edu/dillo109/jons8262/">
        <![CDATA[<p></p>

<p>> nels<-read.table("NELSsample.txt", header=T)<br />
> head(nels)<br />
      HW.Hours StdMathScore<br />
345 -0.3329931       42.432<br />
759 -0.2136822       53.698<br />
95  -1.0077991       49.205<br />
325  0.2059000       53.698<br />
355 -0.1177185       55.980<br />
377  0.1413540       65.331<br />
> attach(nels)<br />
> model<-lm(StdMathScore~HW.Hours, data=nels)<br />
> summary(model)</p>

<p>Call:<br />
lm(formula = StdMathScore ~ HW.Hours, data = nels)</p>

<p>Residuals:<br />
     Min       1Q   Median       3Q      Max <br />
-19.9886  -8.5163  -0.7377   8.2180  21.5530 </p>

<p>Coefficients:<br />
            Estimate Std. Error t value Pr(>|t|)    <br />
(Intercept)  51.3947     0.7055  72.853  < 2e-16 ***<br />
HW.Hours      1.7826     0.5811   3.068  0.00246 ** <br />
---<br />
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 </p>

<p>Residual standard error: 9.959 on 198 degrees of freedom<br />
Multiple R-squared: 0.04538,    Adjusted R-squared: 0.04055 <br />
F-statistic: 9.411 on 1 and 198 DF,  p-value: 0.002458 </p>

<p> Y-hat = 51.39 + 1.78(x)</p>

<p>H-naught (intercept): Beta-sub-zero = 0<br />
H-naught (slope): Beta-sub-one = 0</p>

<p>From the above output we can tell that HW.Hours accounts for only about 5% of the variation in Math Score (R-squared = 0.045).<br />
It also tells us that a rough 95% margin of error for a predicted score is about +/- 19.92 points (9.96*2, twice the residual standard error).<br />
BETA-hat-sub-one = 1.78<br />
<h2>Confidence interval for slope</h2></p>

<p>We are saying that we used a method that works 95% of the time. <br />
> confint(model)<br />
                 2.5 %    97.5 %<br />
(Intercept) 50.0035113 52.785848<br />
HW.Hours     0.6367181  2.928483</p>
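<p>The slope interval above can be reproduced by hand from the reported estimate and standard error. This is a minimal sketch that assumes only the rounded values printed by summary(model) (estimate 1.7826, standard error 0.5811, 198 residual degrees of freedom); the variable names are illustrative and not part of the class session.</p>

```r
# Values copied (rounded) from the summary(model) output above
b1    <- 1.7826   # estimated slope for HW.Hours
se_b1 <- 0.5811   # standard error of the slope
df    <- 198      # residual degrees of freedom (n - 2)

# 95% interval: estimate +/- critical t value * standard error
t_crit <- qt(0.975, df)
ci <- c(lower = b1 - t_crit * se_b1, upper = b1 + t_crit * se_b1)
print(round(ci, 2))
```

<p>Because the inputs are rounded, the result should agree with confint(model) only to about three decimal places.</p>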

<p>Our 95% interval estimate for the slope in this case is anywhere from 0.64 to 2.93.<br />
<b>Remember we have been talking about the confidence interval for the parameter.</b><br />
<h2>Other confidence intervals in regression besides parameter estimates</h2><br />
If our end goal is to use the model to predict, we are probably more interested in a confidence interval for the prediction that we can make based on that model.<br />
We can get a confidence interval for the predicted individual value or we can get the interval for the conditional mean (mean of all points at a particular measurement).<br />
<h3>predicting the conditional mean</h3><br />
mu-sub-Y|X<br />
predict(modelName,interval="confidence")<br />
> predict(model,interval="confidence")<br />
         fit      lwr      upr<br />
345 50.80109 49.33697 52.26520<br />
759 51.01377 49.58705 52.44049<br />
95  49.59818 47.73843 51.45792<br />
325 51.76172 50.36448 53.15895<br />
...<br />
> model.predictions<-predict(model,interval="confidence")<br />
<b>Confidence Bands</b><br />
> library(NCStats)<br />
Loading required package: car<br />
Loading required package: gplots<br />
Loading required package: gtools</p>

<p>> help(prediction.plot)</p>
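<p>The difference between an interval for the conditional mean and an interval for an individual predicted value can be seen with base R's predict(). This is only a sketch on simulated stand-in data (the numbers are not the NELS values, and the names sim, fit and new_x are made up for illustration).</p>

```r
set.seed(1)
# Simulated stand-in data; these are NOT the NELS values
sim <- data.frame(HW.Hours = rnorm(200))
sim$StdMathScore <- 51 + 1.8 * sim$HW.Hours + rnorm(200, sd = 10)
fit <- lm(StdMathScore ~ HW.Hours, data = sim)

# Three hypothetical HW.Hours values at which to predict
new_x <- data.frame(HW.Hours = c(-1, 0, 1))
ci_mean <- predict(fit, newdata = new_x, interval = "confidence")  # conditional mean
pi_ind  <- predict(fit, newdata = new_x, interval = "prediction")  # individual value

# Width of each interval at the three HW.Hours values
widths <- cbind(mean_width       = ci_mean[, "upr"] - ci_mean[, "lwr"],
                individual_width = pi_ind[, "upr"] - pi_ind[, "lwr"])
print(round(widths, 2))
```

<p>The interval for an individual value is always the wider of the two, because it adds the spread of individuals around the line to the uncertainty about the line itself.</p>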

<p>> prediction.plot(model,interval="confidence",newdata=nels)<br />
    obs     HW.Hours StdMathScore      fit      lwr      upr<br />
345   1 -0.332993081       42.432 50.80109 49.33697 52.26520<br />
759   2 -0.213682155       53.698 51.01377 49.58705 52.44049<br />
95    3 -1.007799147       49.205 49.59818 47.73843 51.45792<br />
325   4  0.205899984       53.698 51.76172 50.36448 53.15895<br />
...<br />
> <br />
</p>]]>
        
    </content>
</entry>

<entry>
    <title>Feb. 4, 2009</title>
    <link rel="alternate" type="text/html" href="http://blog.lib.umn.edu/dillo109/jons8262/2009/02/fed_4_2009.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://blog.lib.umn.edu/cgi-bin/mt-atom.cgi/weblog/blog_id=9711/entry_id=164916" title="Feb. 4, 2009" />
    <id>tag:blog.lib.umn.edu,2009:/dillo109/jons8262//9711.164916</id>
    
    <published>2009-02-05T00:01:04Z</published>
    <updated>2009-02-05T00:08:14Z</updated>
    
    <summary>&gt; nels &gt; head(nels) HW.Hours StdMathScore 1 -0.04391888 59.514 2 0.26639017 47.954 3 0.30237000 42.799 4 0.28818846 49.205 5 0.64999216 52.519 6 0.23066355 44.493 &gt; sample1 &gt; head(sample1) HW.Hours StdMathScore 483 0.03774998 56.053 138 0.31898249 44.345 908 1.61900567 59.514 689...</summary>
    <author>
        <name>dillo109</name>
        <uri></uri>
    </author>
    
    <content type="html" xml:lang="en" xml:base="http://blog.lib.umn.edu/dillo109/jons8262/">
        <![CDATA[<p>> nels<-read.table("NELS.txt", header=T)<br />
> head(nels)<br />
     HW.Hours StdMathScore<br />
1 -0.04391888       59.514<br />
2  0.26639017       47.954<br />
3  0.30237000       42.799<br />
4  0.28818846       49.205<br />
5  0.64999216       52.519<br />
6  0.23066355       44.493<br />
> sample1<- read.table("Sample1.txt", header=T)<br />
> head(sample1)<br />
       HW.Hours StdMathScore<br />
483  0.03774998       56.053<br />
138  0.31898249       44.345<br />
908  1.61900567       59.514<br />
689 -1.80284598       44.641<br />
130 -0.70691857       45.892<br />
189  1.09322485       61.871<br />
> sample2<-read.table("Sample2.txt", header=T)<br />
> head(sample2)<br />
       HW.Hours StdMathScore<br />
776  0.18038788       42.211<br />
805  0.20964010       58.704<br />
653 -0.05089566       63.270<br />
52   1.37044411       64.522<br />
329  1.20164587       45.597<br />
569  0.10342771       54.949<br />
> library(lattice)<br />
> xyplot(StdMathScore~HW.Hours, data=nels, type=c("p","r"))<br />
> model<-lm(StdMathScore~HW.Hours, data=nels)<br />
> coef(model)<br />
(Intercept)    HW.Hours <br />
  51.987840    1.374264 <br />
<b>We are assuming now that the 887 students in the NELS data are the entire population.</b><br />
<h2>Drawing a Random Sample</h2><br />
<i>With Replacement</i><br />
-We already have a random sample in "Sample1.txt"<br />
-There is another random sample in "Sample2.txt"<br />
> xyplot(StdMathScore~HW.Hours, data=sample1, type=c("p","r"))<br />
> xyplot(sd(StdMathScore)/StdMathScore~sd(HW.Hours)/HW.Hours, data=sample1, type=c("p","r"))<br />
> xyplot((sd(StdMathScore)/StdMathScore)~(sd(HW.Hours)/HW.Hours), data=sample1, type=c("p","r"))<br />
> xyplot(StdMathScore/sd(StdMathScore)~HW.Hours/sd(HW.Hours), data=sample1, type=c("p","r"))<br />
> xyplot(StdMathScore~HW.Hours, data=nels, type=c("p","r"))<br />
> xyplot(StdMathScore~HW.Hours, data=sample1, type=c("p","r"))<br />
> xyplot(StdMathScore~HW.Hours, data=sample2, type=c("p","r"))<br />
> model1<-lm(StdMathScore~HW.Hours, data=sample1)<br />
> model2<-lm(StdMathScore~HW.Hours, data=sample2)<br />
> coef(model1)<br />
(Intercept)    HW.Hours <br />
 53.4549682   0.3746038 <br />
> coef(model2)<br />
(Intercept)    HW.Hours <br />
 51.6687436  -0.2858852 <br />
> plot(StdMathScore~HW.Hours, data=sample1)<br />
> abline(model1)<br />
> abline(model,lty="dotted")<br />
> abline(model2,lty="solid", lwd="2")<br />
-The only reason that the slopes and intercepts of the equations differ between random samples is sampling error. <br />
-If we assume an infinite number of samples, the mean of all possible regression slopes is equal to the true population slope. <br />
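-A small simulation can illustrate this claim. The sketch below does not use the NELS files: the population is simulated, and the sample size of 200, the 2000 repetitions and all object names are assumptions chosen for illustration.<br />

```r
set.seed(8262)
# Hypothetical population of 887 students (simulated; not the NELS data)
n_pop <- 887
pop <- data.frame(HW.Hours = rnorm(n_pop))
pop$StdMathScore <- 52 + 1.37 * pop$HW.Hours + rnorm(n_pop, sd = 10)
pop_slope <- unname(coef(lm(StdMathScore ~ HW.Hours, data = pop))["HW.Hours"])

# Draw many random samples (with replacement) and record each sample slope
slopes <- replicate(2000, {
  s <- pop[sample(n_pop, 200, replace = TRUE), ]
  unname(coef(lm(StdMathScore ~ HW.Hours, data = s))["HW.Hours"])
})

# The individual slopes vary, but their mean sits close to the population slope
print(round(c(population = pop_slope, mean_of_samples = mean(slopes),
              sd_of_samples = sd(slopes)), 3))
```

-The standard deviation of the sample slopes is the quantity that the standard error in the regression output tries to estimate from a single sample.<br />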
<H2>Hypothesis testing</H2><br />
Hnull: beta-sub-1 = 0<br />
Halt: beta-sub-1 != 0<br />
-the basic idea is that you get a sample of data and ask, under the NULL HYPOTHESIS, how likely it is that we would see a coefficient like the one from our sample.<br />
> sample3<- read.table("NELSSample.txt", header=T)<br />
> head(sample3)<br />
      HW.Hours StdMathScore<br />
345 -0.3329931       42.432<br />
759 -0.2136822       53.698<br />
95  -1.0077991       49.205<br />
325  0.2059000       53.698<br />
355 -0.1177185       55.980<br />
377  0.1413540       65.331<br />
> model3<-lm(StdMathScore~HW.Hours, data=sample3)<br />
> coef(model3)<br />
(Intercept)    HW.Hours <br />
  51.394680    1.782601 <br />
t statistic for a sample mean: t = (Ybar - HypValue)/(s/n^(1/2)), where s is the sample standard deviation<br />
in regression t = (BETA-hat-sub-one - 0)/Standard Error of BETA-hat-sub-one <br />
for our data n = 200, t = (1.78 - 0)/.5811<br />
> (1.78-0)/.5811<br />
[1] 3.063156<br />
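p value for a regression slope: pt(-t,df) gives the cumulative probability below -t in a t distribution (one tail), which is why we use the negative value of the t stat; doubling it gives the two-tailed p value. Here df = n - 2 = 198.<br />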
> pt(-3.07,198)<br />
[1] 0.001220481<br />
> 2*pt(-3.07,198)<br />
[1] 0.002440963<br />
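p = .002<br />
Because p is well below .05 we reject the null hypothesis: the data give evidence that the true regression slope is not zero, so there appears to be some linear relationship between HW.Hours and StdMathScore.<br />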
summary(modelName)<br />
> summary(model3)</p>

<p>Call:<br />
lm(formula = StdMathScore ~ HW.Hours, data = sample3)</p>

<p>Residuals:<br />
     Min       1Q   Median       3Q      Max <br />
-19.9886  -8.5163  -0.7377   8.2180  21.5530 </p>

<p>Coefficients:<br />
            Estimate Std. Error t value Pr(>|t|)    <br />
(Intercept)  51.3947     0.7055  72.853  < 2e-16 ***<br />
HW.Hours      1.7826     0.5811   3.068  0.00246 ** <br />
---<br />
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 </p>

<p>Residual standard error: 9.959 on 198 degrees of freedom<br />
Multiple R-squared: 0.04538,    Adjusted R-squared: 0.04055 <br />
F-statistic: 9.411 on 1 and 198 DF,  p-value: 0.002458 </p>
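<p>The t value, the p-value and the F-statistic in this output are tied together, which can be checked from the printed numbers alone. This is a rough sketch using the rounded estimate and standard error, so small rounding differences from the output are expected; the variable names are illustrative.</p>

```r
# Rounded values copied from the summary(model3) output above
b1    <- 1.7826   # slope estimate for HW.Hours
se_b1 <- 0.5811   # its standard error
df    <- 198      # residual degrees of freedom

t_stat  <- (b1 - 0) / se_b1           # test of H0: slope = 0
p_value <- 2 * pt(-abs(t_stat), df)   # two-tailed p-value
f_stat  <- t_stat^2                   # with one predictor, F equals t squared

print(round(c(t = t_stat, p = p_value, F = f_stat), 4))
```

<p>With a single predictor the F test for the model and the t test for the slope are the same test, which is why both report p = 0.00246.</p>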

<p>> anova(model3)<br />
Analysis of Variance Table</p>

<p>Response: StdMathScore<br />
           Df  Sum Sq Mean Sq F value   Pr(>F)   <br />
HW.Hours    1   933.5   933.5  9.4113 0.002458 **<br />
Residuals 198 19638.9    99.2                    <br />
---<br />
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 <br />
> <br />
</p>]]>
        
    </content>
</entry>

<entry>
    <title>Feb. 2, 2009</title>
    <link rel="alternate" type="text/html" href="http://blog.lib.umn.edu/dillo109/jons8262/2009/02/feb_2_2008.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://blog.lib.umn.edu/cgi-bin/mt-atom.cgi/weblog/blog_id=9711/entry_id=164551" title="Feb. 2, 2009" />
    <id>tag:blog.lib.umn.edu,2009:/dillo109/jons8262//9711.164551</id>
    
    <published>2009-02-03T00:03:48Z</published>
    <updated>2009-02-03T00:12:25Z</updated>
    
    <summary>burt attach(burt) model Regression Lingo Alert Regress the outcome variable (Y) on the predictor variable-We regress Y on X coef(model) (Intercept) OwnIQ 9.719491 0.907920 anova(model) Analysis of Variance Table Response: FostIQ Df Sum Sq Mean Sq F value Pr(&gt;F) OwnIQ...</summary>
    <author>
        <name>dillo109</name>
        <uri></uri>
    </author>
    
    <content type="html" xml:lang="en" xml:base="http://blog.lib.umn.edu/dillo109/jons8262/">
        <![CDATA[<p><br/>burt <- read.table("burt.txt", header = T)<br />
<br/>attach(burt)<br />
<br/>model<-lm(FostIQ~OwnIQ)<br />
<br/><h2>Regression Lingo Alert</h2><br />
<br/><b>Regress the outcome variable (Y) on the predictor variable (X)</b> - We regress Y on X<br />
<br/>coef(model)<br />
(Intercept)       OwnIQ <br />
   9.719491    0.907920 <br />
<br/>anova(model)<br />
Analysis of Variance Table</p>

<p>Response: FostIQ<br />
          Df Sum Sq Mean Sq F value    Pr(>F)    <br />
OwnIQ      1 9250.7  9250.7  169.42 < 2.2e-16 ***<br />
Residuals 51 2784.7    54.6                      <br />
---<br />
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 <br />
<br/><h2>Fitted Model</h2><br />
<br/>-Drops error from the regression equation.<br />
<br/>Yvariable-hat-sub-i = (Intercept) + Slope*Xvariable-sub-i<br />
<br/>(The Regression Equation includes error, so its Y variable is not an estimate; in the Fitted Model the Y is an estimate [<b>needs the hat</b>])<br />
<br/>library(lattice)<br />
<br/>xyplot(FostIQ~OwnIQ,type=c("p","r"))<br />
<br/><h2>R^2</h2><br />
<br/>R^2 = SSmodel/SStotal<br />
<br/>R^2 = 9251/12035 = .769<br />
<br/>*76.9% of the variation in FostIQ is accounted for by variation in OwnIQ and 23.1% is not.<br />
<br/>We cannot tell how the unexplained variation divides among other systematic components, measurement error, and individual variation. <br />
<br/><h2>Estimated Residual Variance</h2><br />
<br/>What is the variance of the scores around the line at each point on the line? (<b>Remember that the line represents the means of the conditional distributions at each value of X.</b>)<br />
<br/>sigma-hat^2-sub-Y|X = SSresiduals/n (first attempt)<br />
<br/>sigma-hat^2-sub-Y|X = SSresiduals/(n - number of parameters in the equation)<br />
<br/>sigma-hat^2-sub-Y|X = 2785/(53 - 2) = 54.6<br />
<br/>SD = sqrt(sigma-hat^2-sub-Y|X)<br />
<br/>sqrt(54.6)<br />
[1] 7.389181<br />
<br/>We are therefore about 95% sure (2 SDs, 2*7.39 = 14.8) that an observed value will fall within about 15 points either side of any particular point estimate.<br />
<br/>9.72+.91*75<br />
[1] 77.97<br />
<br/>77.97-14.8<br />
[1] 63.17<br />
<br/>77.97+14.8<br />
[1] 92.77<br />
<br/>So for an OwnIQ score of 75 we would expect the FostIQ score to be somewhere between about 63.2 and 92.8.<br />
 <br />
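<br/>The R^2, residual variance and rough 95% band worked out above can be recomputed in one pass from the ANOVA table. This is a sketch that assumes only the printed sums of squares (9250.7 and 2784.7), the 51 residual degrees of freedom and the rounded coefficients 9.72 and .91; the variable names are illustrative.<br />

```r
# Numbers copied from the anova(model) table and rounded coefficients above
ss_model    <- 9250.7
ss_residual <- 2784.7
df_residual <- 51          # n - 2 = 53 - 2

r_squared  <- ss_model / (ss_model + ss_residual)  # proportion of variation explained
sigma2_hat <- ss_residual / df_residual            # estimated residual variance
sigma_hat  <- sqrt(sigma2_hat)                     # residual standard deviation

# Rough 95% band (plus or minus 2 SDs) around the fitted value at OwnIQ = 75
y_hat_75 <- 9.72 + 0.91 * 75
band <- c(lower = y_hat_75 - 2 * sigma_hat, upper = y_hat_75 + 2 * sigma_hat)

print(round(c(r_squared = r_squared, sigma_hat = sigma_hat, band), 2))
```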
</p>]]>
        
    </content>
</entry>

<entry>
    <title>Jan 28, 2009</title>
    <link rel="alternate" type="text/html" href="http://blog.lib.umn.edu/dillo109/jons8262/2009/01/jan_28_2009.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://blog.lib.umn.edu/cgi-bin/mt-atom.cgi/weblog/blog_id=9711/entry_id=163599" title="Jan 28, 2009" />
    <id>tag:blog.lib.umn.edu,2009:/dillo109/jons8262//9711.163599</id>
    
    <published>2009-01-28T23:59:59Z</published>
    <updated>2009-01-29T00:00:59Z</updated>
    
    <summary>Basic Regression Equation Outcome = functional form (line, curvilinear, etc.) + Residuals Equation of a line y = mx + b mx = slope b = y-intercept m/1 = rise/run = deltaY/deltaX (delta = change in) Regression Model Y-sub-i =...</summary>
    <author>
        <name>dillo109</name>
        <uri></uri>
    </author>
    
    <content type="html" xml:lang="en" xml:base="http://blog.lib.umn.edu/dillo109/jons8262/">
        <![CDATA[<h1>Basic Regression Equation</h1>
Outcome = functional form (line, curvilinear, etc.) + Residuals
<h2>Equation of a line</h2>
<b>y = mx + b</b>
m = slope
b = y-intercept
m/1 = rise/run = deltaY/deltaX  (delta = change in)
<h2>Regression Model</h2>
Y-sub-i = Beta-sub-zero + Beta-sub-oneX-sub-i + epsilon-sub-i(error)
Beta-sub-zero + Beta-sub-oneX-sub-i is the functional form.
Some outcome = systematic component (functional form) plus some error
<b>Regression model is actually about the population.</b>
-We use sample data to get estimates for parameters (regression coefficients).
-The distribution of a variable on its own, ignoring the other variable, is the marginal distribution. The distribution of Y in the population at any particular point on the line (a particular value of X) is the conditional distribution.
<b>When reporting conditional distributions we write mu-sub-Y|X-sub-n which means the mean of Y conditioned on X. </b>
Y-hat = Beta-sub-zero + Beta-sub-oneX-sub-i (without the error). Y-hat is always on the line.
> burt <- read.table("burt.txt", header = T)
> library(psych)
> library(lattice)
> attach(burt)
> xyplot(FostIQ~OwnIQ,type="p")
<h2>Residuals</h2>
(Y-sub-i - Y-hat-sub-i)
Sum of squares residuals [SIGMA-sub-i(Y-sub-i - Y-hat-sub-i)^2]
<h2> Line of best fit</h2>
<b>-Line that minimizes the sum of squares residuals</b>
OLS is the ordinary <b>least squares</b> regression.
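The least-squares idea can be checked by hand. The sketch below uses only the first six rows of burt.txt as printed by head(burt) in the Jan 26 notes, so its coefficients will not match the full 53-row fit; the names burt6, fit6 and ssr are made up for illustration.

```r
# First six rows of burt.txt as printed by head(burt); the full file has 53 rows
burt6 <- data.frame(OwnIQ  = c(68, 71, 73, 75, 78, 79),
                    FostIQ = c(63, 76, 77, 72, 71, 75))
x <- burt6$OwnIQ
y <- burt6$FostIQ

# Closed-form least-squares estimates for a single predictor
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# lm() should return the same two numbers
fit6 <- lm(FostIQ ~ OwnIQ, data = burt6)
print(c(b0 = b0, b1 = b1))
print(coef(fit6))

# Sum of squared residuals for any candidate line a + b*x
ssr <- function(a, b) sum((y - (a + b * x))^2)
# Moving the slope away from b1 in either direction makes the sum larger
print(c(best = ssr(b0, b1), steeper = ssr(b0, b1 + 0.1), flatter = ssr(b0, b1 - 0.1)))
```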
<h1>Linear Model Function in R = model <- lm(Y~X)</h1>
> model <- lm(FostIQ~OwnIQ)
-the model is now an object
> names(model)
 [1] "coefficients"  "residuals"     "effects"       "rank"         
 [5] "fitted.values" "assign"        "qr"            "df.residual"  
 [9] "xlevels"       "call"          "terms"         "model"        
-We are interested in the coefficients because they give us the y-intercept and the slope.
> model$coefficients
(Intercept)       OwnIQ 
   9.719491    0.907920 
-this gives us first the y-intercept (BETA-sub-zero) and then the slope(BETA-sub-one)
Our estimates are then BETA-sub-zero-hat = 9.719491 and BETA-sub-one-hat = .907920
The equation therefore is FostIQ-hat = 9.72 + .91(OwnIQ) or Y-hat = 9.72 + .91(X-sub-i)
> xyplot(FostIQ~OwnIQ,type=c("p","r"))
The call for the scatter plot with the regression line is xyplot(Y~X,type=c("p","r"))
<h2>Evaluating fit of the model</h2>
-Well, we have the best line possible but how well does it really fit?
-We use what is called ANOVA regression decomposition:
Total variation = variation explained by the model + variation not explained by the model
*The distance between the regression line and the mean of all of the values is the variation explained by the model.  The distance between a data point and the regression line is the variation not explained by the model.  The distance between a data point and the mean of all points is the total variation.
*For the entire model we add up the squared values of all the total variations, explained variations, and unexplained variations. 
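The decomposition can be verified numerically. As an illustration only, the sketch below again uses the first six rows of burt.txt as printed by head(burt) in the Jan 26 notes (not the full 53 rows), so the sums of squares differ from the anova(model) table; the object names are made up.

```r
# First six rows of burt.txt as printed by head(burt); the full file has 53 rows
burt6 <- data.frame(OwnIQ  = c(68, 71, 73, 75, 78, 79),
                    FostIQ = c(63, 76, 77, 72, 71, 75))
fit6  <- lm(FostIQ ~ OwnIQ, data = burt6)

y     <- burt6$FostIQ
y_hat <- fitted(fit6)

ss_total    <- sum((y - mean(y))^2)      # data point to overall mean
ss_model    <- sum((y_hat - mean(y))^2)  # regression line to overall mean
ss_residual <- sum((y - y_hat)^2)        # data point to regression line

print(c(total = ss_total, model = ss_model, residual = ss_residual))
print(all.equal(ss_total, ss_model + ss_residual))
```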
> 9.72+.91*68
[1] 71.6
> mean(FostIQ)
[1] 98.1132
Residual = Actual - Predicted
Residual = 63-71.6
> 63-71.6
[1] -8.6
Remember, it is the residuals we use to evaluate model fit.
> model$residuals
           1            2            3            4            5            6 
 -8.45804807   1.81819206   1.00235215  -5.81348777  -9.53724764  -6.44516759 
           7            8            9           10           11           12 
  2.73899249  -2.16892746   8.83107254   0.92315258  -3.89268733   6.19939271 
          13           14           15           16           17           18 
  4.29147276   8.29147276  11.47563284 -11.43228711 -10.34020707  -4.34020707 
          19           20           21           22           23           24 
 -2.24812703   2.75187297  -7.15604698   4.84395302   4.84395302  -1.06396694 
          25           26           27           28           29           30 
  0.02811310  -3.87980685  12.12019315  -5.78772681  -2.78772681  14.21227319 
          31           32           33           34           35           36 
 15.21227319   5.39643328 -12.51148668  13.58059337   1.67267341   2.76475345 

<p>          37           38           39           40           41           42 <br />
  3.94891354   1.04099358   2.04099358   1.13307363  -5.86692637 -12.77484633 <br />
          43           44           45           46           47           48 <br />
-12.49860620   4.59347385  -9.22236607  11.77763393  -6.13028602   0.96179402 <br />
          49           50           51           52           53 <br />
 -0.85404589  -1.57780576   4.79051441  -9.84116541   3.34299467 <br />
> model$fitted.values<br />
        1         2         3         4         5         6         7         8 <br />
 71.45805  74.18181  75.99765  77.81349  80.53725  81.44517  83.26101  84.16893 <br />
        9        10        11        12        13        14        15        16 <br />
 84.16893  85.07685  86.89269  87.80061  88.70853  88.70853  90.52437  91.43229 <br />
       17        18        19        20        21        22        23        24 <br />
 92.34021  92.34021  93.24813  93.24813  94.15605  94.15605  94.15605  95.06397 <br />
       25        26        27        28        29        30        31        32 <br />
 95.97189  96.87981  96.87981  97.78773  97.78773  97.78773  97.78773  99.60357 <br />
       33        34        35        36        37        38        39        40 <br />
100.51149 101.41941 102.32733 103.23525 105.05109 105.95901 105.95901 106.86693 <br />
       41        42        43        44        45        46        47        48 <br />
106.86693 107.77485 110.49861 111.40653 113.22237 113.22237 114.13029 115.03821 <br />
       49        50        51        52        53 <br />
116.85405 119.57781 123.20949 126.84117 128.65701 <br />
---Model misfit can be defined by the proportion of the total variation left unexplained by the model (the variation unexplained by the model divided by the total variation)<br />
---Model fit can be defined by the proportion of the total variation explained by the model (the variation explained by the model divided by the total variation)<br />
---These are SSresidual/SStotal (Sum of Squares residuals / Sum of Squares total) and SSmodel/SStotal (Sum of Squares model / Sum of Squares total)<br />
> anova(model)<br />
Analysis of Variance Table</p>

<p>Response: FostIQ<br />
          Df Sum Sq Mean Sq F value    Pr(>F)    <br />
OwnIQ      1 9250.7  9250.7  169.42 < 2.2e-16 ***<br />
Residuals 51 2784.7    54.6                      <br />
---<br />
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 <br />
SSmodel = 9250.7 and SSresidual = 2784.7<br />
> <br />
</p>]]>
        
    </content>
</entry>

<entry>
    <title>Jan 26, 2009</title>
    <link rel="alternate" type="text/html" href="http://blog.lib.umn.edu/dillo109/jons8262/2009/01/jan_26_2009.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://blog.lib.umn.edu/cgi-bin/mt-atom.cgi/weblog/blog_id=9711/entry_id=163186" title="Jan 26, 2009" />
    <id>tag:blog.lib.umn.edu,2009:/dillo109/jons8262//9711.163186</id>
    
    <published>2009-01-26T23:56:04Z</published>
    <updated>2009-01-26T23:56:29Z</updated>
    
    <summary> Mathematical and Statistical Models Mathematical Models -deterministic models, there is no error in a mathematical model Statistical Model -We are using models that allow for error and use probability -Allow for other systematic components that in many cases are...</summary>
    <author>
        <name>dillo109</name>
        <uri></uri>
    </author>
    
    <content type="html" xml:lang="en" xml:base="http://blog.lib.umn.edu/dillo109/jons8262/">
        <![CDATA[<p><br />
<h1>Mathematical and Statistical Models</h1><br />
<h2>Mathematical Models</h2><br />
-deterministic models, there is no error in a mathematical model<br />
<h2>Statistical Model</h2><br />
-We are using models that allow for error and use probability<br />
-Allow for other systematic components that in many cases are not included or were not measured.<br />
-Allow for measurement error (especially in the social sciences)<br />
-Allow for individual variation within the unit of analysis that we are analyzing.<br />
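-The contrast between the two kinds of model can be sketched with made-up numbers: the line 3 + 2x, the error SD of 4 and all object names below are assumptions for illustration, not course data.<br />

```r
set.seed(42)
x <- 1:20

# Mathematical (deterministic) model: every point falls exactly on the line
y_math <- 3 + 2 * x
# Statistical model: same systematic component plus random error
y_stat <- 3 + 2 * x + rnorm(20, sd = 4)

res_math <- resid(lm(y_math ~ x))
res_stat <- resid(lm(y_stat ~ x))

# Largest absolute residual under each model
print(round(c(max_abs_math = max(abs(res_math)),
              max_abs_stat = max(abs(res_stat))), 3))
```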
<h3>Goals of Creating Statistical Models</h3><br />
1.Identify systematic components<br />
2.Assess the model fit (looking at residuals [good model=smaller residuals])<br />
<h2>How do we use statistical Models</h2><br />
<b>Articulate Research Questions -> Outcome Variables, Focal (important) Predictors, Covariates (account or control for)</b><br />
<b>Postulate the statistical model (What is it going to look like?)</b><br />
...fitting model to sample data<br />
<b>Determine if relationship is due to chance -> Does the model really work in the population or is it by chance?</b><br />
<h2>Regression is all about relationships and associations</h2> <br />
-Causality can be only determined through the design of the study not through analysis.<br />
-Analysis discovers associations, correlations or covariation.<br />
> burt <- read.table("burt.txt", header = T)<br />
> head(burt)<br />
  ID OwnIQ FostIQ<br />
1  1    68     63<br />
2  2    71     76<br />
3  3    73     77<br />
4  4    75     72<br />
5  5    78     71<br />
6  6    79     75<br />
> attach(burt)<br />
<b>Always a good idea to begin an analysis with a descriptive analysis and plots.</b><br />
> library(psych)<br />
> describe(OwnIQ)<br />
  var  n  mean    sd median trimmed   mad min max range skew kurtosis   se<br />
1   1 53 97.36 14.69     96      97 14.83  68 131    63 0.24    -0.47 2.02<br />
> describe(FostIQ)<br />
  var  n  mean    sd median trimmed   mad min max range  skew kurtosis   se<br />
1   1 53 98.11 15.21     97   98.21 16.31  63 132    69 -0.02     -0.5 2.09<br />
<b>Remember from 8261: Kernel Density Plot is better than a histogram.</b><br />
-There are actually some better ways to plot now than using the base plot command: plot()<br />
> library(lattice)<br />
-the "lattice" library has a LOT of plot styles<br />
A density plot in lattice is -> densityplot(variable,kernel="e") (the argument name is lowercase; "e" is matched to the Epanechnikov kernel)<br />
> densityplot(OwnIQ,kernel="e")<br />
-all of the "lattice" graphics functions allow you to enter a formula<br />
> densityplot(~OwnIQ,data=burt,kernel="e")<br />
> densityplot(FostIQ,kernel="e")<br />
> densityplot(~FostIQ,data=burt,kernel="e")<br />
-in "lattice" library histogram ->histogram()<br />
-in "lattice" library boxplot -> bwplot()<br />
> bwplot(OwnIQ)<br />
> histogram(OwnIQ)<br />
> densityplot(OwnIQ,kernel="e")<br />
-in "lattice" scatter plot -> xyplot(formula, type="p")<br />
-formula -> Y~X where X will be plotted on the X axis and Y on the Y axis<br />
> xyplot(FostIQ~OwnIQ,type="p")<br />
<h2>Five things to look for in a scatter plot</h2><br />
1. What is the direction of the relationship?<br />
2. What is the type of relationship? (Is it linear?)<br />
3. What is the strength of the relationship? (Are the points close or far from the line?)<br />
4. What is the magnitude of the relationship? (Line slope?)<br />
5. Are there any unusual observations? (not necessarily outlier)<br />
-In a deterministic relationship all data points are on the line, but there are different types of error in social science, so our plots start to look like clouds.<br />
> <br />
</p>]]>
        
    </content>
</entry>

</feed> 

