First practice in R

August 27, 2013
Let’s look at a simple data set from your first regression class here at Penn.
First grab the file:
http://www-stat.wharton.upenn.edu/~waterman/fsw/datasets/txt/Cleaning.txt
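Once R is running, read.table() looks for Cleaning.txt in R’s working directory, so a “cannot open file” error usually means the file is saved somewhere else. A quick check might look like this sketch. The commented-out download.file() call is an optional way to fetch the file from inside R, assuming you have internet access and that the URL above is still live.

```r
# Where is R looking for files right now?
getwd()

# TRUE if Cleaning.txt is in that folder; FALSE means read.table() will fail
file.exists("Cleaning.txt")

# Optional: have R download the file into the working directory (needs internet)
# download.file(
#   "http://www-stat.wharton.upenn.edu/~waterman/fsw/datasets/txt/Cleaning.txt",
#   destfile = "Cleaning.txt"
# )
```

If the file is elsewhere, either move it into the folder getwd() reports or point R at it with setwd().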
Start R. First read in the file:
> read.table("Cleaning.txt")
Oops, that prints a lot of output and doesn’t save it anywhere. So let’s
assign all this mess to a data frame.
> clean = read.table("Cleaning.txt")
Just look at what we have by typing “clean” by itself. Oops: the first row
holds the names of the variables. So let’s try again:
> clean = read.table("Cleaning.txt", header = TRUE)
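To see what header = TRUE changes without needing Cleaning.txt, this sketch writes R’s built-in cars data set (speed and stopping distance) to a temporary file that, like Cleaning.txt, has the variable names in its first row, then reads it back both ways. The file path and object names are just for illustration.

```r
# Write the built-in cars data to a temporary text file with a header row
path <- tempfile(fileext = ".txt")
write.table(cars, path, row.names = FALSE)

no_header   <- read.table(path)                 # names are read in as data
with_header <- read.table(path, header = TRUE)  # names become column names

nrow(no_header)                  # one row too many: the header line counted as data
nrow(with_header)                # one row per car
names(with_header)               # the real variable names
sapply(no_header, is.numeric)    # FALSE: a text row forces the columns to non-numeric
sapply(with_header, is.numeric)  # TRUE: only numbers, as in the fixed clean
```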
NOTE: We could have avoided some pain here if we had read the documentation
ahead of time. For a quick reminder of how to use a function, type “?read.table”
(or “?” followed by any function name). But in general, the textbook, the web,
and Google provide better documentation.
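Besides the full help page, R can show just a function’s arguments and their defaults. This small sketch shows that header defaults to FALSE, which is why the first attempt read the variable names in as data. The help commands are commented out because they open a help viewer.

```r
# ?read.table       # full help page; same as help("read.table")
# ??regression      # search every installed help page for a word

args(read.table)            # the argument list, with default values
formals(read.table)$header  # the default value of header
```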
Typing “clean” again shows that the data now contain only numbers. How happy
can you get?!? Now for the fun part: let’s run a regression.
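For a larger file, scrolling through the whole printout gets tedious. A sketch of quicker checks, using the built-in cars data as a stand-in; with the course data you would write str(clean), head(clean), and so on.

```r
str(cars)                  # column names, types, and the first few values
head(cars, 3)              # just the first three rows
sapply(cars, is.numeric)   # TRUE for every column means no stray text crept in
```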
> lm(clean$RoomsClean ~ clean$NumberOfCrews)

Call:
lm(formula = clean$RoomsClean ~ clean$NumberOfCrews)

Coefficients:
        (Intercept)  clean$NumberOfCrews
              1.785                3.701
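The printed coefficients are also available as numbers you can reuse. This sketch uses the built-in cars data as a stand-in (stopping distance regressed on speed). The data = argument lets you drop the prefix, so for the course data the equivalent call would be lm(RoomsClean ~ NumberOfCrews, data = clean).

```r
# Stand-in data: built-in cars (speed in mph, stopping distance in feet)
fit <- lm(dist ~ speed, data = cars)   # data = lets you drop the cars$ prefixes

fit                 # the same minimal Call + Coefficients display
coef(fit)           # coefficients as a named numeric vector
coef(fit)["speed"]  # just the slope
```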
Kind of a different world view than JMP: R gives the minimal amount of
output possible. To see a bit more, try
> summary(lm(clean$RoomsClean ~ clean$NumberOfCrews))
Call:
lm(formula = clean$RoomsClean ~ clean$NumberOfCrews)

Residuals:
     Min       1Q   Median       3Q      Max
-15.9990  -4.9901   0.8046   4.0010  17.0010

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)           1.7847     2.0965   0.851    0.399
clean$NumberOfCrews   3.7009     0.2118  17.472   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.336 on 51 degrees of freedom
Multiple R-squared:  0.8569,  Adjusted R-squared:  0.854
F-statistic: 305.3 on 1 and 51 DF,  p-value: < 2.2e-16
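Every number in that table can be pulled out of summary() directly, and two of them can be checked by hand: R-squared is 1 − SSE/SST, and with a single predictor the F statistic is the slope’s t value squared (above, 17.472² ≈ 305.3). A sketch using the built-in cars data as a stand-in for clean:

```r
fit <- lm(dist ~ speed, data = cars)   # stand-in regression
s <- summary(fit)

s$r.squared       # Multiple R-squared
s$adj.r.squared   # Adjusted R-squared
s$sigma           # Residual standard error
coef(s)           # the Coefficients table as a matrix

# R-squared by hand: 1 - SSE / SST
sse <- sum(resid(fit)^2)
sst <- sum((cars$dist - mean(cars$dist))^2)
r2_by_hand <- 1 - sse / sst

# One predictor: F statistic equals the slope's t value squared
t_slope <- coef(s)["speed", "t value"]
f_stat  <- s$fstatistic[["value"]]
c(t_squared = t_slope^2, F = f_stat)
```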
That should look very similar to other tables you have seen. But what of
pictures? Well, let’s do a plot:
> plot(lm(clean$RoomsClean ~ clean$NumberOfCrews))
[Figure: “Residuals vs Fitted” for lm(clean$RoomsClean ~ clean$NumberOfCrews). Residuals (about −20 to 20) on the vertical axis against Fitted values (about 10 to 60) on the horizontal axis; observations 46, 31, and 5 are labeled.]
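Calling plot() on a fitted model draws several diagnostic plots in turn (Residuals vs Fitted is the first), and in an interactive session R waits for Return between them. This sketch, again on the built-in cars data as a stand-in, asks for just the first plot and then puts all four default plots on one page.

```r
fit <- lm(dist ~ speed, data = cars)   # stand-in regression

plot(fit, which = 1)          # only the Residuals vs Fitted plot

op <- par(mfrow = c(2, 2))    # split the window into a 2 x 2 grid ...
plot(fit)                     # ... so all four default diagnostic plots fit on one page
par(op)                       # restore the previous layout
```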
Now, to prove you have gotten this far, submit the following:
1. What is the R-squared?
2. Make a plot of the residuals vs X (NumberOfCrews).
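Unlike Residuals vs Fitted, a residuals-vs-X plot is not one of plot()’s built-in diagnostics, but resid() plus an ordinary scatterplot gets you there. A sketch with the built-in cars data standing in (speed plays the role of X); for the course data, X would be clean$NumberOfCrews and the model the one fitted above.

```r
fit <- lm(dist ~ speed, data = cars)   # stand-in regression; X is speed

plot(cars$speed, resid(fit),
     xlab = "speed (X)", ylab = "Residuals",
     main = "Residuals vs X")
abline(h = 0, lty = 2)   # dashed reference line at zero
```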
Put both of these into a Word file and then convert it to a PDF. Email
the PDF to us at statistics.assignments@gmail.com.
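Copying output out of the console can be fiddly. One option, sketched here on the built-in cars data, is to have R write the summary() printout to a text file you can paste into Word. The file name regression_summary.txt is hypothetical; the file lands in the folder getwd() reports.

```r
fit <- lm(dist ~ speed, data = cars)   # stand-in regression

# Save the full summary() printout to a text file (in the working directory)
capture.output(summary(fit), file = "regression_summary.txt")

# Or just the R-squared, rounded for the write-up
round(summary(fit)$r.squared, 4)
```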