First practice in R

August 27, 2013

Let’s look at a simple data set from your first regression class here at Penn. First grab the file:

http://www-stat.wharton.upenn.edu/~waterman/fsw/datasets/txt/Cleaning.txt

Start R. First read in the file:

> read.table("Cleaning.txt")

Oops, that generates too much output, and doesn’t put it anywhere. So let’s assign all this mess to a data frame.

> clean = read.table("Cleaning.txt")

Just look at what we have by typing “clean” again. Oops, we have the first row with the names of the variables in it. So let’s try again:

> clean = read.table("Cleaning.txt", header = TRUE)

NOTE: We could have avoided some pain here if we had read the documentation ahead of time. A quick reminder of how to use a function is to use “?read.table”. But in general, the textbook / the web / Google provides better documentation.

Checking with “clean” shows we only have numbers. How happy can you get?!?

Now for the fun part, let’s run a regression.

> lm(clean$RoomsClean ~ clean$NumberOfCrews)

Call:
lm(formula = clean$RoomsClean ~ clean$NumberOfCrews)

Coefficients:
        (Intercept)  clean$NumberOfCrews
              1.785                3.701

Kinda a different world view than JMP. It just gives the minimal amount of output possible. So to see a bit more, try

> summary(lm(clean$RoomsClean ~ clean$NumberOfCrews))

Call:
lm(formula = clean$RoomsClean ~ clean$NumberOfCrews)

Residuals:
     Min       1Q   Median       3Q      Max
-15.9990  -4.9901   0.8046   4.0010  17.0010

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)           1.7847     2.0965   0.851    0.399
clean$NumberOfCrews   3.7009     0.2118  17.472   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.336 on 51 degrees of freedom
Multiple R-squared: 0.8569, Adjusted R-squared: 0.854
F-statistic: 305.3 on 1 and 51 DF, p-value: < 2.2e-16

That should look very similar to other tables you have seen. But what of pictures?
Well, let’s do a plot:

> plot(lm(clean$RoomsClean ~ clean$NumberOfCrews))

[Figure: “Residuals vs Fitted” for lm(clean$RoomsClean ~ clean$NumberOfCrews). Residuals (roughly -20 to 20) are plotted against fitted values (roughly 10 to 60); observations 46, 31, and 5 are labeled.]

Now to prove you have gotten this far, submit the following:

1. What is the R-squared?
2. Make a plot of the residuals vs X.

Now put both of these into a Word file and then convert it to a PDF file. Email this PDF file to us at statistics.assignments@gmail.com.
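
A side note on the calls used in this handout: once a fit is stored in a variable, its pieces can be pulled out by name instead of being read off the printed output. The sketch below is a minimal illustration in base R on made-up data. The data frame toy and its columns x and y are invented here and are not the Cleaning.txt values. It also uses the data argument of lm, which lets the formula refer to bare column names rather than the clean$... form.

```r
# Toy data, invented for illustration only (NOT the Cleaning.txt values).
set.seed(1)
toy <- data.frame(x = 1:20)
toy$y <- 2 + 3 * toy$x + rnorm(20, sd = 2)

# Passing data = toy lets the formula use bare column names.
fit <- lm(y ~ x, data = toy)

# Pieces of the fit can be extracted by name.
r2  <- summary(fit)$r.squared   # multiple R-squared
res <- residuals(fit)           # one residual per observation

# Residuals against the predictor, with a dashed reference line at zero.
plot(toy$x, res, xlab = "x", ylab = "Residuals")
abline(h = 0, lty = 2)
```

Storing the fit once also avoids refitting the same model inside each of lm, summary, and plot.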