Math 58B - Introduction to Biostatistics
Spring 2015
Jo Hardin

Lab Assignment 12

Lab Goals:
1. To understand the concept of a null sampling distribution for the slope of a regression line.
2. To be able to run & interpret a correlation in R.
3. To be able to run & interpret a regression line in R.

Note 1: If you can't remember what an ISCAM function does, pass it the argument "?". Also, remember to look at the back of each chapter for R summaries.

    load(url("http://www.rossmanchance.com/iscam2/ISCAM.RData"))

Note 2: Although you'll primarily use the applet for this lab, use the function lm to run linear models in R.

    help(lm) # page 326
    ## starting httpd help server ... done
    install.packages("car", repos="http://cran.us.r-project.org")
    ## Installing package into '\\wells/fac-staff$/jsh04747/My Documents/R/winlibrary/3.1'
    ## (as 'lib' is unspecified)
    ## package 'car' successfully unpacked and MD5 sums checked
    ##
    ## The downloaded binary packages are in
    ## C:\Users\jsh04747\AppData\Local\Temp\RtmpieQbEL\downloaded_packages
    library(car)
    ## Warning: package 'car' was built under R version 3.1.3

In class

In this lab, we will investigate a new setting: inference for simple linear regression. Go through Investigations 4.10 & 4.11.

To remove the 47th row from a dataset called "mydata":

    datanew = mydata[-47,]

To turn in

Consider the cat data from Investigation 4.6. Use R to create the scatterplot as well as to find the correlation and least squares regression line.

1. Create a scatterplot with the explanatory variable (you choose!) on the x-axis and the response variable on the y-axis. Be sure to label your axes.

    plot(explanatory, response, xlab="my x label", ylab="my y label")

2. Superimpose the regression line onto your scatterplot.

    plot(explanatory, response)
    abline(lm(response~explanatory))

3. Calculate and interpret the correlation coefficient (sign, strength, linearity).

    cor(explanatory, response)

4. Square the correlation coefficient to obtain r^2. Interpret the coefficient of determination in context (i.e., using words like "square feet" and "price").

5. Determine and interpret the slope of the least squares line in context. The interpretation should be of the form: for every additional ____ we estimate that ____ changes by ____.

    lm(response ~ explanatory)
    summary(lm(response ~ explanatory))

6. Give a confidence interval for the slope. Interpret the confidence interval. Based only on the confidence interval, is your slope statistically significant? Explain how you know. (Note that I haven't been explicit about the hypothesis test, but you should know the implicit hypothesis test to which I'm referring.)

7. Determine and interpret the intercept of the least squares regression line. Explain what this value might signify in this context. Is the interpretation meaningful in this context? Explain.

8. Going back to (u) of 4.10, give the intuition as to why each of the observations in the box makes sense. That is, why does the sampling variability of b1 change (in the appropriate direction) for each of n, σ, and s_x?

9. Describe to someone who has taken the first 3/4 of Math 58B the differences between the sampling distributions in Investigations 4.10 and 4.11.
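The handout gives no R call for the squared correlation (item 4) or the slope confidence interval (item 6). A minimal sketch of one way to get them is below. It assumes R's built-in cars dataset (speed and stopping distance) as a stand-in, since the cat data are not reproduced here. The names r, r2, fit, and slope_ci are illustrative only. confint is the base R function for intervals from a fitted model, and other approaches (for example, building the interval by hand from the summary table) would also work.

```r
# Stand-in data (an assumption): R's built-in `cars` dataset.
# This is NOT the cat data from Investigation 4.6; swap in your own variables.
explanatory <- cars$speed   # speed (mph)
response    <- cars$dist    # stopping distance (ft)

# Correlation coefficient and its square (coefficient of determination)
r  <- cor(explanatory, response)
r2 <- r^2

# Least squares fit; coef() returns the intercept and the slope
fit <- lm(response ~ explanatory)
coef(fit)

# For simple linear regression, r^2 agrees with the R-squared in summary()
summary(fit)$r.squared

# 95% confidence interval for the slope only
slope_ci <- confint(fit, "explanatory", level = 0.95)
slope_ci
```

If the interval in slope_ci does not contain 0, that matches rejecting a null hypothesis of zero slope at the corresponding significance level.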