Statistics 101: Section L - Laboratory 5
Take Home

Laboratory 5 is a take home laboratory consisting of three activities. It is due at the end of lab on Tuesday, February 17. For the first two activities, you will be looking at some computer simulations involving correlation and regression. These simulations focus on how least squares regression lines fit through a set of points and gauge your ability to estimate the value of the correlation coefficient from a scatter plot. The third activity involves using JMP to fit the least squares regression line to the forearm length and foot length data collected in Lab 2.

1. Simulation 1: Putting Points

Go to the Stat 101L course webpage
http://www.public.iastate.edu/~wrstephe/stat101L.html
Under Computing, click on the Putting points and regression lines link. There are brief instructions on how to use this simulation at the top of the screen. In the center of the screen is a blue area where you can Put Points by clicking the left mouse button. The correlation coefficient and least squares regression line are calculated and displayed in the margin above this blue area.

(a) Put 10 points on the graph that illustrate a strong positive linear relationship between two variables. What are the correlation and the equation of the least squares regression line? You only need to report the coefficients to 3 decimal places.

Click on the box Regression line to put the least squares regression line on the graph.

(b) Ask for a residual plot by clicking on Residuals. What does the residual plot, the one on the left, look like? What does this tell you about the adequacy of the least squares fit? You do not have to comment on the histogram of residuals, the plot on the right.

(c) Now put a point on the graph in the bottom middle of the blue area. What is the correlation now and how does it compare to the previous one? What is the new regression equation and how has it changed?
What effect did adding this point have on the residual plot? The additional point should have a large residual. Such a point (with a large residual) is called an outlier in regression.

(d) Clear the plot. Put 10 points on the graph that have a curved relationship, not linear. Looking at the regression line on the plot and the plot of residuals, how can you tell the points do not have a linear relationship?

(e) Clear the plot. In the lower left-hand corner of the blue area, put points on the graph that illustrate a strong negative linear relationship. In the upper right-hand corner of the blue area, put a single point. Where is the regression line? What does the residual plot look like? Is the point in the upper right-hand corner an outlier in regression? Now remove the point in the upper right-hand corner (you can do this by clicking on Erase Point and clicking on the point to be removed). What do the regression line and residual plot look like now?

Note: points like the one you just removed are called influential points. They cause noticeable changes in the slope or intercept, or both, when they are added or removed.

(f) Clear the plot. Create plots (use about 25 points) with correlations close to the following: 0.70, 0.90, -0.30. You can do this by adding the 25 points individually, or you can let the program help you by putting 25 in the # of points box and a value for the correlation in the Target correlation box, then clicking on Random points. Be sure to clear the plot before setting the next target correlation. You do not need to give an answer for this part; it is just getting you ready for Simulation 2.

2. Simulation 2: Guessing correlations

Go back to the Web address given above, but this time click on the Guessing the value of the correlation coefficient link under Computing. Ask for New plots and match the correlation coefficients to the plots.
Do this for 5 rounds (more if you are on a roll; a streak of correct answers will get your name on the list of top scores). This should help you see how the different values of the correlation coefficient correspond to scatterplots of data.

3. Regression and JMP

In Laboratory 4, we looked at a “visual regression” line for predicting foot length from forearm length. One drawback of “visual regression” is that everyone sees the plot a little differently, so there is no consensus on one line that could be used to describe the general relationship between forearm length and foot length. We will now use JMP to calculate the least squares regression line for predicting foot length from forearm length.

(a) Go to the course website http://www.public.iastate.edu/~wrstephe/stat101L.html. Under the Computing heading, click on the link for Forearm Length and Foot Length. Follow the instructions for downloading a JMP file (if you are using Internet Explorer) or TXT file (if you are using Netscape).

(b) To calculate the least squares regression line, select Analyze → Fit Y by X from the JMP menu. Select the column foot and click on the button Y, Response. Then select the column arm and click on the button X, Factor. Press OK.

(c) You will now have a scatterplot of the two variables, arm on the x-axis and foot on the y-axis. To add the regression line to the output, click on the inverted red triangle next to the words Bivariate Fit of foot By arm and select Fit Line. You should now have a regression line on your scatterplot and regression statistics below.

(d) Click on the inverted red triangle next to the words Linear Fit and select Plot Residuals. You should now have a residual plot added to the bottom of the window.

(e) From the JMP menu, select File → Print. Use this output to answer the following questions.

i. Give the intercept and slope for your least squares regression.
ii. Give an interpretation of each value, within the context of the problem.
iii. Give the least squares regression equation for predicting foot length from forearm length.
iv. Find the predicted foot length for someone with a forearm length of 26 cm.
v. Three people in the class had a forearm length of 26 cm. Calculate a residual for each of these people.
vi. Describe the residual plot. What does this residual plot tell you about the fit of the regression line?
vii. Give the value of R² for this regression. Give an interpretation of this value.
viii. Based on the data, do you think a linear regression is a good approximation for the relationship between these two variables? Explain your answer.

Statistics 101 Laboratory 5
Take Home Lab Answer Sheet

Name:

1. Simulation: Putting Points

(a) 10 points with strong positive relationship:
Correlation:
Least squares regression equation:

(b) Residual plot
Describe the plot:
Is the linear relationship adequate?
Any outliers?
Will prediction be bad in any areas (e.g., a megaphone shape)?

(c) Add another point to the graph (bottom middle of blue area):
What is the new correlation?
How does it compare to the previous correlation?
Least squares regression equation:
How did the regression line change?
What happened to the residual plot?
Is the point you added an outlier?

(d) 10 points that have a non-linear relationship:
Correlation:
Least squares regression equation:
In looking at the residual plot, what do you notice, or how can you tell the relationship is not linear?

(e) 10 points that have a strong negative linear relationship and one point in the upper right corner:
Correlation:
Least squares regression equation:
What does the residual plot look like?
Is the point in the upper right an outlier?
Erase the point:
New correlation:
New regression equation:
Did the regression line change?
Is the point that you removed an influential point?

2. Simulation: Guessing correlations

(a) Score on the first round:
(b) Score on the second round:
(c) Score on the third round:
(d) Score on the fourth round:
(e) Score on the fifth round:

3. Regression using JMP

(a) Intercept and slope of the least squares regression line.

(b) Interpretation of intercept and slope within the context of the problem.

(c) Least squares regression equation

(d) Prediction of foot length for a forearm length of 26 cm.

(e) Calculating residuals.

(f) Residual plot

(g) R² and interpretation

(h) Is linear regression OK for relationship?
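
For reference, the quantities reported by the simulation and by JMP (intercept, slope, correlation, residuals, and R²) can also be computed directly from the data. The following is a minimal Python sketch of those calculations, not the method used by JMP or the course simulations. It assumes a small made-up data set rather than the class forearm and foot measurements, and the function names least_squares and residuals are illustrative only. It also adds one point far from the rest to show how a single influential point can change the fitted line.

```python
from math import sqrt


def least_squares(x, y):
    """Return (intercept, slope, r) for the least squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return intercept, slope, r


def residuals(x, y, intercept, slope):
    """Residual = observed y minus predicted y."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Made-up illustrative data (an assumption, NOT the class measurements).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b, r = least_squares(x, y)
res = residuals(x, y, a, b)
print(f"line: y = {a:.3f} + {b:.3f} x")
print(f"r = {r:.3f}, R^2 = {r * r:.3f}")  # in simple regression, R^2 = r^2
print("residuals:", [round(e, 3) for e in res])

# Add a single point far from the others and refit, as in part (e)
# of Simulation 1, to see how much the slope and correlation move.
x2 = x + [10]
y2 = y + [0]
a2, b2, r2 = least_squares(x2, y2)
print(f"with extra point: y = {a2:.3f} + {b2:.3f} x, r = {r2:.3f}")
```

Comparing the two printed lines shows the effect described in the note on influential points: the slope and correlation change noticeably when the extra point is included.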