Regression – Residuals – Why? Suppose that you want to build a square deck for your backyard. A builder has given you the following estimates: Width (in feet) 4 6 8 10 12 14 16 18 Cost (in dollars) 150 400 600 900 1400 2000 2500 3000 1. Use technology to construct a scatter plot of these data. Comment on the appropriateness of linear regression based on your scatter plot. 2. Use technology to determine the regression line for predicting the cost of the deck from the width of the deck. Add the regression line to your scatter plot. Comment on the fit of your regression line to your data. 3. What is the correlation coefficient for the relationship between cost and width of the deck? Based on your correlation coefficient and the appearance of the fit of your regression line with the data, does it appear that linear regression is appropriate for this data set? Why or why not? 4. Suppose that you want to build a square deck with width 10.5 feet. What is the estimated cost for the deck? Does the cost you estimated seem reasonable with respect to the data? About how much will your prediction be off? 5. Use technology to construct a scatter plot of the residuals versus the width of the deck. What does a residual plot tell us about our linear regression? Answer Key Regression – Residuals – Why? Note to instructors: Here I have provided the answers that I think students will provide. Some students may be more (or less) concerned about the slight curve to the data than the answers indicate. 1. Use technology to construct a scatter plot of these data. Comment on the appropriateness of linear regression based on your scatter plot. The following is a scatter plot of the data: 3000 Cost 2000 1000 0 4 9 14 19 Width The data look fairly linear, although there might be a slight curve in the middle. Overall, it looks like linear regression is appropriate for these data. 2. Use technology to determine the regression line for predicting the cost of the deck from the width of the deck. Add the regression line to your scatter plot. Comment on the fit of your regression line to your data. The following is the regression analysis using Minitab: Regression Analysis The regression equation is Cost = - 933 + 209 Width Predictor Constant Width Coef -932.7 209.23 s = 191.7 Stdev 176.2 14.79 R-sq = 97.1% t-ratio -5.29 14.15 p 0.002 0.000 R-sq(adj) = 96.6% Analysis of Variance SOURCE Regression Error Total DF 1 6 7 SS 7354300 220387 7574688 MS 7354300 36731 F 200.22 p 0.000 The regression equation is: predicted cost = -933 + 209*width The following is a scatter plot of the data with the linear regression line added in: Regression Plot 3500 Cost 2500 1500 Y = -932.738 + 209.226X R-Squared = 0.971 500 -500 4 9 14 19 Width The regression line seems to fit the data fairly well, except for the area where there is a slight curve in the data. 3. What is the correlation coefficient for the relationship between cost and width of the deck? Based on your correlation coefficient and the appearance of the fit of your regression line with the data, does it appear that linear regression is appropriate for this data set? Why or why not? The correlation coefficient for cost and width of the deck is 0.985. This correlation coefficient indicates a very strong positive linear relationship between cost and width of the deck. Based on the correlation coefficient and the fit of the linear regression line, linear regression does seem appropriate for these data. 4. Suppose that you want to build a square deck with width 10.5 feet. What is the estimated cost for the deck? Does the cost you estimated seem reasonable with respect to the data? About how much will your prediction be off? The estimated cost for the deck would be -933 + 209*10.5 = -933 + 2194.5 = $1261.50. This cost seems reasonable based on the data since it falls between the $900 for a deck that is 10 feet in width and the $1400 for a deck that is 12 feet in width. Our prediction error for our estimated cost is $191.70. 5. Use technology to construct a scatter plot of the residuals versus the width of the deck. What does a residual plot tell us about our linear regression? The following are residual plots from Minitab: Plot of residuals versus width I Chart of Residuals Normal Plot of Residuals 300 400 300 Residual Residual 200 100 0 -100 U C L =3 5 3 .9 200 100 0 -100 X =0 .0 0 0 -200 -300 -400 -200 -300 -1.5 -0.5 0.5 1.5 L C L =-3 5 3 .9 1 Normal Score 2 3 4 5 6 7 8 Observ ation Number Histogram of Residuals Residuals vs. Fits 300 3 Residual Frequency 200 2 1 100 0 -100 -200 0 -300 -200 0 Residual 200 4 9 14 19 Fit Notice that the slight curve in the data is magnified in the residual plots above. Based on these diagnostic measures, linear regression might not be appropriate for these data.