Regression – Residuals – Why

advertisement
Regression – Residuals – Why?
Suppose that you want to build a square deck for your backyard. A builder has given you the
following estimates:
Width (in feet)
4
6
8
10
12
14
16
18
Cost (in dollars) 150 400 600 900 1400 2000 2500 3000
1. Use technology to construct a scatter plot of these data. Comment on the appropriateness of
linear regression based on your scatter plot.
2. Use technology to determine the regression line for predicting the cost of the deck from the
width of the deck. Add the regression line to your scatter plot. Comment on the fit of your
regression line to your data.
3. What is the correlation coefficient for the relationship between cost and width of the deck?
Based on your correlation coefficient and the appearance of the fit of your regression line
with the data, does it appear that linear regression is appropriate for this data set? Why or
why not?
4. Suppose that you want to build a square deck with width 10.5 feet. What is the estimated
cost for the deck? Does the cost you estimated seem reasonable with respect to the data?
About how much will your prediction be off?
5. Use technology to construct a scatter plot of the residuals versus the width of the deck. What
does a residual plot tell us about our linear regression?
Answer Key
Regression – Residuals – Why?
Note to instructors: Here I have provided the answers that I think students will provide. Some
students may be more (or less) concerned about the slight curve to the data than the answers
indicate.
1. Use technology to construct a scatter plot of these data. Comment on the appropriateness of
linear regression based on your scatter plot.
The following is a scatter plot of the data:
3000
Cost
2000
1000
0
4
9
14
19
Width
The data look fairly linear, although there might be a slight curve in the middle. Overall, it
looks like linear regression is appropriate for these data.
2. Use technology to determine the regression line for predicting the cost of the deck from the
width of the deck. Add the regression line to your scatter plot. Comment on the fit of your
regression line to your data.
The following is the regression analysis using Minitab:
Regression Analysis
The regression equation is
Cost = - 933 + 209 Width
Predictor
Constant
Width
Coef
-932.7
209.23
s = 191.7
Stdev
176.2
14.79
R-sq = 97.1%
t-ratio
-5.29
14.15
p
0.002
0.000
R-sq(adj) = 96.6%
Analysis of Variance
SOURCE
Regression
Error
Total
DF
1
6
7
SS
7354300
220387
7574688
MS
7354300
36731
F
200.22
p
0.000
The regression equation is: predicted cost = -933 + 209*width
The following is a scatter plot of the data with the linear regression line added in:
Regression Plot
3500
Cost
2500
1500
Y = -932.738 + 209.226X
R-Squared = 0.971
500
-500
4
9
14
19
Width
The regression line seems to fit the data fairly well, except for the area where there is a slight
curve in the data.
3. What is the correlation coefficient for the relationship between cost and width of the deck?
Based on your correlation coefficient and the appearance of the fit of your regression line
with the data, does it appear that linear regression is appropriate for this data set? Why or
why not?
The correlation coefficient for cost and width of the deck is 0.985. This correlation
coefficient indicates a very strong positive linear relationship between cost and width of the
deck. Based on the correlation coefficient and the fit of the linear regression line, linear
regression does seem appropriate for these data.
4. Suppose that you want to build a square deck with width 10.5 feet. What is the estimated
cost for the deck? Does the cost you estimated seem reasonable with respect to the data?
About how much will your prediction be off?
The estimated cost for the deck would be -933 + 209*10.5 = -933 + 2194.5 = $1261.50. This
cost seems reasonable based on the data since it falls between the $900 for a deck that is 10
feet in width and the $1400 for a deck that is 12 feet in width. Our prediction error for our
estimated cost is $191.70.
5. Use technology to construct a scatter plot of the residuals versus the width of the deck. What
does a residual plot tell us about our linear regression?
The following are residual plots from Minitab:
Plot of residuals versus width
I Chart of Residuals
Normal Plot of Residuals
300
400
300
Residual
Residual
200
100
0
-100
U C L =3 5 3 .9
200
100
0
-100
X =0 .0 0 0
-200
-300
-400
-200
-300
-1.5
-0.5
0.5
1.5
L C L =-3 5 3 .9
1
Normal Score
2
3
4
5
6
7
8
Observ ation Number
Histogram of Residuals
Residuals vs. Fits
300
3
Residual
Frequency
200
2
1
100
0
-100
-200
0
-300
-200
0
Residual
200
4
9
14
19
Fit
Notice that the slight curve in the data is magnified in the residual plots above. Based on these
diagnostic measures, linear regression might not be appropriate for these data.
Download