Understanding r

advertisement
Understanding r-squared
Page 1 of 3
Goal: To understand how r2, the coefficient of determination, describes the strength of a linear
model. As Rossman points out, r2 “measures how closely the points fall to the least squares line and
thus also provides an indication of how confident one can be of predictions made with the line”1 Or
to paraphrase Moore “When you report a regression, give r2 as a measure of how successful the
model is explaining the response [for a given explanatory value].2
Goal of Modeling
1. To understand r2 we need to ask the following question.
If we did not know any modeling techniques such as regression what would
the best model be?
The answer is that, lacking any sophisticated techniques, our best model is the horizontal line
containing the mean of the response values, i.e. y = y .
2. Let’s look at an example. Assume the data from a bivariate experiment are the points shown
below.3 On your calculator, draw a scatterplot of the data and a horizontal line representing y .
Your plot should look similar to the picture below.
(5,15)
(4,8)
y =7
(3,6)
(2,4)
(1,2)
a. Is there error in this model?
b. Draw a segment from each data point showing the error with respect to the model y = y .
c. Fill in the following chart:
error with respect to
the model y = 7
Square of the errors
from the model y = 7
(1,2)
yi  y
(2,4)
(3,6)
(4,8)
(5,15)
(yi  y )2
d. Draw shaded “squares” on your plot to represent the squared values just computed (note that
since the scales of the axes are not the same, the “squares” will look like rectangles). Some
may overlap.
e.
 (yi  y )2 =
.
f. Write a general phrase describing the quantity you just computed.
1
Workshop Statistics. Allen Rossman. page 139
Basic Practice of Statistics. David S. Moore. p 127
3 Data set from Gretchen Davis at Santa Monica HS via Eric Mulfinger, Westridge School, Pasadena, CA
2
Understanding r-squared
Page 2 of 3
3. So what is the goal of our modeling efforts?
The goal of our modeling effort is to find a better model than the mean of the response variable.
We certainly would want the sum of the squares of the errors from the new model to be less than
that of the model using the mean of the response variable.
A Better Model
1. So let’s look for a better model. How about a Least Squares Regression Line (LSR line)? Using
the same data as before, add the graph of the LSR line to your plot. Your plot should look like
the picture below.
(5,15)
yˆ  3x  2
(4,8)
y =7
(3,6)
(2,4)
(1,2)
2. Is there still error in this new model? Comparing the two models, which appears to have lower
error?
3. Draw a segment from each data point showing the error to the model yˆ  3x  2 .
4. What does the “hat” symbol mean on yˆ ?
5. Fill in the following chart:
error with respect to
the model yˆ  3x  2
Square of the errors
with respect to the
model yˆ  3x  2
yi  yˆ
(1,2)
(2,4)
(3,6)
(4,8)
(5,15)
(yi  yˆ )2
6. Draw shaded “squares” on your plot to represent the values just computed. Does the sum of the
areas of these squares seem smaller than those from the model using the response mean?
7.
 (yi  yˆ )2 =
.
8. Write a general phrase describing the quantity you just computed.
Understanding r-squared
9. You have determined that
suggests what conclusion?
 (yi  y )2
= 100 and
Page 3 of 3
 (yi  yˆ )2 = 10. This information
Comparing Models
1. The natural next step is to find a number which gives us a sense how our new model compares
with the model of the mean of the response variable. Of course we would like this number to be
r2.
 (y i  yˆ )2
 (yi  y )2
=
.
The answer should be 0.1 .
computed?
Which of the following correctly describes the proportion you just
a. The proportion of how much error there is in the new model with respect to the error in the
mean model.
b. The proportion of how well the new model fits the data.
2. The correct answer to the previous question was The proportion of how much error there is the
new model with respect to the error in the mean model. Remembering that we want r2 to show
us how well our model measures how closely the observed values fall to the least squares line.
How would we compute r2 from the proportion of error in the model? See footnote 4 for a hint.
1
3. So r2 = 1 – 10 = 0.9. Check this against the r2 your calculator computed.
So r
2
(y i  yˆ )2

 1
 (yi  y ) 2
In other words, you find r2 by finding:
1–
the sum of the squares of the error of the observed data with respect to the model
the sum of the squares of the error of the observed data with respect to the mean of the response variable
- end-
4
What is the correct value of r-squared? See your calculator.
Download