1.6 Prediction and correlation

Prediction:
The fitted regression equation is
$$\hat{Y} = b_0 + b_1 X = \bar{Y} + b_1(X - \bar{X}).$$
Based on the fitted equation, the predicted value at a specified value $X_0$ is
$$\hat{Y}_0 = \bar{Y} + b_1(X_0 - \bar{X}).$$
Its expectation is
$$E(\hat{Y}_0) = E\bigl(\bar{Y} + b_1(X_0 - \bar{X})\bigr) = E(\bar{Y}) + (X_0 - \bar{X})E(b_1) = \beta_0 + \beta_1\bar{X} + (X_0 - \bar{X})\beta_1 = \beta_0 + \beta_1 X_0,$$
the mean of $Y$ at $X_0$, and its variance is
$$\operatorname{Var}(\hat{Y}_0) = \operatorname{Var}\bigl(\bar{Y} + b_1(X_0 - \bar{X})\bigr) = \operatorname{Var}(\bar{Y}) + \operatorname{Var}\bigl(b_1(X_0 - \bar{X})\bigr) \quad (\text{since } \operatorname{Cov}(b_1, \bar{Y}) = 0)$$
$$= \frac{\sigma^2}{n} + (X_0 - \bar{X})^2 \operatorname{Var}(b_1) = \frac{\sigma^2}{n} + (X_0 - \bar{X})^2 \frac{\sigma^2}{S_{XX}} = \sigma^2 \left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{S_{XX}}\right].$$
Therefore,
$$\text{s.e.}(\hat{Y}_0) = s \left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{S_{XX}}\right]^{1/2}.$$
Note: $\operatorname{Var}(\hat{Y}_0)$, and hence $\text{s.e.}(\hat{Y}_0)$, achieves its minimum at $X_0 = \bar{X}$. That is, we make our best "prediction" in the "middle" of the range of $X$, at its average $\bar{X}$. As $X_0$ moves farther away from $\bar{X}$, the prediction becomes less accurate, since the standard error grows.
Thus, a $(1-\alpha)100\%$ confidence interval for $E(\hat{Y}_0) = \beta_0 + \beta_1 X_0$ is
$$\hat{Y}_0 \pm t_{n-2,\,1-\alpha/2}\,\text{s.e.}(\hat{Y}_0) = \left(\hat{Y}_0 - t_{n-2,\,1-\alpha/2}\,\text{s.e.}(\hat{Y}_0),\ \hat{Y}_0 + t_{n-2,\,1-\alpha/2}\,\text{s.e.}(\hat{Y}_0)\right).$$
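As a quick numerical illustration, the following Python sketch computes $\hat{Y}_0$, its standard error, and the confidence interval above, with $s^2$ the usual residual mean square on $n-2$ degrees of freedom. The data and the value $X_0 = 3.5$ are hypothetical, chosen only for demonstration.

```python
import numpy as np
from scipy import stats

# Hypothetical data, for demonstration only.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.1])

n = len(X)
Xbar, Ybar = X.mean(), Y.mean()
Sxx = np.sum((X - Xbar) ** 2)
b1 = np.sum((X - Xbar) * (Y - Ybar)) / Sxx   # slope
b0 = Ybar - b1 * Xbar                        # intercept

resid = Y - (b0 + b1 * X)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))    # residual standard deviation

x0 = 3.5                                     # a specified value X_0 (arbitrary)
y0_hat = Ybar + b1 * (x0 - Xbar)             # predicted value at X_0
se_y0 = s * np.sqrt(1.0 / n + (x0 - Xbar) ** 2 / Sxx)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)  # t_{n-2, 1-alpha/2}
lower, upper = y0_hat - t_crit * se_y0, y0_hat + t_crit * se_y0
print(f"Yhat_0 = {y0_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```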

$R^2$, Correlation, and Regression:
(a) $R^2$:
$$R^2 = \frac{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2},$$
is the ratio of the regression sum of squares to the total (corrected) sum of squares. $R^2$ is also the ratio of (the distance between model 2 and model 1) to (the distance between the data (model 0) and model 1). Since the total sum of squares $\sum_{i=1}^{n}(Y_i - \bar{Y})^2$ is the sum of the regression sum of squares $\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$ and the residual sum of squares $\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$, a large $R^2$ implies that the proportion of the total sum of squares contributed by the regression sum of squares is large. For example, if $R^2 = 0.9$, then 90% of the total sum of squares comes from the regression sum of squares. Heuristically, this indicates that 90% of the variation in the $Y_i$ can be explained by the $\hat{Y}_i$; that is, model 2 fits the data well. In addition, a large $R^2$ also implies that the regression sum of squares is large relative to the residual sum of squares. In the above example, the regression sum of squares is 9 times the residual sum of squares, since the residual sum of squares contributes only 10% of the total (corrected) sum of squares. That is, the distance between model 2 and model 1 is large relative to the variation in the data. As explained in the previous section, this suggests that the slope in the regression is significant, so model 2 might be sensible. $R^2$ is usually recommended as a "useful first thing to look at" in a regression printout.
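The decomposition of the total sum of squares, and the resulting $R^2$, can be checked numerically. Below is a minimal Python sketch using hypothetical data; it verifies that the total sum of squares equals the regression sum of squares plus the residual sum of squares.

```python
import numpy as np

# Hypothetical data; any (X, Y) sample works.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.1])

Xbar, Ybar = X.mean(), Y.mean()
b1 = np.sum((X - Xbar) * (Y - Ybar)) / np.sum((X - Xbar) ** 2)
b0 = Ybar - b1 * Xbar
Yhat = b0 + b1 * X

SS_total = np.sum((Y - Ybar) ** 2)      # total sum of squares (corrected)
SS_reg = np.sum((Yhat - Ybar) ** 2)     # regression sum of squares
SS_res = np.sum((Y - Yhat) ** 2)        # residual sum of squares

print("R^2 =", SS_reg / SS_total)
print("decomposition holds:", np.isclose(SS_total, SS_reg + SS_res))
```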
(b) Correlation:
The correlation coefficient between the covariate $X$ and the response $Y$ is
$$r_{XY} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\left[\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2\right]^{1/2}} = \frac{S_{XY}}{S_{XX}^{1/2} S_{YY}^{1/2}}, \qquad -1 \le r_{XY} \le 1.$$
As Yi  aX i  b , then rXY  1 or rXY  1 . That is, rXY  1 implies a significant
linear relationship between X and Y. The correlation coefficient is also associated with
the regression coefficient b1 .
$$b_1 = \frac{S_{XY}}{S_{XX}} = \frac{S_{YY}^{1/2}}{S_{XX}^{1/2}} \cdot \frac{S_{XY}}{S_{XX}^{1/2} S_{YY}^{1/2}} = \left(\frac{S_{YY}}{S_{XX}}\right)^{1/2} r_{XY}.$$
As b1  0  rXY  0  a positively linear relation.
As b1  0  rXY  0  a negatively linear relation
As b1  0  rXY  0  there is no significantly linear relation between X and Y.
Note: $r_{XY}$ measures the linear association between $X$ and $Y$, while $b_1$ measures the size of the change in $Y$ due to a unit change in $X$. $r_{XY}$ is unit-free and scale-free: a scale change in the data will affect $b_1$ but not $r_{XY}$.
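The identity $b_1 = (S_{YY}/S_{XX})^{1/2} r_{XY}$ and the scale-invariance of $r_{XY}$ can both be verified numerically. The following Python sketch uses hypothetical data; rescaling $X$ by a factor of 10 (a change of units) divides $b_1$ by 10 but leaves $r_{XY}$ unchanged.

```python
import numpy as np

# Hypothetical data.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.1])

def summaries(X, Y):
    """Return (S_XX, S_YY, S_XY) for a sample."""
    dx, dy = X - X.mean(), Y - Y.mean()
    return np.sum(dx * dx), np.sum(dy * dy), np.sum(dx * dy)

Sxx, Syy, Sxy = summaries(X, Y)
b1 = Sxy / Sxx
r_xy = Sxy / np.sqrt(Sxx * Syy)
print("b1 = (Syy/Sxx)^{1/2} r_XY:", np.isclose(b1, np.sqrt(Syy / Sxx) * r_xy))

# A scale change in X (e.g., new units) changes b1 but not r_XY.
Sxx2, Syy2, Sxy2 = summaries(10 * X, Y)
print("b1 after rescaling:", Sxy2 / Sxx2)                    # b1 / 10
print("r_XY after rescaling:", Sxy2 / np.sqrt(Sxx2 * Syy2))  # unchanged
```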
Note: the value of a correlation $r_{XY}$ shows only the extent to which $X$ and $Y$ are linearly associated. It does not by itself imply that any sort of causal relationship exists between $X$ and $Y$. Such a false assumption has led to erroneous conclusions on many occasions.
Note that $r_{XY}$ is also associated with $R^2$, since
$$R^2 = \frac{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2} = \frac{b_1 S_{XY}}{S_{YY}} = \frac{\left(\frac{S_{XY}}{S_{XX}}\right) S_{XY}}{S_{YY}} = \frac{S_{XY}^2}{S_{XX} S_{YY}}$$
and
$$r_{XY} = \frac{S_{XY}}{S_{XX}^{1/2} S_{YY}^{1/2}} = (\text{sign of } b_1)\,(R^2)^{1/2} = (\text{sign of } b_1)\,R,$$
where $R = (R^2)^{1/2}$ and $r_{XY}$ has the same sign as $b_1$. The above equation indicates that a large $R^2$ implies a strong correlation between the response and the covariate.
Note: $r_{XY} = (\text{sign of } b_1)\,R$ holds only for the simple linear regression $Y = \beta_0 + \beta_1 X + \varepsilon$.
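As a quick numerical check of $r_{XY} = (\text{sign of } b_1)\,R$, the Python sketch below uses hypothetical data with a downward trend, so that $b_1 < 0$ and the sign actually matters.

```python
import numpy as np

# Hypothetical data with a downward trend, so that b1 < 0.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([11.8, 10.1, 7.9, 6.2, 3.8, 2.0])

Xbar, Ybar = X.mean(), Y.mean()
Sxx = np.sum((X - Xbar) ** 2)
Syy = np.sum((Y - Ybar) ** 2)
Sxy = np.sum((X - Xbar) * (Y - Ybar))

b1 = Sxy / Sxx
Yhat = Ybar + b1 * (X - Xbar)
R = np.sqrt(np.sum((Yhat - Ybar) ** 2) / Syy)   # R = (R^2)^{1/2} >= 0
r_xy = Sxy / np.sqrt(Sxx * Syy)
print("r_XY = (sign of b1) R:", np.isclose(r_xy, np.sign(b1) * R))
```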
The correlation between the response $Y$ and the fitted value $\hat{Y}$ is
$$r_{Y\hat{Y}} = \frac{\sum_{i=1}^{n}(Y_i - \bar{Y})(\hat{Y}_i - \bar{\hat{Y}})}{\left[\sum_{i=1}^{n}(Y_i - \bar{Y})^2 \sum_{i=1}^{n}(\hat{Y}_i - \bar{\hat{Y}})^2\right]^{1/2}} = R, \qquad \text{where } \bar{\hat{Y}} = \frac{\sum_{i=1}^{n} \hat{Y}_i}{n}.$$
The derivation of $r_{Y\hat{Y}}$: since
$$\bar{\hat{Y}} = \frac{\sum_{i=1}^{n} \hat{Y}_i}{n} = \frac{\sum_{i=1}^{n}(b_0 + b_1 X_i)}{n} = b_0 + b_1 \bar{X},$$
$$\sum_{i=1}^{n}(\hat{Y}_i - \bar{\hat{Y}})^2 = \sum_{i=1}^{n}\bigl[b_0 + b_1 X_i - (b_0 + b_1 \bar{X})\bigr]^2 = b_1^2 \sum_{i=1}^{n}(X_i - \bar{X})^2 = b_1^2 S_{XX},$$
and
$$\sum_{i=1}^{n}(Y_i - \bar{Y})(\hat{Y}_i - \bar{\hat{Y}}) = \sum_{i=1}^{n}(Y_i - \bar{Y})\bigl[b_0 + b_1 X_i - (b_0 + b_1 \bar{X})\bigr] = b_1 \sum_{i=1}^{n}(Y_i - \bar{Y})(X_i - \bar{X}) = b_1 S_{XY},$$
thus
$$r_{Y\hat{Y}} = \frac{b_1 S_{XY}}{S_{YY}^{1/2} (b_1^2 S_{XX})^{1/2}} = \frac{b_1}{(b_1^2)^{1/2}} \cdot \frac{S_{XY}}{S_{YY}^{1/2} S_{XX}^{1/2}} = (\text{sign of } b_1)\, r_{XY} = (\text{sign of } b_1)^2\, R = R.$$
The equation $r_{Y\hat{Y}} = R$ implies that a large value of $R^2$ also implies a strong positive linear relation between the observations $Y_i$ and the fitted values $\hat{Y}_i$. In other words, the prediction of $Y_i$ is not unrelated to $Y_i$.
Note: $r_{Y\hat{Y}} = R$ holds not only for the simple linear regression $Y = \beta_0 + \beta_1 X + \varepsilon$, but also for multiple linear regression!
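This claim can be checked numerically for a multiple regression. The following Python sketch fits a model with two covariates by least squares on simulated data (the coefficients, noise level, and sample size are arbitrary) and verifies that $r_{Y\hat{Y}} = \sqrt{R^2}$.

```python
import numpy as np

# Simulated data for a regression with two covariates; the
# coefficients, noise level, and sample size are arbitrary.
rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 2))
Y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

A = np.column_stack([np.ones(n), X])          # design matrix with intercept
b, *_ = np.linalg.lstsq(A, Y, rcond=None)     # least-squares coefficients
Yhat = A @ b

R2 = np.sum((Yhat - Y.mean()) ** 2) / np.sum((Y - Y.mean()) ** 2)
r_yyhat = np.corrcoef(Y, Yhat)[0, 1]
print("r_{Y,Yhat} = R:", np.isclose(r_yyhat, np.sqrt(R2)))
```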