R², CORRELATION, PREDICTION

Prediction:

The fitted regression equation is

$$\hat{Y} = b_0 + b_1 X = \bar{Y} + b_1 (X - \bar{X}).$$

Based on the fitted equation, the predicted value at a specified value $X_0$ is

$$\hat{Y}_0 = \bar{Y} + b_1 (X_0 - \bar{X}).$$

Then

$$E(\hat{Y}_0) = E\big(\bar{Y} + b_1 (X_0 - \bar{X})\big) = E(\bar{Y}) + (X_0 - \bar{X})\,E(b_1) = \beta_0 + \beta_1 \bar{X} + (X_0 - \bar{X})\,\beta_1 = \beta_0 + \beta_1 X_0 = \text{the mean of } Y \text{ at } X_0,$$

and

$$\operatorname{Var}(\hat{Y}_0) = \operatorname{Var}\big(\bar{Y} + b_1 (X_0 - \bar{X})\big) = \operatorname{Var}(\bar{Y}) + \operatorname{Var}\big(b_1 (X_0 - \bar{X})\big) \quad (\text{since } \operatorname{Cov}(b_1, \bar{Y}) = 0)$$
$$= \frac{\sigma^2}{n} + (X_0 - \bar{X})^2 \operatorname{Var}(b_1) = \frac{\sigma^2}{n} + (X_0 - \bar{X})^2 \frac{\sigma^2}{S_{XX}} = \sigma^2 \left[ \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{S_{XX}} \right].$$

Therefore,

$$\text{s.e.}(\hat{Y}_0) = s \left[ \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{S_{XX}} \right]^{1/2}.$$

Note: $\operatorname{Var}(\hat{Y}_0)$, and hence $\text{s.e.}(\hat{Y}_0)$, achieves its minimum at $X_0 = \bar{X}$. That is, we make our best "prediction" in the "middle" of the observed range of $X$ (at $\bar{X}$). As $X_0$ moves far away from $\bar{X}$, the prediction becomes less accurate (since the standard error grows).

Thus, a $(1-\alpha)100\%$ confidence interval for $E(\hat{Y}_0) = \beta_0 + \beta_1 X_0$ is

$$\hat{Y}_0 \pm t_{n-2,\,1-\alpha/2}\;\text{s.e.}(\hat{Y}_0), \quad \text{i.e.,} \quad \big[\hat{Y}_0 - t_{n-2,\,1-\alpha/2}\;\text{s.e.}(\hat{Y}_0),\; \hat{Y}_0 + t_{n-2,\,1-\alpha/2}\;\text{s.e.}(\hat{Y}_0)\big].$$

R², Correlation, and Regression:

(a) R²:

$$R^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$

is the ratio of the regression sum of squares to the total sum of squares (corrected). $R^2$ is also the ratio of (the distance between model 2 and model 1) to (the distance between the data (model 0) and model 1), since the total sum of squares $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$ is the sum of the regression sum of squares $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$ and the residual sum of squares $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$. A large $R^2$ implies that the proportion of the total sum of squares contributed by the regression sum of squares is large. For example, if $R^2 = 0.9$, then 90% of the total sum of squares comes from the regression sum of squares. Heuristically, that indicates 90% of the variation in the $Y_i$ can be explained by the $\hat{Y}_i$. That is, model 2 can fit the data well.

In addition, a large $R^2$ also implies that the regression sum of squares is large relative to the residual sum of squares. In the above example, the regression sum of squares is 9 times larger than the residual sum of squares, since the residual sum of squares contributes only 10% of the total sum of squares (corrected). That is, the distance between model 2 and model 1 is large relative to the variation of the data. As explained in the previous section, this might imply that the slope in the regression is significant. Thus, model 2 might be sensible. $R^2$ is usually recommended as a "useful first thing to look at" in a regression printout.

(b) Correlation:

The correlation coefficient between the covariate X and the response Y is

$$r_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\left[\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2\right]^{1/2}} = \frac{S_{XY}}{S_{XX}^{1/2} S_{YY}^{1/2}}, \qquad -1 \le r_{XY} \le 1.$$

If $Y_i = a X_i + b$ exactly, then $r_{XY} = 1$ (for $a > 0$) or $r_{XY} = -1$ (for $a < 0$). That is, $|r_{XY}| = 1$ implies an exact linear relationship between X and Y.

The correlation coefficient is also associated with the regression coefficient $b_1$:

$$b_1 = \frac{S_{XY}}{S_{XX}} = \frac{S_{YY}^{1/2}}{S_{XX}^{1/2}} \cdot \frac{S_{XY}}{S_{XX}^{1/2} S_{YY}^{1/2}} = \left(\frac{S_{YY}}{S_{XX}}\right)^{1/2} r_{XY}.$$

Therefore:
As $b_1 > 0 \iff r_{XY} > 0$: a positive linear relation.
As $b_1 < 0 \iff r_{XY} < 0$: a negative linear relation.
As $b_1 = 0 \iff r_{XY} = 0$: no significant linear relation between X and Y.

Note: $r_{XY}$ measures the linear association between X and Y, while $b_1$ measures the size of the change in Y due to a unit change in X. $r_{XY}$ is unit-free and scale-free; a scale change in the data will affect $b_1$ but not $r_{XY}$.

Note: the value of a correlation $r_{XY}$ shows only the extent to which X and Y are linearly associated. It does not by itself imply that any sort of causal relationship exists between X and Y. Such a false assumption has led to erroneous conclusions on many occasions.
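As a concrete check of the formulas above, here is a minimal Python sketch (the data, the point $X_0$, and variable names such as Sxx are made up for illustration; numpy and scipy are assumed to be available). It fits the least-squares line, forms $\text{s.e.}(\hat{Y}_0)$, builds the 95% confidence interval for the mean of Y at $X_0$, and numerically verifies $b_1 = (S_{YY}/S_{XX})^{1/2}\, r_{XY}$.

```python
import numpy as np
from scipy import stats

# Hypothetical data, for illustration only.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.3])

n = len(X)
Xbar, Ybar = X.mean(), Y.mean()
Sxx = np.sum((X - Xbar) ** 2)
Sxy = np.sum((X - Xbar) * (Y - Ybar))
Syy = np.sum((Y - Ybar) ** 2)

# Least-squares estimates.
b1 = Sxy / Sxx
b0 = Ybar - b1 * Xbar

# Residual standard deviation s, on n - 2 degrees of freedom.
resid = Y - (b0 + b1 * X)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))

# Predicted mean at X0, its standard error, and a 95% CI for E(Y at X0).
X0 = 3.5
Y0_hat = Ybar + b1 * (X0 - Xbar)
se_Y0 = s * np.sqrt(1.0 / n + (X0 - Xbar) ** 2 / Sxx)
t = stats.t.ppf(0.975, df=n - 2)
print(f"95% CI for E(Y0): [{Y0_hat - t * se_Y0:.3f}, {Y0_hat + t * se_Y0:.3f}]")

# b1 and r_XY share the same sign: b1 = sqrt(Syy/Sxx) * r_XY.
r_xy = Sxy / np.sqrt(Sxx * Syy)
print(np.isclose(b1, np.sqrt(Syy / Sxx) * r_xy))  # True
```

Note how the half-width of the interval grows with $(X_0 - \bar{X})^2$: moving X0 away from Xbar in the sketch widens the printed interval, matching the extrapolation warning above.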
Note that $r_{XY}$ is also associated with $R^2$, since

$$R^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} = \frac{b_1^2 S_{XX}}{S_{YY}} = \left(\frac{S_{XY}}{S_{XX}}\right)^2 \frac{S_{XX}}{S_{YY}} = \frac{S_{XY}^2}{S_{XX} S_{YY}},$$

and

$$r_{XY} = \frac{S_{XY}}{S_{XX}^{1/2} S_{YY}^{1/2}} = (\text{sign of } b_1)\,(R^2)^{1/2} = (\text{sign of } b_1)\, R,$$

where $R = (R^2)^{1/2}$ and $r_{XY}$ has the same sign as $b_1$. The above equation indicates that a large $R^2$ implies a strong correlation between the response and the covariate.

Note: $r_{XY} = (\text{sign of } b_1)\, R$ only holds for the simple linear regression $Y = \beta_0 + \beta_1 X + \varepsilon$.

The correlation between the response Y and the fitted value $\hat{Y}$ is

$$r_{Y\hat{Y}} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(\hat{Y}_i - \bar{\hat{Y}})}{\left[\sum_{i=1}^{n} (Y_i - \bar{Y})^2 \sum_{i=1}^{n} (\hat{Y}_i - \bar{\hat{Y}})^2\right]^{1/2}} = R, \quad \text{where } \bar{\hat{Y}} = \frac{\sum_{i=1}^{n} \hat{Y}_i}{n}.$$

The derivation of $r_{Y\hat{Y}}$: Since

$$\bar{\hat{Y}} = \frac{\sum_{i=1}^{n} \hat{Y}_i}{n} = \frac{\sum_{i=1}^{n} (b_0 + b_1 X_i)}{n} = b_0 + b_1 \bar{X},$$

$$\sum_{i=1}^{n} (\hat{Y}_i - \bar{\hat{Y}})^2 = \sum_{i=1}^{n} \big[b_0 + b_1 X_i - (b_0 + b_1 \bar{X})\big]^2 = b_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2 = b_1^2 S_{XX},$$

and

$$\sum_{i=1}^{n} (Y_i - \bar{Y})(\hat{Y}_i - \bar{\hat{Y}}) = \sum_{i=1}^{n} (Y_i - \bar{Y})\big[b_0 + b_1 X_i - (b_0 + b_1 \bar{X})\big] = \sum_{i=1}^{n} (Y_i - \bar{Y})\, b_1 (X_i - \bar{X}) = b_1 S_{XY},$$

thus

$$r_{Y\hat{Y}} = \frac{b_1 S_{XY}}{S_{YY}^{1/2} (b_1^2 S_{XX})^{1/2}} = \frac{b_1}{|b_1|} \cdot \frac{S_{XY}}{S_{XX}^{1/2} S_{YY}^{1/2}} = (\text{sign of } b_1)\, r_{XY} = (\text{sign of } b_1)^2\, R = R.$$

The equation $r_{Y\hat{Y}} = R$ implies that a large value of $R^2$ also implies a significantly positive linear relation between the observations $Y_i$ and the predicted values $\hat{Y}_i$. In other words, the prediction of $Y_i$ is not unrelated to $Y_i$.

Note: $r_{Y\hat{Y}} = R$ holds not only for the simple linear regression $Y = \beta_0 + \beta_1 X + \varepsilon$, but also for multiple linear regression!
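The two identities $r_{XY} = (\text{sign of } b_1)\, R$ and $r_{Y\hat{Y}} = R$ can be checked numerically. Below is a minimal Python sketch (assuming numpy; the data are hypothetical and chosen with a negative slope, so the sign behavior is visible: $r_{XY}$ comes out negative while $r_{Y\hat{Y}}$ stays positive).

```python
import numpy as np

# Hypothetical data with a clearly negative slope.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([9.8, 8.1, 5.9, 4.2, 1.9])

Xbar, Ybar = X.mean(), Y.mean()
Sxx = np.sum((X - Xbar) ** 2)
Sxy = np.sum((X - Xbar) * (Y - Ybar))
Syy = np.sum((Y - Ybar) ** 2)

# Fitted line and fitted values.
b1 = Sxy / Sxx
b0 = Ybar - b1 * Xbar
Yhat = b0 + b1 * X

# R^2 = (regression sum of squares) / (total sum of squares), R = sqrt(R^2).
R2 = np.sum((Yhat - Ybar) ** 2) / Syy
R = np.sqrt(R2)

# r_XY = (sign of b1) * R, so it is negative here ...
r_xy = Sxy / np.sqrt(Sxx * Syy)
print(np.isclose(r_xy, np.sign(b1) * R))  # True

# ... while r_{Y,Yhat} = R regardless of the sign of b1.
r_yyhat = np.corrcoef(Y, Yhat)[0, 1]
print(np.isclose(r_yyhat, R))             # True
```

Replacing Yhat with the fitted values from a multiple regression leaves the last check unchanged, in line with the final note above; the sign identity for $r_{XY}$, by contrast, is specific to the simple linear model.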