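2.7 The Analysis of Variance (F-test) to Regression Analysis

$H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$

We have the following 2 models:

Model 1 (horizontal line): $y = \beta_0 + \varepsilon$, with fitted value $\hat{y} = \bar{y}$.
Model 2 (line): $y = \beta_0 + \beta_1 x + \varepsilon$, with fitted value $\hat{y} = b_0 + b_1 x$.

Note: The objective function for model 1 is

$$S(\beta_0) = \sum_{i=1}^{n} (y_i - \beta_0)^2 .$$

Thus, the estimate of the parameter $\beta_0$ can be obtained by solving $\frac{\partial S(\beta_0)}{\partial \beta_0} = 0$. The solution is $\bar{y}$, so $\hat{y} = \bar{y}$.

Fundamental Equation:

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$

("distance" between data and horizontal line) = ("distance" between data and line) + ("distance" between model line and horizontal line).

[Figure: a data point $y_i$ (data), its fitted value $\hat{y}_i$ (line), and $\bar{y}$ (horizontal), illustrating the three sums $\sum_{i=1}^{n} (y_i - \bar{y})^2$, $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ and $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$.]

[Derivation of Fundamental Equation]:

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + 2 \sum_{i=1}^{n} (y_i - \hat{y}_i)(\hat{y}_i - \bar{y})$$

$$= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 ,$$

since, using $\hat{y}_i = \bar{y} + b_1 (x_i - \bar{x})$ and $b_1 = s_{XY} / s_{XX}$,

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) = \sum_{i=1}^{n} \left[ y_i - \left( \bar{y} + b_1 (x_i - \bar{x}) \right) \right] \left[ \bar{y} + b_1 (x_i - \bar{x}) - \bar{y} \right]$$

$$= \sum_{i=1}^{n} \left[ (y_i - \bar{y}) - b_1 (x_i - \bar{x}) \right] b_1 (x_i - \bar{x})$$

$$= b_1 \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) - b_1^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 = b_1 (s_{XY} - b_1 s_{XX})$$

$$= b_1 \left( s_{XY} - \frac{s_{XY}}{s_{XX}} s_{XX} \right) = b_1 (s_{XY} - s_{XY}) = 0 .$$

The ANOVA (Analysis of Variance) table corresponding to the fundamental equation:

Source               df     SS                                                MS
Due to regression    1      $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$    $MSR = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{1}$
Residual (Error)     n-2    $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$        $MSE = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}$
Total (corrected)    n-1    $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$

Let

$$f = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 / 1}{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n-2)} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{s^2} ,$$

the ratio of the mean sum of squares due to the regression to the mean residual sum of squares.

Intuitively, a large value of $f$ suggests that the difference between the fitted line and the horizontal line is large relative to the random variation reflected by the mean residual sum of squares. That is, $\beta_1$ is significantly different from zero, so the difference between the line and the horizontal line is apparent. Therefore, the value of $f$ provides important information about whether $H_0: \beta_1 = 0$ holds.

The next question to ask: how large must $f$ be to be considered large?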
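The fundamental equation and the ratio $f$ can be checked numerically. The following is a minimal sketch in Python, assuming a small hypothetical data set: the values of `x` and `y` are invented for illustration and do not come from the examples in these notes, and the function name `anova_decomposition` is likewise an assumption.

```python
def anova_decomposition(x, y):
    """Fit y = b0 + b1*x by least squares and return SST, SSE, SSR and f."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Corrected sums of squares and cross products
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least squares estimates and fitted values
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    # The three sums of squares in the fundamental equation
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)

    # f = MSR / MSE, with 1 and n-2 degrees of freedom
    f = (ssr / 1) / (sse / (n - 2))
    return {"SST": sst, "SSE": sse, "SSR": ssr, "f": f}


# Hypothetical toy data (not from the notes)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

result = anova_decomposition(x, y)
print(result)

# The fundamental equation: SST = SSE + SSR (up to floating-point error)
assert abs(result["SST"] - (result["SSE"] + result["SSR"])) < 1e-9
```

Each sum of squares is computed directly from its definition rather than from a shortcut formula, so the final assertion is a genuine check of the decomposition and not an identity built into the code.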
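To test $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$: if $f > f_{1,n-2,\alpha}$, reject $H_0$.

Note: The sum of squares due to the regression and the mean sum of squares due to regression are

$$MSR = \frac{SSR}{1} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 .$$

The total sum of squares is

$$SST = s_{YY} = \sum_{i=1}^{n} (y_i - \bar{y})^2 .$$

Thus, the $f$ statistic is

$$f = \frac{MSR}{MSE} .$$

Note: For ease of computation, the following equations can be used:

$$MSR = SSR = b_1 s_{XY} = b_1^2 s_{XX} .$$

Note: $E(MSE) = \sigma^2$, $E(MSR) = E(SSR) = \sigma^2 + \beta_1^2 s_{XX}$.

Note: Let $t$ be the statistic for testing $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$. Then $f = t^2$.

Motivating Example (continued):

Assume $\alpha = 0.05$. To test $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$, we have the following:

$$b_1 = 5, \quad s_{XY} = 2840, \quad SSR = b_1 s_{XY} = 5 \times 2840 = 14200,$$

$$SST = \sum_{i=1}^{10} (y_i - \bar{y})^2 = \sum_{i=1}^{10} y_i^2 - 10 \bar{y}^2 = 184730 - 10 \times 130^2 = 15730,$$

$$SSE = SST - SSR = 15730 - 14200 = 1530 .$$

Thus, we have the following ANOVA table:

Source               df        SS             MS                          f
Regression           1         SSR = 14200    MSR = SSR/1 = 14200         f = MSR/MSE = 14200/191.25 ≈ 74.25
Residual (Error)     n-2 = 8   SSE = 1530     MSE = SSE/8 = 191.25
Total (corrected)    9         SST = 15730

Since $f \approx 74.25 > 5.32 = f_{1,8,0.05}$, we reject $H_0: \beta_1 = 0$. Note that $f \approx 74.25 \approx (8.617)^2 = t^2$.

Example 2 (continued):

Suppose the model is

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, 20, \quad \varepsilon_i \sim N(0, \sigma^2),$$

and

$$\sum_{i=1}^{20} x_i = 1330, \quad \sum_{i=1}^{20} y_i = 1862.8, \quad \sum_{i=1}^{20} x_i^2 = 90662, \quad \sum_{i=1}^{20} y_i^2 = 173554.26, \quad \sum_{i=1}^{20} x_i y_i = 124206.9 .$$

(a) Provide an ANOVA table.
(b) Find the 95% confidence interval for $\beta_1$ and use the confidence interval to test $H_0: \beta_1 = 0$.

[solution:]

(a) With $s_{XX} = 2217$, $s_{XY} = 330.7$ and $b_1 = s_{XY} / s_{XX} \approx 0.149$,

$$s_{YY} = SST = \sum_{i=1}^{20} y_i^2 - 20 \bar{y}^2 = 173554.26 - 20 \left( \frac{1862.8}{20} \right)^2 = 53.068,$$

$$SSR = b_1 s_{XY} = \frac{330.7}{2217} \times 330.7 \approx 49.329,$$

$$SSE = SST - SSR = 53.068 - 49.329 = 3.739 .$$

The ANOVA table is

Source               df         SS             MS
Regression           1          SSR = 49.329   MSR = SSR/1 = 49.329
Residual (Error)     n-2 = 18   SSE = 3.739    MSE = SSE/18 ≈ 0.208
Total (corrected)    19         SST = 53.068

(b) The 95% confidence interval for $\beta_1$ is

$$b_1 \pm t_{n-2,\alpha/2} \left( \frac{s^2}{s_{XX}} \right)^{1/2} = 0.1492 \pm t_{18,0.025} \left( \frac{0.208}{2217} \right)^{1/2} = 0.1492 \pm 2.101 \times 0.0097 \approx [0.129, 0.170] .$$

Since $0 \notin [0.129, 0.170]$, we reject $H_0: \beta_1 = 0$.

Example 3:

Given are 5 observations for two variables x and y.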
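$x_i$     2     3     5     1     8
$y_i$    25    25    20    30    16

Suppose the model is

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, 5, \quad \varepsilon_i \sim N(0, \sigma^2) .$$

(a) Find the least squares estimates and the fitted regression equation.
(b) Provide an ANOVA table and use the F statistic to test $H_0: \beta_1 = 0$ at $\alpha = 0.01$.
(c) Use the t statistic to test $H_0: \beta_1 = -1.5$ at $\alpha = 0.01$.
(d) Find the 95% confidence interval for $\beta_0$ and use the confidence interval to test $H_0: \beta_0 = 30$.

[solutions:]

(a) Since

$$\sum_{i=1}^{5} x_i = 19, \quad \sum_{i=1}^{5} x_i^2 = 103, \quad \sum_{i=1}^{5} x_i y_i = 383, \quad \sum_{i=1}^{5} y_i = 116, \quad \sum_{i=1}^{5} y_i^2 = 2806,$$

we have

$$s_{XX} = \sum_{i=1}^{5} x_i^2 - 5 \bar{x}^2 = 103 - 5 \left( \frac{19}{5} \right)^2 = 30.8,$$

$$s_{XY} = \sum_{i=1}^{5} x_i y_i - 5 \bar{x} \bar{y} = 383 - 5 \left( \frac{19}{5} \right) \left( \frac{116}{5} \right) = -57.8 .$$

Then, the least squares estimates are

$$b_1 = \frac{s_{XY}}{s_{XX}} = \frac{-57.8}{30.8} = -1.8766, \quad b_0 = \bar{y} - b_1 \bar{x} = \frac{116}{5} - (-1.8766) \frac{19}{5} = 30.3311 .$$

The fitted regression equation is $\hat{y} = 30.3311 - 1.8766 x$.

(b) Since

$$s_{YY} = SST = \sum_{i=1}^{5} y_i^2 - 5 \bar{y}^2 = 2806 - 5 \left( \frac{116}{5} \right)^2 = 114.8,$$

$$SSR = b_1 s_{XY} = (-1.8766) \times (-57.8) = 108.467,$$

$$SSE = SST - SSR = 114.8 - 108.467 = 6.333,$$

the ANOVA table is

Source               df        SS              MS                          F
Regression           1         SSR = 108.467   MSR = SSR/1 = 108.467       f = MSR/MSE = 51.381
Residual (Error)     n-2 = 3   SSE = 6.333     s^2 = MSE = SSE/3 = 2.111
Total (corrected)    n-1 = 4   SST = 114.8

Since $f = 51.381 > 34.12 = f_{1,3,0.01}$, we reject $H_0: \beta_1 = 0$.

(c)

$$t = \frac{b_1 - c}{\left( s^2 / s_{XX} \right)^{1/2}} = \frac{-1.8766 - (-1.5)}{\left( 2.111 / 30.8 \right)^{1/2}} = \frac{-0.3766}{0.2618} = -1.438 .$$

Since $|t| = 1.438 < 5.841 = t_{3,0.005} = t_{n-2,\alpha/2}$, we do not reject $H_0: \beta_1 = -1.5$.

(d) The 95% confidence interval for $\beta_0$ is

$$b_0 \pm t_{n-2,\alpha/2} \left( \frac{s^2 \sum_{i=1}^{n} x_i^2}{n s_{XX}} \right)^{1/2} = 30.3311 \pm t_{3,0.025} \left( \frac{2.111 \times 103}{5 \times 30.8} \right)^{1/2} = 30.3311 \pm (3.182 \times 1.188) = [26.551, 34.111] .$$

Since $30 \in [26.551, 34.111]$, we do not reject $H_0: \beta_0 = 30$.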
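The calculations in Example 3 can be reproduced with a short script. The following is a minimal sketch in Python using only the standard library. The variable names are assumptions made for illustration, and the critical values 34.12, 5.841 and 3.182 are copied from the solution above rather than computed. Because the script carries full precision instead of rounding $b_1$ and $s^2$ at each step, its results may differ from the hand calculation in the last digits (for example, $f$ comes out near 51.40 rather than 51.381).

```python
import math

# Data from Example 3
x = [2, 3, 5, 1, 8]
y = [25, 25, 20, 30, 16]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross products (computational forms)
s_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
s_yy = sum(yi ** 2 for yi in y) - n * y_bar ** 2

# (a) Least squares estimates
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# (b) ANOVA quantities and the F statistic
sst = s_yy
ssr = b1 * s_xy
sse = sst - ssr
mse = sse / (n - 2)          # s^2
f = (ssr / 1) / mse
reject_beta1_zero = f > 34.12            # f_{1,3,0.01}, taken from the text

# (c) t statistic for H0: beta1 = -1.5
se_b1 = math.sqrt(mse / s_xx)
t_c = (b1 - (-1.5)) / se_b1
reject_beta1_c = abs(t_c) > 5.841        # t_{3,0.005}, taken from the text

# Check of the identity f = t^2 for H0: beta1 = 0
t_0 = b1 / se_b1

# (d) 95% confidence interval for beta0
se_b0 = math.sqrt(mse * sum(xi ** 2 for xi in x) / (n * s_xx))
lower = b0 - 3.182 * se_b0               # t_{3,0.025}, taken from the text
upper = b0 + 3.182 * se_b0
reject_beta0_30 = not (lower <= 30 <= upper)

print("b0, b1:", b0, b1)
print("SSR, SSE, SST:", ssr, sse, sst)
print("f:", f, "reject H0: beta1 = 0?", reject_beta1_zero)
print("t:", t_c, "reject H0: beta1 = -1.5?", reject_beta1_c)
print("95% CI for beta0:", (lower, upper), "reject H0: beta0 = 30?", reject_beta0_30)
```

Comparing `f` with `t_0 ** 2` gives a quick numerical confirmation of the relation $f = t^2$ for this data set.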