2.7 The analysis of variance to regression analysis

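The analysis of variance (F-test) is used here to test

$$H_0: \beta_1 = 0 \quad \text{v.s.} \quad H_a: \beta_1 \neq 0.$$

We have the following 2 models:

Horizontal: $y = \beta_0 + \varepsilon \;\Rightarrow\; \hat{y} = \bar{y}$

Line: $y = \beta_0 + \beta_1 x + \varepsilon \;\Rightarrow\; \hat{y} = b_0 + b_1 x$

Note: The objective function for the first (horizontal) model is

$$S(\beta_0) = \sum_{i=1}^{n} (y_i - \beta_0)^2 .$$

Thus, the estimate of the parameter $\beta_0$ can be obtained by solving $\partial S(\beta_0) / \partial \beta_0 = 0$. The solution is $\bar{y}$, so $\hat{y} = \bar{y}$.

Fundamental Equation:

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$

$\Leftrightarrow$ ("distance" between data and horizontal line) = ("distance" between data and line) + ("distance" between model line and horizontal line).

[Figure: for a data point, the observed value $y_i$ (data), the fitted value $\hat{y}_i$ (line), and $\bar{y}$ (horizontal), with the sums of squares $\sum (y_i - \bar{y})^2$, $\sum (y_i - \hat{y}_i)^2$ and $\sum (\hat{y}_i - \bar{y})^2$ marked.]

[Derivation of Fundamental Equation]:

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + 2 \sum_{i=1}^{n} (y_i - \hat{y}_i)(\hat{y}_i - \bar{y})$$

$$= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 ,$$

since, using $\hat{y}_i = \bar{y} + b_1 (x_i - \bar{x})$,

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) = \sum_{i=1}^{n} \left[ y_i - \left( \bar{y} + b_1 (x_i - \bar{x}) \right) \right] \left[ \bar{y} + b_1 (x_i - \bar{x}) - \bar{y} \right]$$

$$= \sum_{i=1}^{n} \left[ (y_i - \bar{y}) - b_1 (x_i - \bar{x}) \right] b_1 (x_i - \bar{x}) = b_1 \sum_{i=1}^{n} \left[ (y_i - \bar{y}) - b_1 (x_i - \bar{x}) \right] (x_i - \bar{x})$$

$$= b_1 \left[ \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) - b_1 \sum_{i=1}^{n} (x_i - \bar{x})^2 \right] = b_1 (s_{XY} - b_1 s_{XX})$$

$$= b_1 \left( s_{XY} - \frac{s_{XY}}{s_{XX}} s_{XX} \right) = b_1 (s_{XY} - s_{XY}) = 0 .$$

The ANOVA (Analysis of Variance) table corresponding to the fundamental equation:

| Source | df | SS | MS |
|---|---|---|---|
| Due to regression | $1$ | $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ | $MSR = \dfrac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{1}$ |
| Residual (Error) | $n-2$ | $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ | $MSE = \dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}$ |
| Total (corrected) | $n-1$ | $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$ | |

Let

$$f = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 / 1}{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n-2)} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{s^2} ,$$

the ratio of the mean sum of squares due to the regression to the mean residual sum of squares.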
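The fundamental equation and the f ratio can be checked numerically. The following is a minimal sketch in plain Python, assuming a small made-up data set; the values of `x` and `y` and the function name `anova_decomposition` are illustrative and do not come from these notes.

```python
# Illustrative data only (an assumption, not taken from the notes).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def anova_decomposition(x, y):
    """Fit y = b0 + b1*x by least squares and return the ANOVA quantities."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Corrected sums of squares and cross products.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least squares estimates.
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    # The three "distances" in the fundamental equation.
    sst = sum((yi - y_bar) ** 2 for yi in y)                # data vs horizontal line
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # data vs fitted line
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)            # fitted line vs horizontal line

    msr = ssr / 1
    mse = sse / (n - 2)
    return {"b0": b0, "b1": b1, "s_xy": s_xy, "s_xx": s_xx,
            "sst": sst, "sse": sse, "ssr": ssr, "f": msr / mse}


result = anova_decomposition(x, y)
print("SST       =", result["sst"])
print("SSE + SSR =", result["sse"] + result["ssr"])
print("f         =", result["f"])
```

Up to floating-point error, the two printed sums of squares agree, which is the fundamental equation. The same run can be used to confirm the shortcut SSR = b1 * s_XY.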
Intuitively, a large f value suggests that the difference between the line and the horizontal line is large relative to the random variation reflected by the mean residual sum of squares. That is, $\beta_1$ differs from 0 so clearly that the difference between the line and the horizontal line is apparent. Therefore, the f value provides important information about whether $H_0: \beta_1 = 0$ holds. The next question is: how large must f be to be considered large?

To test $H_0: \beta_1 = 0$ v.s. $H_a: \beta_1 \neq 0$ at level $\alpha$:

$$f > f_{1, n-2, \alpha} \;\Rightarrow\; \text{reject } H_0 .$$

Note: The sum of squares due to the regression and the mean sum of squares due to the regression are

$$MSR = \frac{SSR}{1} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 .$$

The total sum of squares is

$$SST = s_{YY} = \sum_{i=1}^{n} (y_i - \bar{y})^2 .$$

Thus, the f statistic is

$$f = \frac{MSR}{MSE} .$$

Note: For ease of computation, the following equations can be used:

$$MSR = SSR = b_1 s_{XY} = b_1^2 s_{XX} .$$

Note: $E(MSE) = \sigma^2$, $\; E(MSR) = E(SSR) = \sigma^2 + \beta_1^2 s_{XX}$.

Note: Let $t$ be the statistic for testing $H_0: \beta_1 = 0$ v.s. $H_a: \beta_1 \neq 0$. Then $f = t^2$.

Motivating Example (continue):

Assume $\alpha = 0.05$. To test $H_0: \beta_1 = 0$ v.s. $H_a: \beta_1 \neq 0$, we have the following:

$$b_1 = 5, \quad s_{XY} = 2840, \quad SSR = b_1 s_{XY} = 5 \times 2840 = 14200 ,$$

$$\sum_{i=1}^{10} (y_i - \bar{y})^2 = \sum_{i=1}^{10} y_i^2 - 10 \bar{y}^2 = 184730 - 10 \times 130^2 = 15730$$

$$\Rightarrow\; SSE = \sum_{i=1}^{10} (y_i - \bar{y})^2 - SSR = 15730 - 14200 = 1530 .$$

Thus, we have the following ANOVA table:

| Source | df | SS | MS | f |
|---|---|---|---|---|
| Regression | $1$ | $SSR = 14200$ | $MSR = SSR/1 = 14200$ | $f = MSR/MSE = 14200/191.25 \approx 74.25$ |
| Residual (Error) | $n-2 = 8$ | $SSE = 1530$ | $MSE = SSE/8 = 191.25$ | |
| Total (corrected) | $9$ | $15730$ | | |

Since $f \approx 74.25 > 5.32 = f_{1,8,0.05}$, we reject $H_0: \beta_1 = 0$. Note that $f \approx 74.25 \approx 8.62^2 = t^2$, up to rounding.

Example 2 (continue):

Suppose the model is

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i , \quad i = 1, \ldots, 20, \quad \varepsilon_i \sim N(0, \sigma^2),$$

and

$$\sum_{i=1}^{20} x_i = 1330, \quad \sum_{i=1}^{20} y_i = 1862.8, \quad \sum_{i=1}^{20} x_i^2 = 90662, \quad \sum_{i=1}^{20} y_i^2 = 173554.26, \quad \sum_{i=1}^{20} x_i y_i = 124206.9 .$$

(a) Provide an ANOVA table.

(b) Find the 95% confidence interval for $\beta_1$ and use the confidence interval to test $H_0: \beta_1 = 0$.
[solution:]

(a) Since

$$s_{YY} = SST = \sum_{i=1}^{20} y_i^2 - 20 \bar{y}^2 = 173554.26 - 20 \left( \frac{1862.8}{20} \right)^2 = 53.068 ,$$

$$SSR = b_1 s_{XY} = \frac{s_{XY}^2}{s_{XX}} = \frac{330.7^2}{2217} = 49.329 ,$$

$$SSE = SST - SSR = 53.068 - 49.329 = 3.739 ,$$

the ANOVA table is

| Source | df | SS | MS |
|---|---|---|---|
| Regression | $1$ | $SSR = 49.329$ | $MSR = SSR/1 = 49.329$ |
| Residual (Error) | $n-2 = 18$ | $SSE = 3.739$ | $MSE = SSE/18 = 0.208$ |
| Total (corrected) | $19$ | $53.068$ | |

(b) With $b_1 = s_{XY}/s_{XX} = 330.7/2217 = 0.1492$, the 95% confidence interval for $\beta_1$ is

$$b_1 \pm t_{n-2, \alpha/2} \left( \frac{s^2}{s_{XX}} \right)^{1/2} = 0.1492 \pm t_{18, 0.025} \left( \frac{0.208}{2217} \right)^{1/2} = 0.1492 \pm 2.101 \times 0.0097 = [0.129, 0.170] .$$

Since $0 \notin [0.129, 0.170]$, we reject $H_0: \beta_1 = 0$.

Example 3:

Given are 5 observations for two variables x and y.

| $i$ | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| $x_i$ | 2 | 3 | 5 | 1 | 8 |
| $y_i$ | 25 | 25 | 20 | 30 | 16 |

Suppose the model is

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i , \quad i = 1, \ldots, 5, \quad \varepsilon_i \sim N(0, \sigma^2).$$

(a) Find the least squares estimates and the fitted regression equation.

(b) Provide an ANOVA table and use the F statistic to test $H_0: \beta_1 = 0$ at $\alpha = 0.01$.

(c) Use the t statistic to test $H_0: \beta_1 = -1.5$ at $\alpha = 0.01$.

(d) Find the 95% confidence interval for $\beta_0$ and use the confidence interval to test $H_0: \beta_0 = 30$.

[solutions:]

(a) Since

$$\sum_{i=1}^{5} x_i = 19, \quad \sum_{i=1}^{5} x_i^2 = 103, \quad \sum_{i=1}^{5} x_i y_i = 383, \quad \sum_{i=1}^{5} y_i = 116, \quad \sum_{i=1}^{5} y_i^2 = 2806 ,$$

we have

$$s_{XX} = \sum_{i=1}^{5} x_i^2 - 5 \bar{x}^2 = 103 - 5 \left( \frac{19}{5} \right)^2 = 30.8 ,$$

$$s_{XY} = \sum_{i=1}^{5} x_i y_i - 5 \bar{x} \bar{y} = 383 - 5 \left( \frac{19}{5} \right) \left( \frac{116}{5} \right) = -57.8 .$$

Then, the least squares estimates are

$$b_1 = \frac{s_{XY}}{s_{XX}} = \frac{-57.8}{30.8} = -1.8766 , \quad b_0 = \bar{y} - b_1 \bar{x} = \frac{116}{5} - (-1.8766) \left( \frac{19}{5} \right) = 30.3311 .$$

The fitted regression equation is $\hat{y} = 30.3311 - 1.8766 x$.

(b) Since

$$s_{YY} = SST = \sum_{i=1}^{5} y_i^2 - 5 \bar{y}^2 = 2806 - 5 \left( \frac{116}{5} \right)^2 = 114.8 ,$$

$$SSR = b_1 s_{XY} = (-1.8766) \times (-57.8) = 108.467 ,$$

$$SSE = SST - SSR = 114.8 - 108.467 = 6.333 ,$$

the ANOVA table is

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Regression | $1$ | $SSR = 108.467$ | $MSR = SSR/1 = 108.467$ | $f = MSR/MSE = 51.381$ |
| Residual (Error) | $n-2 = 3$ | $SSE = 6.333$ | $s^2 = MSE = SSE/3 = 2.111$ | |
| Total (corrected) | $n-1 = 4$ | $SST = 114.8$ | | |

Since $f = 51.381 > 34.12 = f_{1,3,0.01}$, we reject $H_0: \beta_1 = 0$.
(c) With $c = -1.5$,

$$t = \frac{b_1 - c}{\left( s^2 / s_{XX} \right)^{1/2}} = \frac{-1.8766 - (-1.5)}{\left( 2.111 / 30.8 \right)^{1/2}} = \frac{-0.3766}{0.2618} = -1.438 .$$

Since $|t| = 1.438 < 5.841 = t_{3, 0.005} = t_{n-2, \alpha/2}$, we do not reject $H_0: \beta_1 = -1.5$.

(d) The 95% confidence interval for $\beta_0$ is

$$b_0 \pm t_{n-2, \alpha/2} \left( \frac{s^2 \sum_{i=1}^{n} x_i^2}{n \, s_{XX}} \right)^{1/2} = 30.3311 \pm t_{3, 0.025} \left( \frac{2.111 \times 103}{5 \times 30.8} \right)^{1/2} = 30.3311 \pm (3.182 \times 1.188) = [26.551, 34.111] .$$

Since $30 \in [26.551, 34.111]$, we do not reject $H_0: \beta_0 = 30$.
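The hand computations of Example 3 can be reproduced with a short script. This is a sketch in plain Python, not part of the original notes: the variable names are illustrative, and the critical values 34.12, 5.841 and 3.182 are copied from the solution rather than computed. Because the script does not round intermediate results, its output can differ from the hand-rounded values in the third decimal place.

```python
import math

# Data of Example 3.
x = [2, 3, 5, 1, 8]
y = [25, 25, 20, 30, 16]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross products.
s_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
s_yy = sum(yi ** 2 for yi in y) - n * y_bar ** 2

# (a) Least squares estimates.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# (b) ANOVA quantities and the f statistic.
sst = s_yy
ssr = b1 * s_xy
sse = sst - ssr
mse = sse / (n - 2)
f = (ssr / 1) / mse
F_CRIT = 34.12           # f_{1,3,0.01}, taken from the notes
reject_slope_zero = f > F_CRIT

# (c) t statistic for H0: beta1 = -1.5.
c = -1.5
t = (b1 - c) / math.sqrt(mse / s_xx)
T_CRIT_005 = 5.841       # t_{3,0.005}, taken from the notes
reject_slope_c = abs(t) > T_CRIT_005

# (d) 95% confidence interval for beta0.
T_CRIT_025 = 3.182       # t_{3,0.025}, taken from the notes
half_width = T_CRIT_025 * math.sqrt(mse * sum(xi ** 2 for xi in x) / (n * s_xx))
ci_b0 = (b0 - half_width, b0 + half_width)

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}, MSE = {mse:.3f}")
print(f"f = {f:.3f}, reject H0: beta1 = 0? {reject_slope_zero}")
print(f"t = {t:.3f}, reject H0: beta1 = -1.5? {reject_slope_c}")
print(f"95% CI for beta0: ({ci_b0[0]:.3f}, {ci_b0[1]:.3f})")
```

Hard-coding the critical values keeps the sketch free of third-party dependencies; a statistics library could supply them instead.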
