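4. Linear Regression

4.1. Estimation and Hypothesis Testing

(I) Least squares estimate (point estimate):

$$b = (X^tX)^{-1}X^ty,\qquad \hat y = Xb,\qquad e = y - \hat y.$$

Properties of the least squares estimate:

$$E(b) = \beta,\qquad V(b) = \begin{pmatrix}
\operatorname{Var}(b_0) & \operatorname{cov}(b_0,b_1) & \cdots & \operatorname{cov}(b_0,b_{p-1})\\
\operatorname{cov}(b_1,b_0) & \operatorname{Var}(b_1) & \cdots & \operatorname{cov}(b_1,b_{p-1})\\
\vdots & \vdots & \ddots & \vdots\\
\operatorname{cov}(b_{p-1},b_0) & \operatorname{cov}(b_{p-1},b_1) & \cdots & \operatorname{Var}(b_{p-1})
\end{pmatrix} = \sigma^2(X^tX)^{-1}.$$

(II) F-test:

(a) $H_0: \beta_0 = \beta_1 = \cdots = \beta_{p-1} = 0$

ANOVA table:

$$\begin{array}{lcccc}
\text{Source} & \text{df} & \text{SS} & \text{MS} & F\\ \hline
\text{Regression} & p & b^tX^ty & \dfrac{b^tX^ty}{p} & f = \dfrac{b^tX^ty/p}{s^2}\\
\text{Residual (Error)} & n-p & y^ty - b^tX^ty & s^2 = \text{MSE} = \dfrac{y^ty - b^tX^ty}{n-p} & \\
\text{Total (uncorrected)} & n & y^ty = \sum_{i=1}^{n} y_i^2 & &
\end{array}$$

Reject $H_0$ if $f > f_{p,\,n-p,\,\alpha}$.

(b) $H_0: \beta_1 = \cdots = \beta_{p-1} = 0$

ANOVA table:

$$\begin{array}{lcccc}
\text{Source} & \text{df} & \text{SS} & \text{MS} & F\\ \hline
\text{Regression} & p-1 & b^tX^ty - n\bar y^2 & \dfrac{b^tX^ty - n\bar y^2}{p-1} & f = \dfrac{(b^tX^ty - n\bar y^2)/(p-1)}{s^2}\\
\text{Residual (Error)} & n-p & y^ty - b^tX^ty & s^2 = \text{MSE} = \dfrac{y^ty - b^tX^ty}{n-p} & \\
\text{Total (corrected)} & n-1 & \sum_{i=1}^{n}(y_i - \bar y)^2 = y^ty - n\bar y^2 & &
\end{array}$$

Reject $H_0$ if $f > f_{p-1,\,n-p,\,\alpha}$.

(c) $H_0: \beta_k = 0$

$$f = \frac{\left[SS(b_0,\ldots,b_{p-1}) - SS(b_0,\ldots,b_{k-1},b_{k+1},\ldots,b_{p-1})\right]/1}{RSS(\text{model } p)/(n-p)} = \frac{SS(b_0,\ldots,b_{p-1}) - SS(b_0,\ldots,b_{k-1},b_{k+1},\ldots,b_{p-1})}{s^2},$$

where $RSS(\text{model } p) = y^ty - b^tX^ty$ is the residual sum of squares of the full model with $p$ parameters. Reject $H_0: \beta_k = 0$ if $f > f_{1,\,n-p,\,\alpha}$.

Note: $SS(b_0,b_1,\ldots,b_{p-1}) = b^tX^ty$, where

$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1,p-1}\\ 1 & x_{21} & \cdots & x_{2,p-1}\\ \vdots & \vdots & & \vdots\\ 1 & x_{n1} & \cdots & x_{n,p-1} \end{pmatrix},\qquad b = \begin{pmatrix} b_0\\ b_1\\ \vdots\\ b_{p-1} \end{pmatrix} = (X^tX)^{-1}X^ty.$$

On the other hand, $SS(b_0,b_1,\ldots,b_{k-1},b_{k+1},\ldots,b_{p-1}) = b^tX^ty$, where $X$ and $b$ now come from the model without $x_k$:

$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1,k-1} & x_{1,k+1} & \cdots & x_{1,p-1}\\ 1 & x_{21} & \cdots & x_{2,k-1} & x_{2,k+1} & \cdots & x_{2,p-1}\\ \vdots & \vdots & & \vdots & \vdots & & \vdots\\ 1 & x_{n1} & \cdots & x_{n,k-1} & x_{n,k+1} & \cdots & x_{n,p-1} \end{pmatrix},\qquad b = \begin{pmatrix} b_0\\ b_1\\ \vdots\\ b_{k-1}\\ b_{k+1}\\ \vdots\\ b_{p-1} \end{pmatrix} = (X^tX)^{-1}X^ty.$$

(d) $H_0: \beta_{j_1} = \beta_{j_2} = \cdots = \beta_{j_{p-q}} = 0$

$$f = \frac{\left[SS(b_0,\ldots,b_{p-1}) - SS(b_0,b_{k_1},\ldots,b_{k_{q-1}})\right]/(p-q)}{RSS(\text{model } p)/(n-p)} = \frac{SS(b_0,\ldots,b_{p-1}) - SS(b_0,b_{k_1},\ldots,b_{k_{q-1}})}{(p-q)\,s^2},$$

and reject $H_0$ if $f > f_{p-q,\,n-p,\,\alpha}$.

Note: $SS(b_0,b_1,\ldots,b_{p-1}) = b^tX^ty$, where $X$ is the full $n \times p$ design matrix with rows $(1, x_{i1}, \ldots, x_{i,p-1})$ and $b = (b_0, b_1, \ldots, b_{p-1})^t = (X^tX)^{-1}X^ty$. On the other hand, $SS(b_0,b_{k_1},\ldots,b_{k_{q-1}}) = b^tX^ty$, where $\{k_1,\ldots,k_{q-1}\} \cap \{j_1,\ldots,j_{p-q}\} = \emptyset$, $1 \le k_1,\ldots,k_{q-1} \le p-1$, and

$$X = \begin{pmatrix} 1 & x_{1k_1} & \cdots & x_{1k_{q-1}}\\ 1 & x_{2k_1} & \cdots & x_{2k_{q-1}}\\ \vdots & \vdots & & \vdots\\ 1 & x_{nk_1} & \cdots & x_{nk_{q-1}} \end{pmatrix},\qquad b = \begin{pmatrix} b_0\\ b_{k_1}\\ \vdots\\ b_{k_{q-1}} \end{pmatrix} = (X^tX)^{-1}X^ty.$$

(e) $H_0$ (general linear hypothesis, $p-q$ linearly independent constraints):

$$c_{10}\beta_0 + c_{11}\beta_1 + \cdots + c_{1,p-1}\beta_{p-1} = 0$$
$$c_{20}\beta_0 + c_{21}\beta_1 + \cdots + c_{2,p-1}\beta_{p-1} = 0$$
$$\vdots$$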
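$$c_{p-q,0}\beta_0 + c_{p-q,1}\beta_1 + \cdots + c_{p-q,p-1}\beta_{p-1} = 0$$

$$f = \frac{\left[SS(b_0,\ldots,b_{p-1}) - SS(b_0,\ldots,b_{q-1})\right]/(p-q)}{RSS(\text{model } p)/(n-p)} = \frac{SS(b_0,\ldots,b_{p-1}) - SS(b_0,\ldots,b_{q-1})}{(p-q)\,s^2},$$

and reject $H_0$ if $f > f_{p-q,\,n-p,\,\alpha}$.

Note: $SS(b_0,b_1,\ldots,b_{p-1}) = b^tX^ty$, where $X$ is the full $n \times p$ design matrix with rows $(1, x_{i1}, \ldots, x_{i,p-1})$ and $b = (b_0, b_1, \ldots, b_{p-1})^t = (X^tX)^{-1}X^ty$.

On the other hand, the reduced model under $H_0$ is

$$y_i = \gamma_0 z_{i0} + \gamma_1 z_{i1} + \cdots + \gamma_{q-1} z_{i,q-1} + \varepsilon_i.$$

Then $SS(b_0,b_1,\ldots,b_{q-1}) = b^tZ^ty$, where

$$Z = \begin{pmatrix} z_{10} & z_{11} & \cdots & z_{1,q-1}\\ z_{20} & z_{21} & \cdots & z_{2,q-1}\\ \vdots & \vdots & & \vdots\\ z_{n0} & z_{n1} & \cdots & z_{n,q-1} \end{pmatrix},\qquad b = \begin{pmatrix} b_0\\ b_1\\ \vdots\\ b_{q-1} \end{pmatrix} = (Z^tZ)^{-1}Z^ty.$$

Interval estimate for $\beta_i$:

$$b_i \pm t_{n-p,\,\alpha/2}\,\text{s.e.}(b_i) = \left(b_i - t_{n-p,\,\alpha/2}\,\text{s.e.}(b_i),\; b_i + t_{n-p,\,\alpha/2}\,\text{s.e.}(b_i)\right),$$

where $\text{s.e.}(b_i) = \sqrt{s^2 \times \left[\text{the } (i+1)\text{th diagonal element of } (X^tX)^{-1}\right]}$.

t-test for $H_0: \beta_i = c$:

$$t = \frac{b_i - c}{\text{s.e.}(b_i)},$$

and reject $H_0$ if $|t| > t_{n-p,\,\alpha/2}$.

(III) Prediction of $E(y_h) = x_h^t\beta = \beta_0 + \beta_1x_{h1} + \cdots + \beta_{p-1}x_{h,p-1}$, where $x_h = (1, x_{h1}, \ldots, x_{h,p-1})^t$.

Point estimate: $\hat y_h = x_h^tb = b_0 + b_1x_{h1} + \cdots + b_{p-1}x_{h,p-1}$.

Interval estimate:

$$\hat y_h \pm t_{n-p,\,\alpha/2}\,\text{s.e.}(\hat y_h) = \left(\hat y_h - t_{n-p,\,\alpha/2}\,\text{s.e.}(\hat y_h),\; \hat y_h + t_{n-p,\,\alpha/2}\,\text{s.e.}(\hat y_h)\right),$$

where $\text{s.e.}(\hat y_h) = s\sqrt{x_h^t(X^tX)^{-1}x_h}$.

(IV) $R^2$, $r_{Y\hat Y}$ and adjusted $R^2$:

$$R^2 = \frac{\sum_{i=1}^{n}(\hat y_i - \bar y)^2}{\sum_{i=1}^{n}(y_i - \bar y)^2} = \frac{b^tX^ty - n\bar y^2}{\sum_{i=1}^{n}(y_i - \bar y)^2},\qquad r_{Y\hat Y} = \sqrt{R^2},\qquad \text{Adjusted } R^2 = 1 - \frac{n-1}{n-p}\left(1 - R^2\right).$$

Example 1: Here is a set of data:

$$\begin{array}{c|ccccc}
y & 7.2 & 8.1 & 9.8 & 12.3 & 12.9\\
x_1 & -1 & -1 & 0 & 1 & 1\\
x_2 & -1 & 0 & 0 & 0 & 1
\end{array}$$

(a) Find the least squares estimate and the fitted regression equation.
(b) Provide an ANOVA table and use the F statistic to test $H_0: \beta_0 = \beta_1 = \beta_2 = 0$ at $\alpha = 0.05$.
(c) Find the ANOVA table for the hypothesis $H_0: \beta_1 = \beta_2 = 0$ and use the F statistic to test it at $\alpha = 0.01$.
(d) Find $\text{s.e.}(b_0)$ and the estimate of $\operatorname{cov}(b_1, b_2)$.
(e) Find the 95% confidence interval for $\beta_1$ and use it to test $H_0: \beta_1 = 0$.
(f) Test $H_0: \beta_2 = 0$ based on the t statistic at $\alpha = 0.05$.
(g) Determine $R^2$, $r_{Y\hat Y}$ and adjusted $R^2$.
(h) Find the 95% confidence interval for $E(y_h)$ at $x_h = (1, 0.5, 0)^t$.

[solution:]

(a)

$$y = \begin{pmatrix}7.2\\8.1\\9.8\\12.3\\12.9\end{pmatrix},\qquad X = \begin{pmatrix}1&-1&-1\\1&-1&0\\1&0&0\\1&1&0\\1&1&1\end{pmatrix}.$$

Thus,

$$X^tX = \begin{pmatrix}1&1&1&1&1\\-1&-1&0&1&1\\-1&0&0&0&1\end{pmatrix}\begin{pmatrix}1&-1&-1\\1&-1&0\\1&0&0\\1&1&0\\1&1&1\end{pmatrix} = \begin{pmatrix}5&0&0\\0&4&2\\0&2&2\end{pmatrix}$$

and

$$X^ty = \begin{pmatrix}1&1&1&1&1\\-1&-1&0&1&1\\-1&0&0&0&1\end{pmatrix}\begin{pmatrix}7.2\\8.1\\9.8\\12.3\\12.9\end{pmatrix} = \begin{pmatrix}50.3\\9.9\\5.7\end{pmatrix}.$$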
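Therefore,

$$b = (X^tX)^{-1}X^ty = \begin{pmatrix}0.2&0&0\\0&0.5&-0.5\\0&-0.5&1\end{pmatrix}\begin{pmatrix}50.3\\9.9\\5.7\end{pmatrix} = \begin{pmatrix}10.06\\2.1\\0.75\end{pmatrix},$$

and the fitted regression equation is $\hat y = 10.06 + 2.1x_1 + 0.75x_2$.

(b) $SS(b_0,b_1,b_2) = b^tX^ty = (10.06,\ 2.1,\ 0.75)\begin{pmatrix}50.3\\9.9\\5.7\end{pmatrix} = 531.083$ and $\sum_{i=1}^{5} y_i^2 = y^ty = 531.19$. Thus,

$$RSS(\text{model } 3) = y^ty - b^tX^ty = 531.19 - 531.083 = 0.107.$$

Therefore, the ANOVA table for $H_0: \beta_0 = \beta_1 = \beta_2 = 0$ is

$$\begin{array}{lcccc}
\text{Source} & \text{df} & \text{SS} & \text{MS} & F\\ \hline
\text{Regression} & 3 & b^tX^ty = 531.083 & \dfrac{b^tX^ty}{3} = 177.0277 & f = \dfrac{177.0277}{0.0535} = 3308.929\\
\text{Residual (Error)} & 2 & y^ty - b^tX^ty = 0.107 & s^2 = \dfrac{0.107}{2} = 0.0535 & \\
\text{Total (uncorrected)} & 5 & y^ty = \sum_{i=1}^{5} y_i^2 = 531.19 & &
\end{array}$$

Since $f = 3308.929 > 19.1643 = f_{3,2,0.05}$, reject $H_0$.

(c) Since $\bar y = 10.06$, $n\bar y^2 = 5 \times 10.06^2 = 506.018$ and

$$SS(b_1,b_2 \mid b_0) = b^tX^ty - n\bar y^2 = 531.083 - 506.018 = 25.065.$$

The ANOVA table for $H_0: \beta_1 = \beta_2 = 0$ is

$$\begin{array}{lcccc}
\text{Source} & \text{df} & \text{SS} & \text{MS} & F\\ \hline
\text{Regression} & 2 & b^tX^ty - n\bar y^2 = 25.065 & \dfrac{25.065}{2} = 12.5325 & f = \dfrac{12.5325}{0.0535} = 234.2523\\
\text{Residual (Error)} & 2 & y^ty - b^tX^ty = 0.107 & s^2 = \dfrac{0.107}{2} = 0.0535 & \\
\text{Total (corrected)} & 4 & \sum_{i=1}^{5}(y_i - \bar y)^2 = 25.172 & &
\end{array}$$

Since $f = 234.2523 > 99.0 = f_{2,2,0.01}$, reject $H_0$.

(d) Since

$$\hat V(b) = s^2(X^tX)^{-1} = 0.0535\begin{pmatrix}0.2&0&0\\0&0.5&-0.5\\0&-0.5&1\end{pmatrix} = \begin{pmatrix}0.0107&0&0\\0&0.02675&-0.02675\\0&-0.02675&0.0535\end{pmatrix},$$

we have $\text{s.e.}(b_0) = \sqrt{0.0107} = 0.1034$, and the estimate of $\operatorname{cov}(b_1,b_2)$ is $-0.02675$.

(e) Since $\text{s.e.}(b_1) = \sqrt{0.02675} = 0.1636$, the 95% confidence interval for $\beta_1$ is

$$b_1 \pm t_{2,\,0.05/2}\,\text{s.e.}(b_1) = 2.1 \pm 4.303 \times 0.1636 = (1.4,\ 2.8).$$

Since $0 \notin (1.4,\ 2.8)$, reject $H_0: \beta_1 = 0$.

(f) Since $\text{s.e.}(b_2) = \sqrt{0.0535} = 0.2313$, the t statistic is

$$t = \frac{b_2}{\text{s.e.}(b_2)} = \frac{0.75}{0.2313} = 3.2425.$$

Thus, since $|t| = 3.2425 < 4.303 = t_{2,\,0.05/2}$, do not reject $H_0: \beta_2 = 0$.

(g)

$$R^2 = \frac{\sum_{i=1}^{n}(\hat y_i - \bar y)^2}{\sum_{i=1}^{n}(y_i - \bar y)^2} = \frac{SS(b_1,b_2 \mid b_0)}{\sum_{i=1}^{n}(y_i - \bar y)^2} = \frac{25.065}{25.172} = 0.9957,$$

$$r_{Y\hat Y} = \sqrt{R^2} = \sqrt{0.9957} = 0.9978,$$

$$\text{Adjusted } R^2 = 1 - \frac{n-1}{n-p}\left(1 - R^2\right) = 1 - \frac{4}{2}\left(1 - 0.9957\right) = 0.9914.$$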
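The arithmetic of parts (a) to (g), and the interval of part (h) below, can be reproduced numerically. The block below is a minimal, dependency-free Python sketch, assuming the model $y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \varepsilon$ of Example 1. The helpers `transpose`, `matmul` and `inverse` are hypothetical names written only for this illustration, and the critical value $t_{2,0.025} = 4.303$ is copied from the text because the Python standard library has no t or F quantile functions.

```python
from math import sqrt


# Hypothetical helpers (not from the text): matrix algebra on plain Python lists.
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """Gauss-Jordan inversion with partial pivoting; adequate for a small X^tX."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                factor = m[r][c]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


# Example 1 data, model y = beta0 + beta1*x1 + beta2*x2 + error
y = [7.2, 8.1, 9.8, 12.3, 12.9]
x1 = [-1, -1, 0, 1, 1]
x2 = [-1, 0, 0, 0, 1]
X = [[1.0, u, v] for u, v in zip(x1, x2)]
n, p = len(X), len(X[0])

# (a) b = (X^tX)^{-1} X^t y
Xt = transpose(X)
XtX_inv = inverse(matmul(Xt, X))
Xty = [row[0] for row in matmul(Xt, [[v] for v in y])]
b = [sum(c * v for c, v in zip(row, Xty)) for row in XtX_inv]

# (b), (c): sums of squares and the two ANOVA F statistics
ss_reg = sum(bi * v for bi, v in zip(b, Xty))   # b^t X^t y
yty = sum(v * v for v in y)                      # y^t y
rss = yty - ss_reg                               # RSS(model 3)
s2 = rss / (n - p)                               # MSE
ybar = sum(y) / n
ss_reg_corr = ss_reg - n * ybar ** 2             # SS(b1, b2 | b0)
sst_corr = yty - n * ybar ** 2                   # corrected total SS
f_all = (ss_reg / p) / s2                        # H0: beta0 = beta1 = beta2 = 0
f_slopes = (ss_reg_corr / (p - 1)) / s2          # H0: beta1 = beta2 = 0

# (d) to (f): s^2 (X^tX)^{-1}, standard errors, CI for beta1, t statistic for beta2
se = [sqrt(s2 * XtX_inv[i][i]) for i in range(p)]
cov_b1_b2 = s2 * XtX_inv[1][2]
t_crit = 4.303                                   # t_{2, 0.025}, copied from the text
ci_b1 = (b[1] - t_crit * se[1], b[1] + t_crit * se[1])
t_b2 = b[2] / se[2]

# (g): R^2, r_{Y Yhat} and adjusted R^2
r2 = ss_reg_corr / sst_corr
r_yyhat = sqrt(r2)
adj_r2 = 1 - (n - 1) / (n - p) * (1 - r2)

# (h): interval for E(y_h) at x_h = (1, 0.5, 0)^t
xh = [1.0, 0.5, 0.0]
yh = sum(c * v for c, v in zip(xh, b))
quad = sum(xh[i] * XtX_inv[i][j] * xh[j] for i in range(p) for j in range(p))
se_yh = sqrt(s2 * quad)
ci_yh = (yh - t_crit * se_yh, yh + t_crit * se_yh)

print("b =", [round(v, 4) for v in b])
print("f_all =", round(f_all, 3), " f_slopes =", round(f_slopes, 3))
print("R^2 =", round(r2, 4), " adjusted R^2 =", round(adj_r2, 4))
print("CI for E(y_h):", tuple(round(v, 2) for v in ci_yh))
```

Because the sketch carries full precision throughout, a printed value may differ from the text in the last digit where the text rounds an intermediate result (for example $R^2 = 0.9957$) before reusing it.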
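(h)

$$\hat y_h = b_0 + b_1x_{h1} + b_2x_{h2} = 10.06 + 2.1 \times 0.5 + 0.75 \times 0 = 11.11.$$

Since

$$x_h^t(X^tX)^{-1}x_h = \begin{pmatrix}1 & 0.5 & 0\end{pmatrix}\begin{pmatrix}0.2&0&0\\0&0.5&-0.5\\0&-0.5&1\end{pmatrix}\begin{pmatrix}1\\0.5\\0\end{pmatrix} = 0.325,$$

$$\text{s.e.}(\hat y_h) = s\sqrt{x_h^t(X^tX)^{-1}x_h} = \sqrt{0.325 \times 0.0535} = 0.1319.$$

The 95% confidence interval for $E(y_h)$ at $x_h = (1, 0.5, 0)^t$ is

$$\hat y_h \pm t_{2,\,0.05/2}\,\text{s.e.}(\hat y_h) = 11.11 \pm 4.303 \times 0.1319 = (10.54,\ 11.68).$$

Example 2: Here is a set of data with the model $y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \varepsilon_i$, $i = 1, \ldots, 5$:

$$\begin{array}{c|ccccc}
y & 15 & 15 & 25 & 10 & 30\\
x_{i1} & -2 & -1 & 0 & 1 & 2\\
x_{i2} & 1 & -1 & 0 & -1 & 1
\end{array}$$

(a) Find the least squares estimate and the fitted regression equation.
(b) Provide an ANOVA table and use the F statistic to test $H_0: \beta_0 = \beta_1 = \beta_2 = 0$ at $\alpha = 0.05$.
(c) Find the ANOVA table for the hypothesis $H_0: \beta_1 = \beta_2 = 0$ and use the F statistic to test it at $\alpha = 0.05$.
(d) Find the F statistic to test the hypothesis $H_0: \beta_0 = \beta_1 = 0$ at $\alpha = 0.05$.
(e) Find the ANOVA table for the hypothesis $H_0: \beta_2 = 0$ and use the F statistic to test it at $\alpha = 0.05$.
(f) Find the F statistic to test $H_0: \beta_2 = 2\beta_1$ at $\alpha = 0.05$.

[solution:]

(a)

$$y = \begin{pmatrix}15\\15\\25\\10\\30\end{pmatrix},\qquad X = \begin{pmatrix}1&-2&1\\1&-1&-1\\1&0&0\\1&1&-1\\1&2&1\end{pmatrix},\qquad b = \begin{pmatrix}b_0\\b_1\\b_2\end{pmatrix} = (X^tX)^{-1}X^ty = \begin{pmatrix}19\\2.5\\5\end{pmatrix},$$

and the fitted regression equation is $\hat y = 19 + 2.5x_1 + 5x_2$.

Note:

$$X^tX = \begin{pmatrix}5&0&0\\0&10&0\\0&0&4\end{pmatrix},\qquad X^ty = \begin{pmatrix}95\\25\\20\end{pmatrix}.$$

(b) $SS(b_0,b_1,b_2) = b^tX^ty = 1967.5$ and $\sum_{i=1}^{5} y_i^2 = y^ty = 2075$. Thus,

$$RSS(\text{model } 3) = y^ty - b^tX^ty = 2075 - 1967.5 = 107.5.$$

Therefore, the ANOVA table for $H_0: \beta_0 = \beta_1 = \beta_2 = 0$ is

$$\begin{array}{lcccc}
\text{Source} & \text{df} & \text{SS} & \text{MS} & F\\ \hline
\text{Regression} & 3 & b^tX^ty = 1967.5 & \dfrac{b^tX^ty}{3} = 655.83 & f = \dfrac{655.83}{53.75} = 12.2\\
\text{Residual (Error)} & 2 & y^ty - b^tX^ty = 107.5 & s^2 = \dfrac{107.5}{2} = 53.75 & \\
\text{Total (uncorrected)} & 5 & y^ty = \sum_{i=1}^{5} y_i^2 = 2075 & &
\end{array}$$

Since $f = 12.2 < 19.2 = f_{3,2,0.05}$, do not reject $H_0$.

(c) Since $\bar y = 19$, $n\bar y^2 = 5 \times 19^2 = 1805$ and

$$SS(b_1,b_2 \mid b_0) = b^tX^ty - n\bar y^2 = 1967.5 - 1805 = 162.5.$$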
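The ANOVA table for $H_0: \beta_1 = \beta_2 = 0$ is

$$\begin{array}{lcccc}
\text{Source} & \text{df} & \text{SS} & \text{MS} & F\\ \hline
\text{Regression} & 2 & b^tX^ty - n\bar y^2 = 162.5 & \dfrac{162.5}{2} = 81.25 & f = \dfrac{81.25}{53.75} = 1.512\\
\text{Residual (Error)} & 2 & y^ty - b^tX^ty = 107.5 & s^2 = \dfrac{107.5}{2} = 53.75 & \\
\text{Total (corrected)} & 4 & \sum_{i=1}^{5}(y_i - \bar y)^2 = 270 & &
\end{array}$$

Since $f = 1.512 < 19 = f_{2,2,0.05}$, do not reject $H_0$.

(d) Test $H_0: \beta_0 = \beta_1 = 0$ at $\alpha = 0.05$.

Step 1: Find the regression sum of squares and the residual sum of squares for the full model $y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \varepsilon_i$. That is, $SS(b_0,b_1,b_2) = b^tX^ty = 1967.5$ and $RSS(\text{model } 3) = y^ty - b^tX^ty = 107.5$.

Step 2: Find the regression sum of squares for the reduced model $y_i = \beta_2x_{i2} + \varepsilon_i$. That is, $SS(b_2) = b^tX^ty = 100$, where $X = (1, -1, 0, -1, 1)^t$ and the least squares estimate is $b = b_2 = (X^tX)^{-1}X^ty = 5$.

Step 3: Find the F statistic:

$$f = \frac{SS(b_0,b_1 \mid b_2)/(3-1)}{RSS(\text{model } 3)/(5-3)} = \frac{\left[SS(b_0,b_1,b_2) - SS(b_2)\right]/2}{RSS(\text{model } 3)/2} = \frac{(1967.5 - 100)/2}{107.5/2} = 17.372,$$

and do not reject $H_0: \beta_0 = \beta_1 = 0$, since $f = 17.372 < 19 = f_{2,2,0.05}$.

(e) Test $H_0: \beta_2 = 0$ at $\alpha = 0.05$.

Step 1: Find the regression sum of squares and the residual sum of squares for the full model $y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \varepsilon_i$. That is, $SS(b_0,b_1,b_2) = b^tX^ty = 1967.5$ and $RSS(\text{model } 3) = y^ty - b^tX^ty = 107.5$.

Step 2: Find the regression sum of squares for the reduced model $y_i = \beta_0 + \beta_1x_{i1} + \varepsilon_i$. That is, $SS(b_0,b_1) = b^tX^ty = 1867.5$, where

$$X = \begin{pmatrix}1&-2\\1&-1\\1&0\\1&1\\1&2\end{pmatrix}$$

and the least squares estimate is

$$b = \begin{pmatrix}b_0\\b_1\end{pmatrix} = (X^tX)^{-1}X^ty = \begin{pmatrix}19\\2.5\end{pmatrix}.$$

Step 3: The ANOVA table for $H_0: \beta_2 = 0$ is

$$\begin{array}{lcccc}
\text{Source} & \text{df} & \text{SS} & \text{MS} & F\\ \hline
\text{Regression } SS(b_2 \mid b_0,b_1) & 1 & SS(b_0,b_1,b_2) - SS(b_0,b_1) = 100 & 100 & f = \dfrac{100}{53.75} = 1.86\\
\text{Residual (Error)} & 2 & y^ty - b^tX^ty = 107.5 & s^2 = \dfrac{107.5}{2} = 53.75 & \\
\text{Total (after fitting } b_0, b_1) & 3 & y^ty - SS(b_0,b_1) = 207.5 & &
\end{array}$$

The F statistic is

$$f = \frac{\left[SS(b_0,b_1,b_2) - SS(b_0,b_1)\right]/(3-2)}{RSS(\text{model } 3)/(5-3)} = \frac{1967.5 - 1867.5}{107.5/2} = 1.86,$$

and do not reject $H_0: \beta_2 = 0$, since $f = 1.86 < 18.5 = f_{1,2,0.05}$.

(f) Test $H_0: \beta_2 = 2\beta_1$ at $\alpha = 0.05$.

Step 1: Find the regression sum of squares and the residual sum of squares for the full model $y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \varepsilon_i$. That is, $SS(b_0,b_1,b_2) = b^tX^ty = 1967.5$ and $RSS(\text{model } 3) = y^ty - b^tX^ty = 107.5$.

Step 2: Find the reduced model.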
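Under $H_0: \beta_2 = 2\beta_1$, the reduced model is

$$y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \varepsilon_i = \beta_0 + \beta_1x_{i1} + 2\beta_1x_{i2} + \varepsilon_i = \beta_0 + \beta_1(x_{i1} + 2x_{i2}) + \varepsilon_i = \gamma_0z_{i0} + \gamma_1z_{i1} + \varepsilon_i \quad (\text{model } 2),$$

where $\gamma_0 = \beta_0$, $\gamma_1 = \beta_1$, $z_{i0} = 1$ and $z_{i1} = x_{i1} + 2x_{i2}$. Therefore,

$$Z = \begin{pmatrix} z_{10} & z_{11}\\ z_{20} & z_{21}\\ z_{30} & z_{31}\\ z_{40} & z_{41}\\ z_{50} & z_{51} \end{pmatrix} = \begin{pmatrix} 1 & x_{11} + 2x_{12}\\ 1 & x_{21} + 2x_{22}\\ 1 & x_{31} + 2x_{32}\\ 1 & x_{41} + 2x_{42}\\ 1 & x_{51} + 2x_{52} \end{pmatrix} = \begin{pmatrix} 1 & -2 + 2 \cdot 1\\ 1 & -1 + 2 \cdot (-1)\\ 1 & 0 + 2 \cdot 0\\ 1 & 1 + 2 \cdot (-1)\\ 1 & 2 + 2 \cdot 1 \end{pmatrix} = \begin{pmatrix} 1 & 0\\ 1 & -3\\ 1 & 0\\ 1 & -1\\ 1 & 4 \end{pmatrix}.$$

Then $SS(b_0,b_1) = b^tZ^ty = 1967.5$, where

$$b = \begin{pmatrix}b_0\\b_1\end{pmatrix} = (Z^tZ)^{-1}Z^ty = \begin{pmatrix}19\\2.5\end{pmatrix}.$$

Step 3: Find the F statistic:

$$f = \frac{SS(b_2 \mid b_0,b_1)/(3-2)}{RSS(\text{model } 3)/(5-3)} = \frac{\left[SS(b_0,b_1,b_2) - SS(b_0,b_1)\right]/1}{RSS(\text{model } 3)/2} = \frac{1967.5 - 1967.5}{107.5/2} = 0,$$

and do not reject $H_0: \beta_2 = 2\beta_1$, since $f = 0 < 18.5 = f_{1,2,0.05}$.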
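Parts (b) to (f) of Example 2 all follow one pattern: fit the full model and a reduced model, then compare their regression sums of squares with the extra-sum-of-squares F statistic. Below is a minimal, dependency-free Python sketch of that pattern on the Example 2 data. The names `regression_ss`, `extra_ss_f` and the matrix helpers are hypothetical, introduced only for illustration. Each reduced design matrix mirrors the reduced model used in the solution; for part (b) the "reduced model" is assumed to be the empty model with $SS = 0$, and for part (f) it is the reparametrized column $z_{i1} = x_{i1} + 2x_{i2}$. The critical values are copied from the text.

```python
# Hypothetical helpers (not from the text): matrix algebra on plain Python lists.
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """Gauss-Jordan inversion with partial pivoting; adequate for a small X^tX."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                factor = m[r][c]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def regression_ss(X, y):
    """Least squares fit of y on the columns of X; returns (b, b^t X^t y)."""
    Xt = transpose(X)
    Xty = [row[0] for row in matmul(Xt, [[v] for v in y])]
    b = [sum(c * v for c, v in zip(row, Xty)) for row in inverse(matmul(Xt, X))]
    return b, sum(bi * v for bi, v in zip(b, Xty))


def extra_ss_f(X_full, X_reduced, y):
    """F = [(SS_full - SS_reduced)/(p - q)] / [RSS_full/(n - p)].

    X_reduced=None stands for the model with no parameters (q = 0, SS = 0).
    """
    n, p = len(y), len(X_full[0])
    _, ss_full = regression_ss(X_full, y)
    if X_reduced is None:
        q, ss_red = 0, 0.0
    else:
        q = len(X_reduced[0])
        _, ss_red = regression_ss(X_reduced, y)
    s2 = (sum(v * v for v in y) - ss_full) / (n - p)   # RSS(model p)/(n - p)
    return (ss_full - ss_red) / (p - q) / s2


# Example 2 data
y = [15, 15, 25, 10, 30]
x1 = [-2, -1, 0, 1, 2]
x2 = [1, -1, 0, -1, 1]
X = [[1.0, u, v] for u, v in zip(x1, x2)]

b, ss_full = regression_ss(X, y)   # part (a): b and SS(b0, b1, b2)

# Each entry: (F statistic, critical value f_{df1, 2, 0.05} as quoted in the text)
tests = {
    "(b) beta0=beta1=beta2=0": (extra_ss_f(X, None, y), 19.2),
    "(c) beta1=beta2=0": (extra_ss_f(X, [[1.0] for _ in y], y), 19.0),
    "(d) beta0=beta1=0": (extra_ss_f(X, [[float(v)] for v in x2], y), 19.0),
    "(e) beta2=0": (extra_ss_f(X, [[1.0, float(u)] for u in x1], y), 18.5),
    # (f) under beta2 = 2*beta1 the reduced design has columns z0 = 1, z1 = x1 + 2*x2
    "(f) beta2=2*beta1": (extra_ss_f(X, [[1.0, float(u + 2 * v)] for u, v in zip(x1, x2)], y), 18.5),
}
for name, (f, f_crit) in tests.items():
    print(name, round(f, 3), "reject H0" if f > f_crit else "do not reject H0")
```

Passing each reduced model as its own design matrix lets one function cover both dropping coefficients (parts (c) to (e)) and the linear constraint of part (f); only the columns of the reduced matrix change.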