Chapter 3  The Two-Variable Regression Model

A. The model

$Y_i = \alpha + \beta X_i + e_i$, $i = 1, 2, \dots, N$ …… the population model (the "true" model).
$e$: the random error term, or disturbance; a zero-mean, unobservable, theoretical error that measures the gap between the actual $Y$ and its expected value.

The problem of specification:
(1) $\alpha$ and $\beta$ are unknown population parameters and need to be estimated.
(2) $Y = \alpha + \beta X$ may not be the real (exact) relationship between $X$ and $Y$, and some variables may be omitted; we choose only the important variables to specify the population model.
(3) The values of $X$ and $Y$ are merely observed values; the disturbances and the population parameters are unobservable.
(4) A random sample carries some sampling error.

$\hat e = Y - \hat Y$: the residual (error term); $\hat Y_i = \hat\alpha + \hat\beta X_i$.

Regression analysis aims to obtain the expected value of $Y$ given $X$, $E(Y\mid X)$; for each $X$, $E(Y\mid X) = \alpha + \beta X$, i.e. we estimate the model $\hat Y_i = \hat\alpha + \hat\beta X_i$ (the regression line).

1. Assumptions of the standard linear regression model (SLRM), or the classical linear regression model

(1) $Y$ and $X$ are linearly related: $Y_i = \alpha + \beta X_i + e_i$ is the "true" model.
(2) The $X$'s are non-stochastic variables whose values are fixed, with $\mathrm{Var}(X) \neq 0$. (For the multiple regression model, $X$ is a matrix with full rank, i.e. there is no exact linear relationship among the independent variables.)
(3) $E(e_i) = 0$; $\mathrm{Var}(e_i) = E(e_i^2) = \sigma^2$; $\mathrm{Cov}(e_i, e_j) = E\{[e_i - E(e_i)][e_j - E(e_j)]\} = E(e_i e_j) = 0$ for $i \neq j$.
(4) $e \sim N(0, \sigma^2)$.
Assumptions (1)-(4) are called the assumptions of the classical normal linear regression model.

Illustration:
(1) $Y$ is related to $X$.
(2) The value of $X$ is fixed.
(3) If $E(e_i) = \alpha' \neq 0$, then the model must be rewritten as
$Y_i = \alpha + \beta X_i + e_i = (\alpha + \alpha') + \beta X_i + (e_i - \alpha') = \alpha^* + \beta X_i + e_i^*$,
with $E(e_i^*) = E(e_i - \alpha') = \alpha' - \alpha' = 0$.
$\mathrm{Var}(e_i) = E(e_i^2) = \sigma^2$: homoscedasticity, i.e. the variance along the regression line is the same everywhere. If $\mathrm{Var}(e_i) = \sigma_i^2$: heteroscedasticity, i.e. the variance along the regression line differs across observations.
$\mathrm{Cov}(e_i, e_j) = 0$ for $i \neq j$: the error process is serially uncorrelated. If $\mathrm{Cov}(e_i, e_j) \neq 0$ for $i \neq j$, the error process is serially correlated (autocorrelated); the correlation may be negative or positive.

2. $E(X_i e_i) = X_i E(e_i) = 0$ …… an implicit assumption.

* The stochastic regression model has 3 unknown parameters: $\alpha$, $\beta$, $\sigma^2$.
* $E(Y_i) = E(\alpha + \beta X_i + e_i) = \alpha + \beta X_i$ ($e_i$ is the only random variable).
$\mathrm{Var}(Y_i) = E[Y_i - E(Y_i)]^2 = E[(\alpha + \beta X_i + e_i) - (\alpha + \beta X_i)]^2 = E(e_i^2) = \sigma^2$ …… the regression variance, or the variance about the regression line, or the residual variance.
The $Y_i$ are uncorrelated (since $\mathrm{Cov}(e_i, e_j) = 0$ for $i \neq j$).
$Y_i \sim (\alpha + \beta X_i, \sigma^2)$; if $e \sim N(0, \sigma^2)$, then $Y_i \sim N(\alpha + \beta X_i, \sigma^2)$.

B. Best linear unbiased estimator (BLUE)

Conditions for an estimator $\tilde\beta$ to be BLUE:
(1) Linear: $\tilde\beta$ is a linear combination of the sample values.
(2) Unbiased: $E(\tilde\beta) = \beta$.
(3) Best: lowest variance (i.e. most efficient): $\mathrm{Var}(\tilde\beta) \le \mathrm{Var}(\beta^*)$, where $\beta^*$ is any other linear unbiased estimator.

1. Proof that $\bar X$ is BLUE (as an estimator of $\mu_X$):

$\bar X = \dfrac{1}{N}(X_1 + X_2 + \dots + X_N) = \dfrac{1}{N}X_1 + \dfrac{1}{N}X_2 + \dots + \dfrac{1}{N}X_N = a_1 X_1 + a_2 X_2 + \dots + a_N X_N$, where $a_i = \dfrac{1}{N}$, $i = 1, 2, \dots, N$; i.e. $\bar X = \sum a_i X_i$.

(1) $\bar X$ is a linear combination of the $X_i$.
(2) $E(\bar X) = E(\sum a_i X_i) = \sum a_i E(X_i) = \mu_X$, since $\sum a_i = N \cdot \dfrac{1}{N} = 1$.
(3) $\mathrm{Var}(\bar X) = \mathrm{Var}(\sum a_i X_i) = E[\sum a_i X_i - E(\sum a_i X_i)]^2 = E[\sum a_i (X_i - \mu_X)]^2$
$= E[\sum a_i^2 (X_i - \mu_X)^2] + E[\sum_{i \neq j} a_i a_j (X_i - \mu_X)(X_j - \mu_X)]$
$= \sum a_i^2 E[(X_i - \mu_X)^2] = \sigma_X^2 \sum a_i^2 = \dfrac{\sigma_X^2}{N}$ …… (A1)

To show this variance is as small as possible, minimize $\sigma_X^2 \sum a_i^2$ subject to $\sum a_i = 1$. Set up the Lagrangian
$H = F(a_1, \dots, a_N) - \lambda\, G(a_1, \dots, a_N) = \sigma_X^2 \sum a_i^2 - \lambda\left(\sum a_i - 1\right)$:
$\dfrac{\partial H}{\partial a_i} = 2\sigma_X^2 a_i - \lambda = 0 \;\Rightarrow\; a_i = \dfrac{\lambda}{2\sigma_X^2}$ for all $i$ …… (A2)
$\dfrac{\partial H}{\partial \lambda} = -\left(\sum a_i - 1\right) = 0$.
From (A2) all the $a_i$ are equal, and together with the constraint $\sum a_i = 1$ this gives $a_i = \dfrac{1}{N}$ for all $i$, matching (A1); hence $\bar X$ is BLUE.
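The BLUE property of $\bar X$ can be checked numerically. The sketch below (Python/NumPy; the distribution of $X$, the sample size, and the alternative weights are assumptions made only for illustration, not taken from the notes) compares the equal-weight estimator $a_i = 1/N$ with another linear estimator whose weights also sum to one: both are unbiased for $\mu_X$, but the equal weights give the smaller variance, $\sigma_X^2/N$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative population (assumed): X_i i.i.d. with mean mu_X = 5, sd sigma_X = 2
mu_X, sigma_X, N, reps = 5.0, 2.0, 10, 20000

# Equal weights a_i = 1/N (the sample mean) vs. arbitrary weights that still sum to 1
a_equal = np.full(N, 1.0 / N)
a_other = np.linspace(0.5, 1.5, N)
a_other /= a_other.sum()            # unbiased as long as the weights sum to 1

X = rng.normal(mu_X, sigma_X, size=(reps, N))
est_equal = X @ a_equal
est_other = X @ a_other

print("means  :", est_equal.mean(), est_other.mean(), " (both near mu_X)")
print("vars   :", est_equal.var(), est_other.var())
print("theory :", sigma_X ** 2 / N, " = sigma_X^2 / N for the equal weights")
```

Any unequal weighting inflates $\sum a_i^2$ above $1/N$, which is exactly what the Lagrangian argument above rules out.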
2. Gauss-Markov theorem (the LSE is BLUE)

Given assumptions (1), (2), (3), the least squares estimators $\hat\alpha$ and $\hat\beta$ are the best linear unbiased estimators of $\alpha$ and $\beta$.

LSE:
$\hat\beta = \dfrac{\sum (X_i - \bar X) Y_i}{\sum (X_i - \bar X)^2} = \sum C_i Y_i$, where $C_i = \dfrac{X_i - \bar X}{\sum (X_i - \bar X)^2}$;
$\hat\alpha = \bar Y - \hat\beta \bar X = \dfrac{\sum Y_i}{N} - \hat\beta \dfrac{\sum X_i}{N}$.
($\hat\alpha$ and $\hat\beta$ are functions of random variables, so $\hat\alpha$ and $\hat\beta$ are themselves random variables.)

Proof (for $\hat\beta$):

(1) By the formula for $\hat\beta$, the $C_i$ are constants, and $\hat\beta$ is linearly related to the $Y_i$, i.e. $\hat\beta$ is a linear estimator.
(PS: writing lower-case $x_i = X_i - \bar X$ and $y_i = Y_i - \bar Y$ for deviations from the mean,
$\sum C_i = \dfrac{\sum (X_i - \bar X)}{\sum (X_i - \bar X)^2} = 0$, since $\sum (X_i - \bar X) = \sum X_i - N\bar X = 0$;
$\sum C_i^2 = \dfrac{\sum (X_i - \bar X)^2}{[\sum (X_i - \bar X)^2]^2} = \dfrac{1}{\sum (X_i - \bar X)^2} = \dfrac{1}{\sum x_i^2}$;
$\sum C_i X_i = \dfrac{\sum (X_i - \bar X) X_i}{\sum (X_i - \bar X)^2} = \dfrac{\sum (X_i - \bar X)^2 + \bar X \sum (X_i - \bar X)}{\sum (X_i - \bar X)^2} = 1$;
$\sum C_i x_i = \sum C_i (X_i - \bar X) = \sum C_i X_i - \bar X \sum C_i = 1$.)

(2) $\hat\beta = \sum C_i Y_i = \sum C_i (\alpha + \beta X_i + e_i) = \alpha \sum C_i + \beta \sum C_i X_i + \sum C_i e_i = \beta + \sum C_i e_i$.
$E(\hat\beta) = \beta + E(\sum C_i e_i) = \beta + C_1 E(e_1) + C_2 E(e_2) + \dots + C_N E(e_N) = \beta$
($\hat\beta$ is an unbiased estimator of $\beta$).

(3) $\mathrm{Var}(\hat\beta) = E[\hat\beta - E(\hat\beta)]^2 = E(\hat\beta - \beta)^2 = E(\sum C_i e_i)^2$
$= E(C_1^2 e_1^2 + C_2^2 e_2^2 + \dots + C_N^2 e_N^2 + 2C_1C_2 e_1 e_2 + 2C_1C_3 e_1 e_3 + \dots + 2C_{N-1}C_N e_{N-1}e_N)$
$= C_1^2 \sigma^2 + C_2^2 \sigma^2 + \dots + C_N^2 \sigma^2 = \sigma^2 \sum C_i^2 = \dfrac{\sigma^2}{\sum (X_i - \bar X)^2}$.
$\mathrm{Var}(\hat\beta)$ falls as $N$ grows (with $\sigma^2$ constant) and as the variation in $X$ grows; the smaller the variation in $X$, the larger $\mathrm{Var}(\hat\beta)$.

Define any arbitrary linear estimator of $\beta$ as $\tilde\beta = \sum w_i Y_i$, where $w_i = c_i + d_i$ ($d_i$: arbitrary constants). For $\tilde\beta$ to be an unbiased estimator of $\beta$, the $d_i$ must fulfil certain conditions:
$\tilde\beta = \sum w_i (\alpha + \beta X_i + e_i) = \alpha \sum w_i + \beta \sum w_i X_i + \sum w_i e_i$,
$E(\tilde\beta) = \alpha \sum w_i + \beta \sum w_i X_i$; for unbiasedness, $\sum w_i = 0$ and $\sum w_i X_i = 1$.
With $w_i = c_i + d_i$, this requires $\sum d_i = 0$ and $\sum d_i X_i = \sum d_i x_i = 0$.
$\mathrm{Var}(\tilde\beta) = E(\sum w_i e_i)^2 = \sigma^2 \sum w_i^2 = \sigma^2 \sum c_i^2 + \sigma^2 \sum d_i^2 = \dfrac{\sigma^2}{\sum x_i^2} + \sigma^2 \sum d_i^2 = \mathrm{Var}(\hat\beta) + \sigma^2 \sum d_i^2$.
I.e. $\mathrm{Var}(\tilde\beta) \ge \mathrm{Var}(\hat\beta)$, with equality only when all $d_i = 0$; $\hat\beta$ has minimum variance.

<Proof for $\hat\alpha$>

$\hat\alpha = \bar Y - \hat\beta \bar X = \dfrac{\sum Y_i}{N} - \bar X \sum c_i Y_i = \sum \left(\dfrac{1}{N} - \bar X c_i\right) Y_i$.
$\hat\alpha$ is linearly related to the $Y_i$, i.e. $\hat\alpha$ is a linear estimator.

(2) $\hat\alpha = \sum \left(\dfrac{1}{N} - \bar X c_i\right) Y_i = \sum \left(\dfrac{1}{N} - \bar X c_i\right)(\alpha + \beta X_i + e_i)$
$= \alpha + \beta \dfrac{\sum X_i}{N} - \alpha \bar X \sum c_i - \beta \bar X \sum c_i X_i + \sum \left(\dfrac{1}{N} - \bar X c_i\right) e_i$
$= \alpha + \beta \bar X - \beta \bar X + \sum \left(\dfrac{1}{N} - \bar X c_i\right) e_i = \alpha + \sum \left(\dfrac{1}{N} - \bar X c_i\right) e_i$.
$E(\hat\alpha) = \alpha$ ($\hat\alpha$ is an unbiased estimator of $\alpha$).

(3) $\mathrm{Var}(\hat\alpha) = E[\hat\alpha - E(\hat\alpha)]^2 = E\left[\sum \left(\dfrac{1}{N} - \bar X c_i\right) e_i\right]^2 = E\left[\dfrac{(\sum e_i)^2}{N^2}\right] + E[\bar X^2 (\sum c_i e_i)^2]$
(the cross term vanishes because $E(e_i e_j) = 0$ for $i \neq j$ and $\sum c_i = 0$)
$= \dfrac{\sigma^2}{N} + \dfrac{\bar X^2 \sigma^2}{\sum x_i^2} = \sigma^2\left(\dfrac{1}{N} + \dfrac{\bar X^2}{\sum x_i^2}\right) = \dfrac{\sigma^2 \sum X_i^2}{N \sum x_i^2}$.

Define any arbitrary linear estimator of $\alpha$ as $\tilde\alpha = \sum \left(\dfrac{1}{N} - \bar X w_i\right) Y_i$, where $w_i = c_i + d_i$ ($d_i$: arbitrary constants). For unbiasedness, $\sum w_i = 0$ and $\sum w_i X_i = 1$, i.e. $\sum d_i = 0$ and $\sum d_i X_i = \sum d_i x_i = 0$.
$\mathrm{Var}(\tilde\alpha) = E[\tilde\alpha - E(\tilde\alpha)]^2 = E\left[\dfrac{\sum e_i}{N} - \bar X \sum w_i e_i\right]^2 = E\left[\dfrac{(\sum e_i)^2}{N^2}\right] + E[\bar X^2 (\sum w_i e_i)^2]$
$= \dfrac{\sigma^2}{N} + \bar X^2 \sigma^2 \sum w_i^2 = \dfrac{\sigma^2}{N} + \bar X^2 \sigma^2 \left(\sum c_i^2 + \sum d_i^2\right) = \mathrm{Var}(\hat\alpha) + \bar X^2 \sigma^2 \sum d_i^2$.
I.e. $\mathrm{Var}(\tilde\alpha) \ge \mathrm{Var}(\hat\alpha)$, with equality only when all $d_i = 0$; $\hat\alpha$ has minimum variance.

* $\mathrm{Cov}(\hat\alpha, \hat\beta) = E\{[\hat\alpha - E(\hat\alpha)][\hat\beta - E(\hat\beta)]\} = E\left[\sum \left(\dfrac{1}{N} - \bar X c_i\right) e_i \cdot \sum c_i e_i\right]$
$= E\left[\dfrac{\sum e_i \sum c_i e_i}{N} - \bar X (\sum c_i e_i)^2\right] = \dfrac{\sigma^2 \sum c_i}{N} - \bar X \sigma^2 \sum c_i^2 = -\dfrac{\bar X \sigma^2}{\sum x_i^2}$ (since $\sum c_i = 0$).

Variance-covariance matrix:
$\mathrm{Var\text{-}Cov}(\hat\alpha, \hat\beta) = \sigma^2 \begin{pmatrix} \dfrac{\sum X_i^2}{N \sum x_i^2} & -\dfrac{\bar X}{\sum x_i^2} \\[2ex] -\dfrac{\bar X}{\sum x_i^2} & \dfrac{1}{\sum x_i^2} \end{pmatrix}$.

C. $S^2 = \hat\sigma^2 = \dfrac{1}{N-2}\sum \hat e_i^2$ is an unbiased estimator of $\sigma^2$

$E(e_i^2) = \sigma^2$; $\sum \hat e_i^2 = \mathrm{ESS}$; since $\alpha$ and $\beta$ must both be estimated, the degrees of freedom are $N-2$ (or $T-K$).
$\hat e_i = Y_i - \hat Y_i = Y_i - \hat\alpha - \hat\beta X_i$.

<PROOF> In deviation form, $\hat y_i = \hat\beta x_i$, so
$\hat e_i = y_i - \hat y_i = y_i - \hat\beta x_i = \beta x_i + (e_i - \bar e) - \hat\beta x_i = -(\hat\beta - \beta) x_i + (e_i - \bar e)$,
$\hat e_i^2 = [-(\hat\beta - \beta) x_i + (e_i - \bar e)]^2$,
$\sum \hat e_i^2 = \underbrace{(\hat\beta - \beta)^2 \sum x_i^2}_{A} + \underbrace{\sum (e_i - \bar e)^2}_{B}\; \underbrace{-\, 2(\hat\beta - \beta) \sum x_i (e_i - \bar e)}_{C}$.

For A: $E[(\hat\beta - \beta)^2 \sum x_i^2] = \sum x_i^2\, E(\hat\beta - \beta)^2 = \sum x_i^2 \cdot \dfrac{\sigma^2}{\sum x_i^2} = \sigma^2$.
For B: $E[\sum (e_i - \bar e)^2] = E\left[\sum e_i^2 - \dfrac{1}{N}\left(\sum e_i\right)^2\right]$
$= E[e_1^2 + e_2^2 + \dots + e_N^2] - \dfrac{1}{N} E[(e_1 + e_2 + \dots + e_N)(e_1 + e_2 + \dots + e_N)]$
$= N\sigma^2 - \dfrac{1}{N}(N\sigma^2) = (N-1)\sigma^2$.
For C: since $\hat\beta - \beta = \dfrac{\sum x_i e_i}{\sum x_i^2}$ and $\sum x_i (e_i - \bar e) = \sum x_i e_i - \bar e \sum x_i = \sum x_i e_i = (\hat\beta - \beta)\sum x_i^2$,
$E[2(\hat\beta - \beta)\sum x_i (e_i - \bar e)] = 2E[(\hat\beta - \beta)^2 \sum x_i^2] = 2\sigma^2$, so $E(C) = -2\sigma^2$.

Hence $E(\sum \hat e_i^2) = E(A) + E(B) + E(C) = \sigma^2 + (N-1)\sigma^2 - 2\sigma^2 = (N-2)\sigma^2$, and
$E(S^2) = \dfrac{1}{N-2} E\left(\sum \hat e_i^2\right) = \dfrac{1}{N-2}(N-2)\sigma^2 = \sigma^2$, so $S^2$ is unbiased.

$S$: the standard error of the regression (SER), or standard error of the estimate (SEE), i.e. the estimated standard deviation of $e$.
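As a numerical check of the formulas above, the sketch below (Python/NumPy; the population values $\alpha = 2$, $\beta = 0.5$, $\sigma = 1$ and the fixed $X$ grid are assumptions for illustration, not taken from the notes) computes $\hat\beta = \sum x_i y_i / \sum x_i^2$, $\hat\alpha = \bar Y - \hat\beta\bar X$ and $S^2 = \sum \hat e_i^2/(N-2)$ over repeated samples, and compares the Monte Carlo mean and variance of $\hat\beta$ and the mean of $S^2$ with $\beta$, $\sigma^2/\sum x_i^2$ and $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed population values, for illustration only
alpha, beta, sigma, N = 2.0, 0.5, 1.0, 50
X = np.linspace(1, 10, N)       # fixed, non-stochastic X
x = X - X.mean()                # deviation form x_i = X_i - Xbar

def ols(Y):
    """Least squares estimates and S^2 = sum(e_hat^2) / (N - 2)."""
    b_hat = (x * Y).sum() / (x ** 2).sum()   # beta_hat = sum(x_i y_i) / sum(x_i^2)
    a_hat = Y.mean() - b_hat * X.mean()      # alpha_hat = Ybar - beta_hat * Xbar
    e_hat = Y - a_hat - b_hat * X            # residuals
    return a_hat, b_hat, (e_hat ** 2).sum() / (N - 2)

# Monte Carlo check: E(beta_hat) = beta, Var(beta_hat) = sigma^2 / sum(x_i^2),
# and E(S^2) = sigma^2
draws = [ols(alpha + beta * X + rng.normal(0, sigma, N)) for _ in range(5000)]
a_hats, b_hats, s2s = (np.array(v) for v in zip(*draws))

print("mean of beta_hat:", b_hats.mean(), "  (beta =", beta, ")")
print("var  of beta_hat:", b_hats.var(), "  (theory:", sigma ** 2 / (x ** 2).sum(), ")")
print("mean of S^2     :", s2s.mean(), "  (sigma^2 =", sigma ** 2, ")")
```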
$S_{\hat\alpha}$, $S_{\hat\beta}$: the standard errors of the estimated coefficients.

$S^2 = \dfrac{1}{N-2}\sum \hat e_i^2 = \dfrac{1}{N-2}\sum (y_i - \hat\beta x_i)^2 = \dfrac{1}{N-2}\left(\sum y_i^2 + \hat\beta^2 \sum x_i^2 - 2\hat\beta \sum x_i y_i\right) = \dfrac{1}{N-2}\left(\sum y_i^2 - \hat\beta \sum x_i y_i\right)$.

D. If $Y_i \sim N(\alpha + \beta X_i, \sigma^2)$, then $\hat\alpha \sim N\!\left(\alpha, \dfrac{\sigma^2 \sum X_i^2}{N \sum x_i^2}\right)$ and $\hat\beta \sim N\!\left(\beta, \dfrac{\sigma^2}{\sum x_i^2}\right)$.

$\hat\alpha$ and $\hat\beta$ are linear combinations of the independently normal variables $Y_1, Y_2, \dots, Y_N$, so $\hat\alpha$ and $\hat\beta$ must themselves be normally distributed.

$H_0: \beta = 0$, $H_1: \beta \neq 0$, $\alpha = 0.05$ (significance level).
The test statistic: $Z_C = \dfrac{\hat\beta - 0}{\sigma_{\hat\beta}} \sim N(0, 1)$ if $\sigma_{\hat\beta}$ is known, and $t_C = \dfrac{\hat\beta - 0}{S_{\hat\beta}} \sim t_{N-2}$ if $\sigma_{\hat\beta}$ is unknown.
Interval estimate: $\hat\beta - t_{N-2,\,\alpha/2}\, S_{\hat\beta} \le \beta \le \hat\beta + t_{N-2,\,\alpha/2}\, S_{\hat\beta}$.

E. Descriptive properties (some mathematical characteristics of the LSE)

$\hat Y = \hat\alpha + \hat\beta X$ …… the regression line; $Y = \hat Y + \hat e = \hat\alpha + \hat\beta X + \hat e$; $\hat e = Y - \hat Y$ …… the calculated residual.

1. Prove: $\sum \hat e_i = 0$, so $\bar{\hat e} = \dfrac{\sum \hat e_i}{N} = 0$.
$\sum \hat e_i = \sum (Y_i - \hat\alpha - \hat\beta X_i) = N\bar Y - N\hat\alpha - \hat\beta N\bar X = N(\bar Y - \hat\alpha - \hat\beta \bar X) = 0$, since $\hat\alpha = \bar Y - \hat\beta \bar X$. This is a good property.

2. Prove: $\sum \hat e_i X_i = 0$ and $\sum \hat e_i x_i = 0$ (since $\sum \hat e_i x_i = \sum \hat e_i X_i - \bar X \sum \hat e_i = \sum \hat e_i X_i$).
$\sum \hat e_i x_i = \sum (y_i - \hat\beta x_i) x_i = \sum x_i y_i - \hat\beta \sum x_i^2 = \sum x_i y_i - \dfrac{\sum x_i y_i}{\sum x_i^2}\sum x_i^2 = 0$.

From 1 and 2, $X$ and $\hat e$ are orthogonal, i.e. $X'\hat e = 0$ (a $2 \times T$ matrix times a $T \times 1$ vector), so $X$ and $\hat e$ are linearly uncorrelated. The estimated values of $\hat\alpha$ and $\hat\beta$ are precisely those that make $X$ and $\hat e$ uncorrelated.

3. $\bar Y = \bar{\hat Y}$.
PROOF: $\bar Y = \dfrac{\sum Y_i}{N} = \dfrac{\sum (\hat Y_i + \hat e_i)}{N} = \dfrac{\sum \hat Y_i}{N} + \dfrac{\sum \hat e_i}{N} = \dfrac{\sum \hat Y_i}{N} = \bar{\hat Y}$.

4. $\sum \hat Y_i \hat e_i = 0$.
PROOF: $\sum \hat Y_i \hat e_i = \sum (\hat\alpha + \hat\beta X_i)\hat e_i = \hat\alpha \sum \hat e_i + \hat\beta \sum \hat e_i X_i = 0 + 0 = 0$.
(If $\sum \hat Y_i \hat e_i \neq 0$, the fit of the regression could still be improved.)

F. Goodness of fit

$(Y_i - \bar Y) = (Y_i - \hat Y_i) + (\hat Y_i - \bar Y)$
total deviation of $Y$ = unexplained deviation of $Y$ + explained deviation of $Y$

$\sum (Y_i - \bar Y)^2 = \sum (Y_i - \hat Y_i)^2 + \sum (\hat Y_i - \bar Y)^2$
TSS = ESS + RSS
(In these notes ESS denotes the error/residual sum of squares and RSS the regression/explained sum of squares.)

$\sum (Y_i - \bar Y)^2 = \sum [(Y_i - \hat Y_i) + (\hat Y_i - \bar Y)]^2 = \sum (Y_i - \hat Y_i)^2 + 2\sum (Y_i - \hat Y_i)(\hat Y_i - \bar Y) + \sum (\hat Y_i - \bar Y)^2$
$= \sum (Y_i - \hat Y_i)^2 + \sum (\hat Y_i - \bar Y)^2$, because the cross term $\sum \hat e_i(\hat Y_i - \bar Y) = \sum \hat e_i \hat Y_i - \bar Y \sum \hat e_i = 0$.

2. $R^2$: the coefficient of determination, i.e. the R-squared of the regression equation.
Define: $R^2 = \dfrac{\mathrm{RSS}}{\mathrm{TSS}} = 1 - \dfrac{\mathrm{ESS}}{\mathrm{TSS}}$.
$R^2 = \dfrac{\mathrm{RSS}}{\mathrm{TSS}} = \dfrac{\sum (\hat Y_i - \bar Y)^2}{\sum (Y_i - \bar Y)^2} = \dfrac{\sum \hat y_i^2}{\sum y_i^2} = \dfrac{\sum (\hat\beta x_i)^2}{\sum y_i^2} = \dfrac{\hat\beta^2 \sum x_i^2}{\sum y_i^2} = \hat\beta^2\, \dfrac{\mathrm{var}(X)}{\mathrm{var}(Y)}$;
$R^2 = 1 - \dfrac{\mathrm{ESS}}{\mathrm{TSS}} = 1 - \dfrac{\sum \hat e_i^2}{\sum y_i^2}$.

(1) $R^2$ is a measure of the goodness of fit of the regression model. For example, $R^2 = 0.959$ means that 95.9% of the variation in $Y$ is explained by the regression equation, i.e. it measures how well the regression model fits the data. ($S$, or $\hat\sigma$, the SEE (standard error of estimate), is another measure of fit, but it depends on the unit of measurement: if the unit changes, we get a different estimate. $R^2$ is independent of the unit of measurement.)
** To use $R^2$, the following conditions must hold:
(a) The estimator must be an OLS estimator.
(b) The relationship being estimated must be linear.
(c) The linear relationship being estimated must include a constant (intercept) term.

(2) $0 \le R^2 \le 1$. $R^2 = 0$: the model cannot explain any of the variation in $Y$; $R^2 = 1$: the case of a perfect fit (a special case).
(3) $R^2$ tends to be higher with time-series data and lower with cross-section data.
(4) $R^2$ cannot be used to establish causality; it only measures the strength of the (linear) relationship between $X$ and $Y$.
(5) In multiple regression, $R^2$ rises as independent variables are added: TSS is fixed, but RSS ↑, i.e. ESS ↓. So we use $\bar R^2$ (the adjusted coefficient of determination), which takes the degrees of freedom into account and may remain constant as $K$ ↑:
$\bar R^2 = 1 - \dfrac{\mathrm{ESS}/(N-K)}{\mathrm{TSS}/(N-1)}$,
where $\mathrm{ESS}/(N-K)$ is the residual variance and $\mathrm{TSS}/(N-1)$ is the variance of $Y$, so $\bar R^2 = 1 - \dfrac{\mathrm{Var}(\hat e)}{\mathrm{Var}(Y)} = 1 - \dfrac{S^2}{\mathrm{Var}(Y)}$.
$\bar R^2 = 1 - \dfrac{\mathrm{ESS}/(N-K)}{\mathrm{TSS}/(N-1)} = 1 - \dfrac{\mathrm{ESS}}{\mathrm{TSS}}\cdot\dfrac{N-1}{N-K} = 1 - (1 - R^2)\dfrac{N-1}{N-K}$, where $K = k + 1$.
As $K$ ↑ → $(N-K)$ ↓ → $\bar R^2$ ↓; but as $K$ ↑ → TSS fixed, RSS ↑, so $\bar R^2$ does not necessarily fall as $K$ ↑; the two effects may offset.
* If $K$ ↑: $S^2$ may fall, stay the same, or rise, so $\bar R^2$ may rise, stay the same, or fall.
* $\bar R^2$ can be negative; the condition is $1 - (1 - R^2)\dfrac{N-1}{N-K} < 0$, i.e. $(1 - R^2)\dfrac{N-1}{N-K} > 1$.
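A short sketch of the goodness-of-fit and t-test formulas above (Python with NumPy and SciPy; the simulated data and the choice $K = 2$ for the two-variable model are illustrative assumptions). It follows these notes' convention that ESS is the error sum of squares and RSS the regression sum of squares.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Simulated data with assumed parameters (illustration only)
alpha, beta, sigma, N = 2.0, 0.5, 1.0, 50
X = np.linspace(1, 10, N)
Y = alpha + beta * X + rng.normal(0, sigma, N)

x, y = X - X.mean(), Y - Y.mean()
b_hat = (x * y).sum() / (x ** 2).sum()
a_hat = Y.mean() - b_hat * X.mean()
e_hat = Y - a_hat - b_hat * X

TSS = (y ** 2).sum()              # total sum of squares
ESS = (e_hat ** 2).sum()          # error (residual) sum of squares, as in these notes
RSS = TSS - ESS                   # regression (explained) sum of squares
R2 = 1 - ESS / TSS

K = 2                             # intercept + slope in the two-variable model
R2_adj = 1 - (1 - R2) * (N - 1) / (N - K)

S2 = ESS / (N - 2)
se_b = np.sqrt(S2 / (x ** 2).sum())          # S_beta_hat
t_c = b_hat / se_b                           # t statistic for H0: beta = 0
t_crit = stats.t.ppf(1 - 0.05 / 2, df=N - 2)

print("R^2 =", R2, " adjusted R^2 =", R2_adj)
print("t_C =", t_c, " 95% CI:", (b_hat - t_crit * se_b, b_hat + t_crit * se_b))
```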
3. Testing the regression equation: the critical value is $F_{d.f.1,\, d.f.2,\, \alpha}$.

$F_C = \dfrac{\text{explained variance}}{\text{unexplained variance}} = \dfrac{\mathrm{RSS}/1}{\mathrm{ESS}/(N-2)}$. For $Y = \alpha + \beta X + e$:
$F_C = \dfrac{\sum \hat y_i^2 / 1}{\sum \hat e_i^2 / (N-2)} = \dfrac{\hat\beta^2 \sum x_i^2}{\sum \hat e_i^2 / (N-2)} = \dfrac{\mathrm{RSS}}{\mathrm{ESS}}\cdot\dfrac{N-2}{1} = \dfrac{(\mathrm{RSS}/\mathrm{TSS})/1}{(\mathrm{ESS}/\mathrm{TSS})/(N-2)} = \dfrac{R^2 / 1}{(1 - R^2)/(N-2)}$.

$F_C = 0$: the independent variable has no explanatory power for the dependent variable in the regression. The larger $F_C$ becomes, the closer the relationship between $X$ and $Y$.

(1) $t_{N-2,\,\alpha/2}^2 = F_{1,\,N-2,\,\alpha}$. With the null hypothesis $H_0: \beta = 0$:
$t_C = \dfrac{\hat\beta}{S_{\hat\beta}}$, so $t_C^2 = \dfrac{\hat\beta^2}{S_{\hat\beta}^2} = \dfrac{\hat\beta^2 \sum x_i^2}{\sum \hat e_i^2/(N-2)} = \dfrac{\mathrm{RSS}/1}{\mathrm{ESS}/(N-2)} = F_{1,\,N-2}$.

(2) The F test can also be used for joint hypothesis tests (multi-variable regression equations):
$H_0$: all $\beta = 0$ (not including $\beta_0$); $H_1$: not all $\beta = 0$.
* As $\hat\beta \to 0$: $\hat\alpha = \bar Y - \hat\beta \bar X \to \bar Y$ and $F = \dfrac{\hat\beta^2 \sum x_i^2}{\hat\sigma^2} \to 0$.

G. Maximum likelihood estimation

$Y_i = \alpha + \beta X_i + e_i$, $e_i \sim N(0, \sigma^2)$, so $Y_i \sim N(\alpha + \beta X_i, \sigma^2)$.
$f(Y_i) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[-\dfrac{1}{2\sigma^2}(Y_i - \alpha - \beta X_i)^2\right]$ …… the density (frequency) function $f(Y_i; \alpha, \beta, \sigma^2)$.
$L = f(Y_1)\, f(Y_2) \cdots f(Y_N)$ …… the likelihood function
$= \prod_{i=1}^{N} \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[-\dfrac{1}{2\sigma^2}(Y_i - \alpha - \beta X_i)^2\right] = (2\pi\sigma^2)^{-N/2} \exp\!\left[-\dfrac{1}{2\sigma^2}\sum (Y_i - \alpha - \beta X_i)^2\right]$,
$\log L = -\dfrac{N}{2}\log(2\pi\sigma^2) - \dfrac{1}{2\sigma^2}\sum (Y_i - \alpha - \beta X_i)^2$.

$\dfrac{\partial \log L}{\partial \alpha} = \dfrac{1}{\sigma^2}\sum (Y_i - \alpha - \beta X_i) = 0 \;\Rightarrow\; \dfrac{\sum Y_i}{N} - \tilde\alpha - \tilde\beta \dfrac{\sum X_i}{N} = 0 \;\Rightarrow\; \tilde\alpha_{MLE} = \bar Y - \tilde\beta \bar X$;
if $\tilde\beta = \hat\beta_{OLSE}$, then $\tilde\alpha = \hat\alpha_{OLSE}$.

$\dfrac{\partial \log L}{\partial \beta} = \dfrac{1}{\sigma^2}\sum (Y_i - \alpha - \beta X_i) X_i = 0 \;\Rightarrow\; \sum Y_i X_i - \tilde\alpha \sum X_i - \tilde\beta \sum X_i^2 = 0$.
Substituting $\tilde\alpha = \dfrac{\sum Y_i}{N} - \tilde\beta \dfrac{\sum X_i}{N}$:
$\sum Y_i X_i - \dfrac{\sum Y_i \sum X_i}{N} + \tilde\beta \dfrac{(\sum X_i)^2}{N} - \tilde\beta \sum X_i^2 = 0$,
$\tilde\beta_{MLE} = \dfrac{\sum Y_i X_i - \sum Y_i \sum X_i / N}{\sum X_i^2 - (\sum X_i)^2 / N} = \dfrac{\sum X_i Y_i - N \bar X \bar Y}{\sum X_i^2 - N \bar X^2} = \hat\beta_{OLSE}$.

$\dfrac{\partial \log L}{\partial \sigma^2} = -\dfrac{N}{2\sigma^2} + \dfrac{1}{2\sigma^4}\sum (Y_i - \alpha - \beta X_i)^2 = 0$,
$\tilde\sigma^2_{MLE} = \dfrac{1}{N}\sum (Y_i - \tilde\alpha - \tilde\beta X_i)^2 = \dfrac{1}{N}\sum \tilde e_i^2$,
i.e. $\tilde\sigma^2_{MLE}$ is a biased estimator of $\sigma^2$, whereas $\hat\sigma^2_{OLSE} = \dfrac{1}{N-2}\sum \hat e_i^2$ is unbiased.
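Finally, a sketch (Python/NumPy; the simulated data are an assumption made only for illustration) that numerically verifies two results of this chapter: $t_C^2 = F_C = \dfrac{R^2/1}{(1-R^2)/(N-2)}$ in the two-variable model, and that the MLE $\tilde\sigma^2 = \sum\hat e_i^2/N$ differs from the unbiased $S^2 = \sum\hat e_i^2/(N-2)$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data with assumed parameters (illustration only)
alpha, beta, sigma, N = 2.0, 0.5, 1.0, 50
X = np.linspace(1, 10, N)
Y = alpha + beta * X + rng.normal(0, sigma, N)

x, y = X - X.mean(), Y - Y.mean()
b_hat = (x * y).sum() / (x ** 2).sum()
a_hat = Y.mean() - b_hat * X.mean()
e_hat = Y - a_hat - b_hat * X

TSS, ESS = (y ** 2).sum(), (e_hat ** 2).sum()
R2 = 1 - ESS / TSS

# F statistic of the regression: F_C = (R^2 / 1) / ((1 - R^2) / (N - 2))
F_c = (R2 / 1) / ((1 - R2) / (N - 2))

# In the two-variable model, the square of the t statistic for H0: beta = 0 equals F_C
S2 = ESS / (N - 2)
t_c = b_hat / np.sqrt(S2 / (x ** 2).sum())
print("F_C =", F_c, "  t_C^2 =", t_c ** 2)

# MLE of sigma^2 divides by N (biased downward); S^2 divides by N - 2 (unbiased)
sigma2_mle = ESS / N
print("sigma^2_MLE =", sigma2_mle, "  S^2 =", S2)
```

Since the MLE and OLS estimates of $\alpha$ and $\beta$ coincide, the residuals are the same in both cases; only the divisor of $\sum\hat e_i^2$ differs, which is exactly the source of the bias in $\tilde\sigma^2_{MLE}$.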