Chapter 3 The Two-Variable Regression Model
A. The model
Y_i = α + βX_i + ε_i,  i = 1, 2, …, N
the population model (the "true" model)
ε: the random error term, or disturbance; an unobservable, theoretical error with zero mean that measures the gap between the actual Y and its expected value.
The problem of specification: (1) α and β are unknown population parameters and must be estimated. (2) Y = α + βX may not be the real (or exact) relationship between X and Y, or some variables may be omitted; we choose only the important variables to specify the population model. (3) Only the values of X and Y are observed; the disturbances and the population itself are unobservable. (4) A random sample is used, so there is some sampling error.
e = Y − Ŷ: the residual (estimated error term);  Ŷ_i = α̂ + β̂X_i
Regression analysis → to get the expected value of Y given X, E(Y|X);
for each X, E(Y|X) = α + βX, i.e. to estimate the model Ŷ = α̂ + β̂X (the regression line).
1. Assumptions of the standard linear regression model (SLRM), or the classical linear regression model:
(1) Y and X are linearly related: Y_i = α + βX_i + e_i is the "true" model.
(2) The X's are non-stochastic variables whose values are fixed, with Var(X) ≠ 0.
(For the multiple regression model, X is a matrix with full rank, i.e. there is no linear relationship between the independent variables.)
(3) E(e_i) = 0
Var(e_i) = E(e_i²) = σ²
Cov(e_i, e_j) = E{[e_i − E(e_i)][e_j − E(e_j)]} = E(e_i e_j) = 0, for i ≠ j
(4) e ~ N(0, σ²). Assumptions (1)–(4) are called the assumptions of the classical normal linear regression model.
Illustrate:
(1) Y is related to X.
(2) The value of X is fixed.
(3) If E(e_i) = α′ ≠ 0, then the model must be rewritten:
Y_i = α + βX_i + e_i = (α + α′) + βX_i + (e_i − α′) = α* + βX_i + e_i*
→ E(e_i*) = E(e_i − α′) = α′ − α′ = 0
Var(e_i) = E(e_i²) = σ² → homoscedasticity, i.e. the variance along the regression line is the same.
If Var(e_i) = σ_i², heteroscedasticity: the variance along the regression line differs across observations.
Cov(e_i, e_j) = 0 for i ≠ j → the error process is serially uncorrelated.
[If Cov(e_i, e_j) ≠ 0 for i ≠ j → the error process is serially correlated, or autocorrelated; the correlation may be negative or positive.]
2. E(X_i e_i) = X_i E(e_i) = 0 …… an implicit assumption (X is fixed, so it carries out of the expectation).
* The stochastic regression model has 3 unknown parameters: α, β, σ².
* E(Y_i) = E(α + βX_i + e_i) = α + βX_i  (only e_i is a random variable)
Var(Y_i) = E[Y_i − E(Y_i)]² = E[(α + βX_i + e_i) − (α + βX_i)]² = E(e_i²) = σ²
…… the regression variance, or variance about the regression line, or residual variance.
The Y_i are uncorrelated (since Cov(e_i, e_j) = 0 for i ≠ j)
→ Y_i ~ (α + βX_i, σ²);  if e ~ N(0, σ²), then Y_i ~ N(α + βX_i, σ²).
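As a quick numerical illustration of these claims, here is a minimal simulation sketch in Python (assuming numpy is available; the values α = 2, β = 0.5, σ = 1 and the fixed X grid are hypothetical, chosen only for illustration):

    # Simulate Y_i = alpha + beta*X_i + e_i with fixed X and e ~ N(0, sigma^2),
    # then check E(Y_i) = alpha + beta*X_i and Var(Y_i) = sigma^2.
    import numpy as np

    rng = np.random.default_rng(0)
    alpha, beta, sigma, N, reps = 2.0, 0.5, 1.0, 20, 100_000
    X = np.linspace(1, 10, N)                   # fixed, non-stochastic X's
    e = rng.normal(0.0, sigma, size=(reps, N))  # e ~ N(0, sigma^2)
    Y = alpha + beta * X + e                    # Y_i ~ N(alpha + beta*X_i, sigma^2)

    print(np.allclose(Y.mean(axis=0), alpha + beta * X, atol=0.02))  # E(Y_i): True
    print(np.allclose(Y.var(axis=0), sigma**2, atol=0.03))           # Var(Y_i): True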
B. Best linear unbiased estimator, BLUE
Conditions: (1) linear: β̃ is a linear combination of the sample values.
(2) unbiasedness: E(β̃) = β
(3) best → lowest variance (i.e. most efficient):
Var(β̃) ≤ Var(β*), where β* is any other linear unbiased estimator.
1. Prove that X̄ is BLUE:
X̄ = (1/N)(X₁ + X₂ + ⋯ + X_N) = (1/N)X₁ + (1/N)X₂ + ⋯ + (1/N)X_N
(1) X̄ = a₁X₁ + a₂X₂ + ⋯ + a_N X_N = Σ a_i X_i,  a_i = 1/N, i = 1, 2, …, N
(i.e. X̄ and the X_i are linearly related)
(2) E(X̄) = E(Σ a_i X_i) = Σ a_i E(X_i) = μ_X Σ a_i = μ_X,  since Σ a_i = Σ 1/N = 1
(3) Var(X̄) = Var(Σ a_i X_i) = E[Σ a_i X_i − E(Σ a_i X_i)]² = E[Σ a_i(X_i − μ_X)]²
= E[Σ a_i²(X_i − μ_X)²] + E[Σ_{i≠j} a_i a_j(X_i − μ_X)(X_j − μ_X)]
= Σ a_i² E[(X_i − μ_X)²] + Σ_{i≠j} a_i a_j E[(X_i − μ_X)(X_j − μ_X)]
= σ_X² Σ a_i² = σ_X² · N(1/N)² = σ_X²/N  (as small as possible)
* To minimize σ_X² Σ a_i² subject to Σ a_i = 1, set up
H = F(a₁, a₂, …, a_N) − λ G(a₁, a₂, …, a_N) = σ_X² Σ a_i² − λ(Σ a_i − 1)
∂H/∂a_i = 2σ_X² a_i − λ = 0 → a_i = λ/(2σ_X²), for all i …… (A1)
∂H/∂λ = −(Σ a_i − 1) = 0 → Σ a_i = Nλ/(2σ_X²) = 1 → λ = 2σ_X²/N …… (A2)
According to (A1), (A2): a_i = 1/N for all i,
which proves that X̄ is BLUE.
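A small numerical check of the minimization step (not a proof; the weights and N below are illustrative): since Var(Σ a_i X_i) = σ_X² Σ a_i² for uncorrelated X_i, it is enough to compare Σ a_i² across unbiased weightings.

    # Equal weights a_i = 1/N give the smallest sum(a_i^2) among weights summing to 1.
    import numpy as np

    N = 10
    equal = np.full(N, 1.0 / N)
    rng = np.random.default_rng(1)
    for _ in range(5):
        a = rng.random(N)
        a /= a.sum()                              # unbiasedness: weights sum to 1
        print(np.sum(a**2) >= np.sum(equal**2))   # always True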
2. Gauss–Markov theorem (LSE is BLUE):
** Given assumptions (1)(2)(3), the least squares estimators α̂ and β̂ are the best linear unbiased estimators of α and β.
LSE: β̂ = Σ(X_i − X̄)Y_i / Σ(X_i − X̄)² = Σ C_i Y_i,  where C_i = (X_i − X̄)/Σ(X_i − X̄)²
α̂ = Ȳ − β̂X̄ = ΣY_i/N − β̂ ΣX_i/N
(α̂ and β̂ are functions of random variables → α̂ and β̂ are random variables)
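These formulas can be checked directly; a minimal sketch on hypothetical data (numpy's polyfit is used only as an independent reference):

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

    C = (X - X.mean()) / np.sum((X - X.mean())**2)  # the C_i weights
    beta_hat = np.sum(C * Y)                        # beta_hat = sum C_i Y_i
    alpha_hat = Y.mean() - beta_hat * X.mean()      # alpha_hat = Ybar - beta_hat*Xbar

    b, a = np.polyfit(X, Y, 1)                      # (slope, intercept)
    print(np.isclose(beta_hat, b), np.isclose(alpha_hat, a))   # True True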
Prove: (1) According to the formula for β̂, the C_i are constants → β̂ and the Y_i are linearly related, i.e. β̂ is a linear estimator.
(PS: Σ C_i = Σ(X_i − X̄)/Σ(X_i − X̄)² = 0,  since Σ(X_i − X̄) = ΣX_i − NX̄ = 0;
Σ C_i² = Σ{[(X_i − X̄)/Σ(X_i − X̄)²]²} = Σ(X_i − X̄)²/[Σ(X_i − X̄)²]² = 1/Σ(X_i − X̄)² = 1/Σx_i²;
Σ C_i X_i = Σ C_i x_i = Σ(X_i − X̄)X_i/Σ(X_i − X̄)² = Σ(X_i − X̄)²/Σ(X_i − X̄)² = 1.)
(2) β̂ = Σ C_i Y_i = Σ C_i(α + βX_i + e_i) = α Σ C_i + β Σ C_i X_i + Σ C_i e_i = β + Σ C_i e_i
E(β̂) = E(β + Σ C_i e_i) = β + E(C₁e₁ + C₂e₂ + ⋯ + C_N e_N)
= β + C₁E(e₁) + C₂E(e₂) + ⋯ + C_N E(e_N) = β
(β̂ is an unbiased estimator of β)
(3) Var(β̂) = E[β̂ − E(β̂)]² = E(β̂ − β)² = E(Σ C_i e_i)²
= E(C₁²e₁² + C₂²e₂² + ⋯ + C_N²e_N² + 2C₁C₂e₁e₂ + 2C₁C₃e₁e₃ + ⋯ + 2C_{N−1}C_N e_{N−1}e_N)
= C₁²σ² + C₂²σ² + ⋯ + C_N²σ² = σ² Σ C_i² = σ²/Σ(X_i − X̄)²
→ Var(β̂) shrinks as N grows (with σ² constant); the smaller the variation in X, the larger Var(β̂).
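A Monte Carlo sketch of this variance formula (the parameter values are illustrative only):

    # The sampling variance of beta_hat should approach sigma^2 / sum(x_i^2).
    import numpy as np

    rng = np.random.default_rng(2)
    alpha, beta, sigma, N, reps = 1.0, 2.0, 1.5, 30, 100_000
    X = np.linspace(0, 10, N)
    x = X - X.mean()

    e = rng.normal(0, sigma, size=(reps, N))
    Y = alpha + beta * X + e
    beta_hats = (Y * x).sum(axis=1) / np.sum(x**2)  # beta_hat = sum x_i Y_i / sum x_i^2

    print(beta_hats.var())               # simulated Var(beta_hat)
    print(sigma**2 / np.sum(x**2))       # theoretical value: nearly identical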
Define any arbitrary linear estimator of β as β̃ = Σ w_i Y_i, where w_i = C_i + d_i (d_i: any arbitrary constants).
For β̃ to be an unbiased estimator of β, the d_i must fulfill certain conditions:
β̃ = Σ w_i(α + βX_i + e_i) = α Σ w_i + β Σ w_i X_i + Σ w_i e_i
→ E(β̃) = α Σ w_i + β Σ w_i X_i; for unbiasedness, Σ w_i = 0 and Σ w_i X_i = 1.
Since w_i = C_i + d_i, this requires Σ d_i = 0 and Σ d_i X_i = Σ d_i x_i = 0.
Var(β̃) = E(Σ w_i e_i)² = σ² Σ w_i² = σ² Σ C_i² + σ² Σ d_i²
(the cross term 2σ² Σ C_i d_i = 2σ² Σ d_i x_i/Σx_i² = 0)
= σ²/Σx_i² + σ² Σ d_i² = Var(β̂) + σ² Σ d_i²
i.e. Var(β̃) ≥ Var(β̂), with equality only when all d_i = 0
→ β̂ has minimum variance.
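The d_i argument can also be seen numerically; in the sketch below (hypothetical data, σ² = 1) the perturbation d is forced to satisfy Σ d_i = 0 and Σ d_i x_i = 0, and Var(β̃) exceeds Var(β̂) by exactly σ² Σ d_i²:

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    x = X - X.mean()
    C = x / np.sum(x**2)
    sigma2 = 1.0

    d = np.array([1.0, -2.0, 1.0, 1.0, -2.0, 1.0])
    d = d - d.mean()                     # enforce sum(d_i) = 0
    d = d - (d @ x) / (x @ x) * x        # enforce sum(d_i x_i) = 0
    w = C + d

    print(sigma2 * np.sum(w**2))                          # Var(beta_tilde)
    print(sigma2 * np.sum(C**2) + sigma2 * np.sum(d**2))  # Var(beta_hat) + sigma^2*sum(d_i^2)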
<Prove α̂>
α̂ = Ȳ − β̂X̄ = ΣY_i/N − X̄ Σ C_i Y_i = Σ[(1/N − X̄C_i)Y_i]
(1) → α̂ and the Y_i are linearly related, i.e. α̂ is a linear estimator.
(2) α̂ = Σ[(1/N − X̄C_i)Y_i] = Σ[(1/N − X̄C_i)(α + βX_i + e_i)]
= Σ(α/N − αX̄C_i + βX_i/N − βX̄C_iX_i + e_i/N − X̄C_ie_i)
= α − 0 + βX̄ − βX̄ + Σ(1/N − X̄C_i)e_i = α + Σ(1/N − X̄C_i)e_i
→ E(α̂) = α  (α̂ is an unbiased estimator of α)
(3) Var(α̂) = E[α̂ − E(α̂)]² = E[Σ(1/N − X̄C_i)e_i]² = E[(Σe_i)²/N²] + E[X̄²(Σ C_i e_i)²]
(the cross term −2(X̄/N)E[(Σe_i)(Σ C_i e_i)] = −2(X̄/N)σ² Σ C_i = 0)
= σ²/N + X̄²σ²/Σx_i² = σ²(Σx_i² + NX̄²)/(N Σx_i²) = σ² ΣX_i²/(N Σx_i²)
Define any arbitrary linear estimator of α as α̃ = Σ(1/N − X̄w_i)Y_i, where w_i = C_i + d_i (d_i: any arbitrary constants);
for unbiasedness, Σ w_i = 0 and Σ w_i X_i = 1, → Σ d_i = 0 and Σ d_i X_i = Σ d_i x_i = 0.
→ Var(α̃) = E[α̃ − E(α̃)]² = E[Σe_i/N − X̄ Σ w_i e_i]²
= E[(Σe_i)²/N²] + E[X̄²(Σ w_i e_i)²]  (the cross term vanishes because Σ w_i = 0)
= σ²/N + X̄²σ² Σ w_i² = σ²/N + X̄²σ²(Σ C_i² + Σ d_i²) = Var(α̂) + X̄²σ² Σ d_i²
i.e. Var(α̃) ≥ Var(α̂), with equality only when all d_i = 0
→ α̂ has minimum variance.
* Cov(α̂, β̂) = E{[α̂ − E(α̂)][β̂ − E(β̂)]} = E{[Σ(1/N − X̄C_i)e_i][Σ C_i e_i]}
= E[(Σe_i)(Σ C_i e_i)/N − X̄(Σ C_i e_i)²] = σ² Σ C_i/N − X̄σ² Σ C_i² = −X̄σ²/Σx_i²

Variance–covariance matrix:
Σ(α̂, β̂) = [ Var(α̂)       Cov(α̂, β̂) ] = [ σ² ΣX_i²/(N Σx_i²)   −σ²X̄/Σx_i² ]
           [ Cov(α̂, β̂)   Var(β̂)     ]   [ −σ²X̄/Σx_i²           σ²/Σx_i²   ]
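A simulation sketch of this matrix (illustrative parameter values; numpy assumed):

    # Compare the simulated covariance matrix of (alpha_hat, beta_hat) with the formula.
    import numpy as np

    rng = np.random.default_rng(3)
    alpha, beta, sigma, N, reps = 1.0, 2.0, 1.0, 25, 100_000
    X = np.linspace(1, 5, N)
    x = X - X.mean()
    Sxx = np.sum(x**2)

    e = rng.normal(0, sigma, size=(reps, N))
    Y = alpha + beta * X + e
    b = (Y * x).sum(axis=1) / Sxx
    a = Y.mean(axis=1) - b * X.mean()

    print(np.cov(a, b))                  # simulated variance-covariance matrix
    print(sigma**2 * np.array([[np.sum(X**2) / (N * Sxx), -X.mean() / Sxx],
                               [-X.mean() / Sxx,           1.0 / Sxx]]))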
C. Estimating σ²
S² = σ̂² = (1/(N−2)) Σ ê_i²  is an unbiased estimator of σ².
σ² = E(e_i²);  Σ ê_i² = ESS;  since α and β are to be estimated, d.f. = N − 2 (or T − K).
ê_i = Y_i − Ŷ_i = Y_i − α̂ − β̂X_i
<PROVE> In deviation form: ŷ_i = β̂x_i
→ ê_i = y_i − ŷ_i = y_i − β̂x_i
= βx_i + (e_i − ē) − β̂x_i
= (β − β̂)x_i + (e_i − ē)
ê_i² = [(β − β̂)x_i + (e_i − ē)]²
→ Σ ê_i² = Σ[(β − β̂)²x_i² + (e_i − ē)² + 2(β − β̂)x_i(e_i − ē)]
= (β − β̂)² Σx_i² + Σ(e_i − ē)² + 2(β − β̂) Σ x_i(e_i − ē)
        A               B                  C
For A: E[(β − β̂)² Σx_i²] = Σx_i² E(β̂ − β)² = Σx_i² · σ²/Σx_i² = σ²
For B: E[Σ(e_i − ē)²] = E[Σe_i² − (1/N)(Σe_i)²]
= E[e₁² + e₂² + ⋯ + e_N²] − (1/N)E[(e₁ + e₂ + ⋯ + e_N)(e₁ + e₂ + ⋯ + e_N)]
= Nσ² − (1/N)(Nσ²) = (N − 1)σ²
For C: β̂ − β = Σx_ie_i/Σx_i², and Σx_i(e_i − ē) = Σx_ie_i − ē Σx_i = Σx_ie_i, so
C = −2(β̂ − β)Σx_ie_i = −2(Σx_ie_i)²/Σx_i², and E(C) = −2σ²Σx_i²/Σx_i² = −2σ²
E(Σ ê_i²) = E(A) + E(B) + E(C) = σ² + (N − 1)σ² − 2σ² = (N − 2)σ²
E(S²) = (1/(N−2)) E(Σ ê_i²) = (1/(N−2))(N − 2)σ² = σ²  →  S² is unbiased.
S: the standard error of regression, SER; or standard error of the estimate, SEE; the estimated standard deviation of e.
S_α̂, S_β̂: the standard errors of the estimated coefficients.
S² = (1/(N−2)) Σ ê_i² = (1/(N−2)) Σ(y_i − β̂x_i)² = (1/(N−2))(Σy_i² + β̂²Σx_i² − 2β̂Σx_iy_i)
= (1/(N−2))(Σy_i² − β̂Σx_iy_i),  using β̂Σx_i² = Σx_iy_i.
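Both forms of S² give the same number; a minimal sketch on hypothetical data:

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
    N = len(X)
    x, y = X - X.mean(), Y - Y.mean()
    beta_hat = np.sum(x * y) / np.sum(x**2)
    alpha_hat = Y.mean() - beta_hat * X.mean()

    e_hat = Y - alpha_hat - beta_hat * X
    S2_resid = np.sum(e_hat**2) / (N - 2)                           # from residuals
    S2_short = (np.sum(y**2) - beta_hat * np.sum(x * y)) / (N - 2)  # computational form
    print(np.isclose(S2_resid, S2_short))    # True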
D. If Y_i ~ N(α + βX_i, σ²), then α̂ ~ N(α, σ² ΣX_i²/(N Σx_i²)) and β̂ ~ N(β, σ²/Σx_i²).
α̂ and β̂ are linear combinations of the independent normal variables Y₁, Y₂, …, Y_N
→ α̂ and β̂ must be normally distributed.
H₀: β = β₀,  H₁: β ≠ β₀,  α = 0.05
The test statistic: Z_C = (β̂ − β₀)/σ_β̂ ~ N(0, 1), if σ_β̂ is known;
t_C = (β̂ − β₀)/S_β̂ ~ t_{N−2, α/2}, if σ_β̂ is unknown.
Interval estimates: β̂ ± t_{N−2, α/2} S_β̂  and  α̂ ± t_{N−2, α/2} S_α̂
E. Descriptive properties (some mathematical characteristics of the LSE)
Ŷ = α̂ + β̂X …… the regression line;  Y = Ŷ + ê = α̂ + β̂X + ê;  ê = Y − Ŷ …… the calculated residual
1. Σ ê_i = 0, → the mean of ê is 0.
Prove: Σ ê_i = Σ(y_i − β̂x_i) = Σ y_i − β̂ Σ x_i = 0 − 0 = 0
2. Σ ê_i X_i = 0 → Σ Xê = 0; this is a good property.
Prove: Σ ê_i X_i = Σ ê_i(X̄ + x_i) = X̄ Σ ê_i + Σ ê_i x_i = Σ ê_i x_i,
and Σ ê_i x_i = Σ(y_i − β̂x_i)x_i = Σ x_i y_i − β̂ Σ x_i² = Σ x_i y_i − (Σ x_i y_i/Σ x_i²) Σ x_i² = 0
According to 1. & 2. → X and ê are orthogonal, i.e. X′ê = 0,
i.e. X and ê are linearly uncorrelated; the estimated values of α̂ and β̂ are exactly those that make X and ê uncorrelated.
3. Ȳ = Σ Ŷ_i/N (the mean of the fitted values equals Ȳ)
PROVE: Ȳ = Σ(Ŷ_i + ê_i)/N = Σ Ŷ_i/N + Σ ê_i/N = Σ Ŷ_i/N
4. Σ Ŷ_i ê_i = 0
(If Σ Ŷ_i ê_i ≠ 0, the fitted values would still be correlated with the residuals, and the performance of the regression could be improved.)
PROVE: Σ ê_i Ŷ_i = Σ ê_i(α̂ + β̂X_i) = α̂ Σ ê_i + β̂ Σ ê_i X_i = 0 + 0 = 0
Properties 1–4 are verified numerically in the sketch below.
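A minimal sketch checking properties 1–4 (hypothetical data; numpy assumed):

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
    x, y = X - X.mean(), Y - Y.mean()
    beta_hat = np.sum(x * y) / np.sum(x**2)
    alpha_hat = Y.mean() - beta_hat * X.mean()
    Y_hat = alpha_hat + beta_hat * X
    e_hat = Y - Y_hat

    print(np.isclose(e_hat.sum(), 0))            # 1. sum of residuals is zero
    print(np.isclose((e_hat * X).sum(), 0))      # 2. residuals orthogonal to X
    print(np.isclose(Y_hat.mean(), Y.mean()))    # 3. mean of fitted values = Ybar
    print(np.isclose((e_hat * Y_hat).sum(), 0))  # 4. residuals orthogonal to the fit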
F. Goodness of Fit
1. (Y_i − Ȳ) = (Y_i − Ŷ_i) + (Ŷ_i − Ȳ)
total deviation of Y = unexplained deviation of Y + explained deviation of Y
→ Σ(Y_i − Ȳ)² = Σ(Y_i − Ŷ_i)² + Σ(Ŷ_i − Ȳ)²,  i.e. TSS = ESS + RSS,
since (Y_i − Ȳ)² = (Y_i − Ŷ_i)² + 2(Y_i − Ŷ_i)(Ŷ_i − Ȳ) + (Ŷ_i − Ȳ)²
and Σ(Y_i − Ŷ_i)(Ŷ_i − Ȳ) = Σ ê_i Ŷ_i − Ȳ Σ ê_i = 0.
2. R², the coefficient of determination, i.e. the R-squared of the regression equation.
Define: R² = RSS/TSS = 1 − ESS/TSS
→ R² = Σ(Ŷ_i − Ȳ)²/Σ(Y_i − Ȳ)² = Σŷ_i²/Σy_i² = Σ(β̂x_i)²/Σy_i² = β̂² Σx_i²/Σy_i² = β̂² Var(X)/Var(Y)
R² = 1 − ESS/TSS = 1 − Σê_i²/Σy_i²
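A sketch computing R² both ways on hypothetical data (note this chapter's convention: RSS = regression sum of squares, ESS = error sum of squares):

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
    x, y = X - X.mean(), Y - Y.mean()
    beta_hat = np.sum(x * y) / np.sum(x**2)
    Y_hat = Y.mean() + beta_hat * x              # fitted values

    TSS = np.sum((Y - Y.mean())**2)
    RSS = np.sum((Y_hat - Y.mean())**2)          # explained (regression) SS
    ESS = np.sum((Y - Y_hat)**2)                 # unexplained (error) SS
    print(RSS / TSS, 1 - ESS / TSS)              # both equal R^2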
(1) R² is a measure of the goodness of fit of the regression model, i.e. how well the regression model fits the data.
Ex.: R² = 0.959 measures the proportion (95.9%) of the variation in Y which is explained by the regression equation.
(S or σ̂ — the SEE, standard error of estimate — is another measure of fit, but it depends on the unit of measurement: if the unit is changed, we get a different estimate. R² is independent of the unit of measurement.)
** When using R², the following conditions must hold:
(a) The estimator must be an OLS estimator.
(b) The relationship being estimated must be linear.
(c) The linear relationship being estimated must include a constant, or intercept, term.
(2) 0 ≤ R² ≤ 1
R² = 0: the model cannot explain any of the variation in Y.
R² = 1: the case of a perfect fit in Y (a special case).
(3) R² tends to be higher with time-series data and lower with cross-section data.
(4) R² measures the strength of the linear relationship between X and Y; it does not by itself establish causality.
(5) In multiple regression, R² rises as independent variables are added: with an additional independent variable, TSS is fixed but RSS ↑, i.e. ESS ↓.
So we use R̄² (the adjusted coefficient of determination). It takes the d.f. into account and may stay constant as K ↑:
1 − R̄² = (ESS/(N−K)) / (TSS/(N−1))
ESS/(N−K): the residual variance; TSS/(N−1): the variance of Y;
so 1 − R̄² = S²/Var(Y) = Var(ê)/Var(Y).
R̄² = 1 − (ESS/(N−K))/(TSS/(N−1)) = 1 − (ESS/TSS)·(N−1)/(N−K) = 1 − (1 − R²)(N−1)/(N−K),  K = k + 1
As K↑ → (N−K)↓ → R̄²↓; but as K↑, with TSS fixed, RSS↑,
so R̄² does not necessarily fall as K↑; the two effects may offset.
* If K↑: S² may ↓, stay constant, or ↑, so R̄² may ↑, stay constant, or ↓.
* R̄² can be negative; the condition is 1 − (1 − R²)(N−1)/(N−K) < 0, i.e. 1 < (1 − R²)(N−1)/(N−K).
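A sketch of the adjusted R² formula (the N, K, R² values are illustrative):

    def adjusted_r2(r2: float, n: int, k: int) -> float:
        """R2_bar = 1 - (1 - R2)(N - 1)/(N - K); K = number of estimated coefficients."""
        return 1 - (1 - r2) * (n - 1) / (n - k)

    print(adjusted_r2(0.959, n=20, k=2))   # slightly below R^2
    print(adjusted_r2(0.05, n=12, k=6))    # negative, as noted above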
3. Testing the regression equation:
Critical value: F_{d.f.1, d.f.2, α}
F_C = explained variance / unexplained variance = (RSS/1)/(ESS/(N−2)),  for Y = α + βX
= (Σŷ_i²/1)/(Σê_i²/(N−2)) = β̂²Σx_i²/(Σê_i²/(N−2))
= (RSS/TSS)/(ESS/TSS) · (N−2)/1 = (R²/1)/((1 − R²)/(N−2))
F_C = 0: the independent variable has no explanatory power for the dependent variable in the regression.
As the value of F_C gets large, the relationship between X and Y becomes very close.
(1) t²_{N−2, α/2} = F_{1, N−2, α}
With the null hypothesis H₀: β = 0:
t_C = (β̂ − β)/S_β̂ = β̂/(S/√(Σx_i²))
→ t²_{N−2} = β̂²Σx_i²/(Σê_i²/(N−2)) = (RSS/1)/(ESS/(N−2)) = F_{1, N−2}
(2) The F test can be used for joint hypothesis tests (multi-variable regression equations):
H₀: all β = 0 (not including β₀);  H₁: not all β = 0.
* As β̂ → 0, α̂ = Ȳ − β̂X̄ → Ȳ (though α̂ = Ȳ does not exactly happen), and F = β̂²Σx_i²/S² → 0.
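A sketch verifying F_C and the relation t²_C = F_C on hypothetical data:

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
    N = len(X)
    x, y = X - X.mean(), Y - Y.mean()
    beta_hat = np.sum(x * y) / np.sum(x**2)
    e_hat = y - beta_hat * x                     # residuals (deviation form)
    RSS = beta_hat**2 * np.sum(x**2)             # explained SS
    ESS = np.sum(e_hat**2)                       # unexplained SS

    F_C = (RSS / 1) / (ESS / (N - 2))
    R2 = RSS / (RSS + ESS)
    print(F_C, (R2 / 1) / ((1 - R2) / (N - 2)))  # same value

    S_beta = np.sqrt((ESS / (N - 2)) / np.sum(x**2))
    t_C = beta_hat / S_beta
    print(t_C**2)                                # equals F_C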
G. Maximum likelihood estimation
Y_i = α + βX_i + e_i,  e_i ~ N(0, σ²)  →  Y_i ~ N(α + βX_i, σ²)
f(Y_i) = (1/√(2πσ²)) exp[−(1/(2σ²))(Y_i − α − βX_i)²]  …… frequency function, f(Y_i; α, β, σ²)
L = f(Y₁)f(Y₂)⋯f(Y_N) …… the likelihood function
= (2πσ²)^(−N/2) exp[−(1/(2σ²)) Σ(Y_i − α − βX_i)²]
log L = −(N/2) log(2πσ²) − (1/(2σ²)) Σ(Y_i − α − βX_i)²
∂log L/∂α = (1/σ²) Σ(Y_i − α − βX_i) = 0  →  ΣY_i = Nα̃ + β̃ ΣX_i
→ α̃_MLE = ΣY_i/N − β̃ ΣX_i/N = Ȳ − β̃X̄
(i.e. if β̃ = β̂_OLSE, then α̃_MLE = Ȳ − β̃X̄ = α̂_OLSE)
∂log L/∂β = (1/σ²) Σ(Y_i − α − βX_i)X_i = 0  →  ΣY_iX_i − α̃ ΣX_i − β̃ ΣX_i² = 0
Substituting α̃ = ΣY_i/N − β̃ ΣX_i/N:
ΣY_iX_i − (ΣY_i/N − β̃ ΣX_i/N)ΣX_i − β̃ ΣX_i² = 0
→ ΣY_iX_i − ΣX_i ΣY_i/N + β̃(ΣX_i)²/N − β̃ ΣX_i² = 0
→ β̃_MLE = (N ΣX_iY_i − ΣX_i ΣY_i)/(N ΣX_i² − (ΣX_i)²) = β̂_OLSE
∂log L/∂σ² = −N/(2σ²) + (1/(2σ⁴)) Σ(Y_i − α − βX_i)² = 0
→ σ̃²_MLE = (1/N) Σ(Y_i − α̃ − β̃X_i)² = (1/N) Σ ê_i²
i.e. σ̃²_MLE is a biased estimator of σ², whereas σ̂²_OLSE = (1/(N−2)) Σ ê_i² is unbiased.