Chapter 7  Instrumental Variables and Two-Stage Least Squares
(Simultaneous Equations Model)

A. Stochastic regressors

Assumption of the SLRM: the X's are constants (i.e., the X's are fixed in repeated samples)
→ the independent variables are uncorrelated with the disturbance terms.
If the X's are random variables, then the regressors are stochastic.

1. Case 1: X and e are independent, $\operatorname{Cov}(X,e)=0$.
$\hat{\beta}_{OLS}$ is unbiased, efficient, consistent, and asymptotically unbiased and efficient; but because X is stochastic, $\hat{\beta}_{OLS}$ is no longer linear, so it is not BLUE.

2. Case 2: X and e are contemporaneously uncorrelated, i.e. $\operatorname{Cov}(X_1,e_1)=\operatorname{Cov}(X_2,e_2)=\cdots=\operatorname{Cov}(X_T,e_T)=0$, but possibly $\operatorname{Cov}(X_1,e_2)\neq 0$.
→ $\hat{\beta}_{OLS}$ is biased and inefficient, but consistent and asymptotically unbiased and efficient (in large samples $\hat{\beta}_{OLS}$ can still be used).

3. Case 3: X and e are neither independent nor contemporaneously uncorrelated: $\operatorname{Cov}(X,e)\neq 0$.
→ $\hat{\beta}_{OLS}$ is biased, inefficient, and inconsistent, even as $N\to\infty$.
→ use the IV estimator (instrumental-variable estimation) to obtain consistency, although it does not guarantee unbiasedness.

a. Ex. X and e are positively correlated, $\operatorname{Cov}(X,e)>0$: a high value of Y is attributed to a high value of X. If e is large in a particular period, the dependent variable is also large in that period.

$$\hat{\beta}_{OLS}=\frac{\sum x_t y_t}{\sum x_t^2}=\beta+\frac{\sum x_t e_t}{\sum x_t^2},$$
so $\hat{\beta}_{OLS}$ will tend to overestimate $\beta$ no matter what $T$ is ($\hat{\sigma}^2$ will tend to underestimate $\sigma^2$, i.e., the OLS line fits the observations better than the true relationship); $\hat{\beta}_{OLS}$ is biased and inconsistent.
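A minimal Monte Carlo sketch of this case (the data-generating process below is an illustrative assumption, not from the text): when x and e share a common component, the OLS slope settles above the true β even for a very large T.

```python
# Monte Carlo sketch of case 3 with Cov(x, e) > 0: OLS overestimates beta.
# The data-generating process is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(6)
beta, T = 2.0, 100_000
u = rng.normal(size=T)
x = u + rng.normal(size=T)          # x shares the component u with e, so Cov(x, e) > 0
e = u + rng.normal(size=T)
y = beta * x + e
b_ols = (x @ y) / (x @ x)           # = beta + sum(x*e)/sum(x^2)
print(b_ols)                        # ~ 2.5 > 2: upward bias that does not vanish as T grows
```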

b. Test $H_0:\;X'e=0$ against $H_1:\;X'e\neq 0$.
Method 1: if $Y=X\beta+e$ (in matrix form) and Z is a set of instrumental variables:
(1) regress X on Z to get $\hat{X}$;
(2) run OLS on $Y=X\beta+\hat{X}\gamma+e$;
(3) test $H_0:\;\gamma=0$ (a short sketch of this procedure follows).
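A minimal sketch of Method 1 in Python on simulated data (the data-generating process, variable names, and the use of statsmodels are illustrative assumptions):

```python
# Minimal sketch of Method 1 (regression-based exogeneity test) on simulated data.
# All variable names and the data-generating process are illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T = 500
z = rng.normal(size=T)                # instrument
u = rng.normal(size=T)                # common shock creating endogeneity
x = 0.8 * z + u + rng.normal(size=T)  # regressor correlated with the error
e = u + rng.normal(size=T)            # disturbance (shares u with x)
y = 1.0 + 2.0 * x + e                 # true model with beta = 2

# (1) regress X on Z to get X_hat
x_hat = sm.OLS(x, sm.add_constant(z)).fit().fittedvalues
# (2) run OLS on Y = X*beta + X_hat*gamma + error
aug = sm.OLS(y, sm.add_constant(np.column_stack([x, x_hat]))).fit()
# (3) test H0: gamma = 0; a small p-value signals Cov(X, e) != 0
print(aug.params)             # [const, coefficient on x, gamma on x_hat]
print(aug.pvalues[2])         # p-value for gamma
```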

B. Errors in variables
∵ variables are measured incorrectly, or proxy variables are used.

1. The dependent variable is measured with error: $\hat{\beta}_{OLS}$ is unbiased and consistent, but inefficient.

<Prove> Suppose the true model is $y_t=\beta x_t+e_t$, $t=1,2,\dots,T$, but the dependent variable is observed as $y_t^*=y_t+u_t$, with $\operatorname{Cov}(e_t,u_t)=0$, $u_t\sim N(0,\sigma_u^2)$, and $\operatorname{Cov}(x_t,u_t)=0$. The actual (estimated) model is then
$$y_t^*=\beta x_t+(e_t+u_t)=\beta x_t+\varepsilon_t.$$
$$\operatorname{Var}(\hat{\beta}_{OLS})=\frac{\sigma_e^2}{\sum x_t^2}\quad\text{if the dependent variable is measured without error,}$$
$$\operatorname{Var}(\hat{\beta}_{OLS})=\frac{\sigma_\varepsilon^2}{\sum x_t^2}\quad\text{if the dependent variable is measured with error,}$$
and since $\operatorname{Cov}(e_t,u_t)=0$, $\sigma_\varepsilon^2=\sigma_e^2+\sigma_u^2>\sigma_e^2$ → a larger error variance (hence a larger sampling variance).

2. The independent variable is measured with error: $\hat{\beta}_{OLS}$ is biased and inconsistent (the greater $\sigma_v^2$, the greater the bias and the inconsistency). Use IV estimation to obtain consistency.

<Prove> If the true model is $y_t=\beta x_t+e_t$, $t=1,2,\dots,T$, but we use $x_t^*=x_t+v_t$ to run the regression, with $\operatorname{Cov}(x_t,e_t)=0$, $v_t\sim N(0,\sigma_v^2)$, $\operatorname{Cov}(e_t,v_t)=0$, and $\operatorname{Cov}(x_t,v_t)=0$, then the estimated model is
$$y_t=\beta x_t^*+(e_t-\beta v_t)=\beta x_t^*+e_t^*.$$
$$\operatorname{Cov}(e_t^*,x_t^*)=E[(e_t-\beta v_t)(x_t+v_t)]=-\beta\,E[v_t^2]=-\beta\sigma_v^2\neq 0,$$
i.e., $x_t^*$ is correlated with $e_t^*$, ∴ $\hat{\beta}_{OLS}$ is biased and inconsistent.

3. Both the dependent and the independent variables are measured with error: $\hat{\beta}_{OLS}$ is inconsistent.

<Prove> If the true model is $y_t=\beta x_t+e_t$, $t=1,2,\dots,T$, but we use $y_t^*=y_t+u_t$ and $x_t^*=x_t+v_t$, with $u_t\sim N(0,\sigma_u^2)$, $v_t\sim N(0,\sigma_v^2)$, and no correlation among $e_t$, $u_t$, $v_t$, and $x_t$, then the estimated slope is
$$\hat{\beta}_{OLS}=\frac{\sum x_t^*y_t^*}{\sum x_t^{*2}}=\frac{\sum (x_t+v_t)(y_t+u_t)}{\sum (x_t+v_t)^2}=\frac{\sum (x_t+v_t)(\beta x_t+e_t+u_t)}{\sum (x_t+v_t)^2}$$
[∵ $x_t$ and $v_t$ are all stochastic, it is difficult to take the expected value directly].
$$\operatorname*{plim}_{T\to\infty}\hat{\beta}_{OLS}=\beta\cdot\operatorname{plim}\frac{\sum x_t^2/T}{\sum x_t^2/T+\sum v_t^2/T}=\beta\,\frac{\operatorname{var}(x)}{\operatorname{var}(x)+\operatorname{var}(v)}=\beta\,\frac{1}{1+\operatorname{var}(v)/\operatorname{var}(x)}<\beta\quad\text{(underestimate)}.$$
∴ $\hat{\beta}_{OLS}$ is inconsistent. (Using matrix notation it can be proved that $\hat{\beta}_{OLS}$ is biased downward and inconsistent.)
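A Monte Carlo sketch of the two errors-in-variables results above (measurement error in y versus in x); the data-generating process is an illustrative assumption:

```python
# Monte Carlo sketch of the errors-in-variables results above.
# The data-generating process is an illustrative assumption, not from the text.
import numpy as np

rng = np.random.default_rng(2)
beta, T, R = 2.0, 200, 2000
x = rng.normal(size=T)                        # true regressor, var(x) ~ 1

# (i) Measurement error in the DEPENDENT variable: the slope stays centered on beta,
#     but its sampling variance is inflated.
b_clean, b_noisy = [], []
for _ in range(R):
    e = rng.normal(size=T)
    u = rng.normal(scale=2.0, size=T)         # measurement error in y
    y = beta * x + e
    b_clean.append(x @ y / (x @ x))
    b_noisy.append(x @ (y + u) / (x @ x))
print(np.mean(b_clean), np.var(b_clean))      # ~2.0 with small variance
print(np.mean(b_noisy), np.var(b_noisy))      # ~2.0 with much larger variance

# (ii) Measurement error in the REGRESSOR: attenuation toward zero,
#      matching plim(beta_hat) = beta * var(x) / (var(x) + var(v)).
T2 = 200_000
x2 = rng.normal(size=T2)
v = rng.normal(scale=0.5, size=T2)            # var(v) = 0.25
y2 = beta * x2 + rng.normal(size=T2)
x_star = x2 + v
print((x_star @ y2) / (x_star @ x_star))      # ~ 2 / (1 + 0.25) = 1.6
print(beta * np.var(x2) / (np.var(x2) + np.var(v)))   # theoretical attenuation
```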

4. Proxy variables: $\hat{\beta}_{OLS}$ is biased and inconsistent.

<Prove> Friedman's permanent-income consumption theory.
Assume
(1) $Y_1=Y+Y_e$, where $Y_1$: measured income (a moving average or liquid assets), $Y$: permanent income ($Y_p$) or wealth, $Y_e$: random error (transitory component);
(2) $C_1=C+C_e$;
(3) $E(Y_e)=E(C_e)=0$;
(4) $C=\beta_1 Y+\varepsilon$ (in deviation form; the true model);
(5) $Y$, $Y_e$, $C$, $C_e$, and $\varepsilon$ are mutually uncorrelated.

The estimated model is
$$C_1=\beta_1 Y_1+(\varepsilon+C_e-\beta_1 Y_e),$$
since $Y$ and $C$ are measured with error (proxy variables are used). Here $\operatorname{Cov}(Y_1,Y_e)\neq 0$, i.e., the independent variable and the disturbance are correlated, ∴ $\hat{\beta}_{1,OLS}$ is biased and inconsistent. It can be proved that $\hat{\beta}_{1,OLS}$ is biased downward and inconsistent, so the true $\beta_1$ is higher, which supports Friedman's theory. Even when the proxy variable makes $\hat{\beta}_{1,OLS}$ statistically significant in the estimated model, it is an incorrect (downward-biased) outcome.

【Prove】
$$\hat{\beta}_{1,OLS}=\frac{\sum Y_1 C_1}{\sum Y_1^2},\qquad\text{where}\quad\sum Y_1^2=\sum (Y+Y_e)^2,\quad\sum Y_1 C_1=\sum (Y+Y_e)(C+C_e).$$
Using assumption (5) that $Y$, $Y_e$, $C_e$, and $\varepsilon$ are mutually uncorrelated,
$$E(\hat{\beta}_{1,OLS})<\beta_1,$$
and the degree of the bias depends on the variances. Moreover,
$$\operatorname*{plim}_{T\to\infty}\hat{\beta}_{1,OLS}=\beta_1\,\frac{\sigma_Y^2}{\sigma_Y^2+\sigma_{Y_e}^2}=\beta_1\,\frac{1}{1+\sigma_{Y_e}^2/\sigma_Y^2}<\beta_1,$$
so $\hat{\beta}_{1,OLS}$ is biased downward and inconsistent (in the proxy-variable case).

C. Instrumental variable approach (for the case of the independent variable measured with error)

If the true model is $y_t=\beta x_t+e_t$, $t=1,2,\dots,T$, but we use $x_t^*=x_t+v_t$ (i.e., $x_t=x_t^*-v_t$), the estimated model is $y_t=\beta x_t^*+e_t^*$.

<Prove>
$$\hat{\beta}_{IV}=\frac{\sum z_t y_t}{\sum z_t x_t^*},\qquad z:\text{ the IV ($z$ is closely correlated with $x^*$ but uncorrelated with $e^*$)}$$
$$=\frac{\sum z_t(\beta x_t^*+e_t^*)}{\sum z_t x_t^*}=\beta+\frac{\sum z_t e_t^*}{\sum z_t x_t^*}.$$

IVE in matrix form: if
$$\underset{T\times 1}{Y}=\underset{T\times K}{X}\;\underset{K\times 1}{\beta}+\underset{T\times 1}{e}\qquad\text{………… the actual (estimated) model,}$$
and the instrument matrix is $\underset{T\times K}{Z}$ with
(a) $\operatorname{plim}\;Z'e/T=0$ (Z is uncorrelated with $e^*$),
(b) $\operatorname{plim}\;Z'X/T\neq 0$ (Z is closely correlated with $X^*$),
then premultiplying by $Z'$,
$$Z'Y=Z'X\beta+Z'e\qquad\text{…………………..(C.1)}$$
$$\operatorname{Var}(Z'e)=E(Z'ee'Z)=\sigma^2 Z'Z\neq\sigma^2 I,$$
∴ the GLSE for equation (C.1) is
$$\hat{\beta}_{IV}=(Z'X)^{-1}Z'Y.$$
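A minimal numpy sketch of the formula $\hat{\beta}_{IV}=(Z'X)^{-1}Z'Y$ on simulated errors-in-variables data (the data-generating process and the choice of instrument are illustrative assumptions):

```python
# numpy sketch of the IV estimator beta_IV = (Z'X)^{-1} Z'Y.
# Simulated errors-in-variables data; the DGP is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(3)
T, beta = 5000, np.array([1.0, 2.0])        # intercept and slope
x = rng.normal(size=T)
z = x + rng.normal(scale=0.5, size=T)       # instrument: correlated with x, not with e or v
v = rng.normal(scale=0.8, size=T)           # measurement error in x
e = rng.normal(size=T)
y = beta[0] + beta[1] * x + e
x_star = x + v                              # mismeasured regressor

X = np.column_stack([np.ones(T), x_star])   # T x K regressor matrix (with intercept)
Z = np.column_stack([np.ones(T), z])        # T x K instrument matrix
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)    # (Z'X)^{-1} Z'Y, close to [1, 2]
b_ols = np.linalg.solve(X.T @ X, X.T @ y)   # for comparison: attenuated slope
print(b_iv, b_ols)
```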

Ex. In the case of the independent variable measured with error: if the true model is $y_t=\beta x_t+e_t$, $t=1,2,\dots,T$, choose an IV $z_t$ such that (1) $\operatorname{cov}(z_t,e_t)=0$ and (2) $z_t$ and $x_t^*$ are highly correlated. Then
$$\hat{\beta}_{IV}=\frac{\sum z_t y_t}{\sum z_t x_t^*}\qquad\text{compared with}\qquad\hat{\beta}_{OLS}=\frac{\sum x_t y_t}{\sum x_t^2}.$$

For the 1st observation: $z_1 y_1=\beta z_1 x_1^*+z_1 e_1^*$;
for the 2nd: $z_2 y_2=\beta z_2 x_2^*+z_2 e_2^*$;
⋮
for the Tth observation: $z_T y_T=\beta z_T x_T^*+z_T e_T^*$.
Summing over $t=1,\dots,T$:
$$\sum_{t=1}^{T} z_t y_t=\beta\sum_{t=1}^{T} z_t x_t^*+\sum_{t=1}^{T} z_t e_t^*,$$
i.e.
$$\operatorname{Cov}(z_t,y_t)=\hat{\beta}_{IV}\operatorname{Cov}(z_t,x_t^*)+\operatorname{Cov}(z_t,e^*)\quad\Rightarrow\quad\hat{\beta}_{IV}=\frac{\operatorname{Cov}(z_t,y_t)}{\operatorname{Cov}(z_t,x_t^*)}.$$
$$\operatorname*{plim}_{T\to\infty}\hat{\beta}_{IV}=\beta+\operatorname*{plim}_{T\to\infty}\frac{\sum e_t^* z_t/T}{\sum z_t x_t^*/T}=\beta,$$
which helps to restore consistency.

D. Simultaneous equations models

Ex: demand-supply model of one agricultural product:
$$Q^D=\alpha_1+\beta_1 p+\gamma_1 I+e_D\qquad\text{…… demand curve, behavioral (or structural) equation}$$
$$Q^S=\alpha_2+\beta_2 p+\gamma_2 p_{t-1}+e_S\qquad\text{…… supply curve, behavioral (or structural) equation}$$
$$Q^D=Q^S=Q^*\qquad\text{…….. equilibrium condition}$$
The structural form: each equation has an independent meaning and identity; the structural coefficients (or constants) → direct effects.
The reduced form: expresses the endogenous variables in each equation solely as functions of the exogenous (or predetermined) variables and the stochastic disturbance terms; the reduced-form coefficients → direct effect + indirect effect (the equilibrium solution).
Final form: endogenous variables = functions of the exogenous (or predetermined) variables.
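To make the structural-form vs. reduced-form distinction concrete, here is a small sympy sketch that solves the demand-supply system above for the endogenous variables Q and p (the symbol names are illustrative):

```python
# sympy sketch: derive the reduced form of the demand-supply model by solving
# the two structural equations and the equilibrium condition for Q and p.
import sympy as sp

Q, p, I, p_lag, eD, eS = sp.symbols('Q p I p_lag e_D e_S')
a1, b1, g1, a2, b2, g2 = sp.symbols('alpha_1 beta_1 gamma_1 alpha_2 beta_2 gamma_2')

demand = sp.Eq(Q, a1 + b1 * p + g1 * I + eD)       # structural demand equation
supply = sp.Eq(Q, a2 + b2 * p + g2 * p_lag + eS)   # structural supply equation

reduced = sp.solve([demand, supply], [Q, p], dict=True)[0]
# Q and p are now expressed only in terms of the predetermined variables
# (I, p_lag) and the disturbances (e_D, e_S): the reduced form.
print(sp.simplify(reduced[Q]))
print(sp.simplify(reduced[p]))
```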

* SEM (Simultaneous Equations Model)

M equations, M endogenous variables: $Y_1, Y_2, \dots, Y_M$;
K predetermined variables: $X_1, X_2, \dots, X_K$;
a disturbance for each equation: $e_1, e_2, \dots, e_M$.

The system of equations:
$$\gamma_{11}Y_1+\gamma_{21}Y_2+\cdots+\gamma_{M1}Y_M+\beta_{11}X_1+\beta_{21}X_2+\cdots+\beta_{K1}X_K+e_1=0$$
$$\gamma_{12}Y_1+\gamma_{22}Y_2+\cdots+\gamma_{M2}Y_M+\beta_{12}X_1+\beta_{22}X_2+\cdots+\beta_{K2}X_K+e_2=0$$
$$\vdots$$
$$\gamma_{1M}Y_1+\gamma_{2M}Y_2+\cdots+\gamma_{MM}Y_M+\beta_{1M}X_1+\beta_{2M}X_2+\cdots+\beta_{KM}X_K+e_M=0$$
With
$$Y=[Y_1,Y_2,\dots,Y_M],\qquad
\Gamma=\underset{M\times M}{\begin{bmatrix}\gamma_{11}&\gamma_{12}&\cdots&\gamma_{1M}\\ \vdots& &\ddots&\vdots\\ \gamma_{M1}&\gamma_{M2}&\cdots&\gamma_{MM}\end{bmatrix}},\qquad
B=\underset{K\times M}{\begin{bmatrix}\beta_{11}&\beta_{12}&\cdots&\beta_{1M}\\ \vdots& &\ddots&\vdots\\ \beta_{K1}&\beta_{K2}&\cdots&\beta_{KM}\end{bmatrix}},\qquad
E=[e_1,e_2,\dots,e_M]',$$
the structural form is
$$Y\Gamma+XB+E=0.$$
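As context for the identification problem below, note that under the compact structural form $Y\Gamma+XB+E=0$ the reduced-form coefficient matrix is $\Pi=-B\Gamma^{-1}$ (a standard relation, not stated explicitly in these notes). A numpy sketch with arbitrary illustrative matrices:

```python
# numpy sketch: reduced-form coefficients Pi = -B * Gamma^{-1} from the
# structural form Y*Gamma + X*B + E = 0.  Gamma and B below are arbitrary
# illustrative values, not taken from the text.
import numpy as np

Gamma = np.array([[-1.0, 0.7],      # M x M coefficients on the endogenous variables
                  [ 0.5, -1.0]])    # (each column is one structural equation)
B = np.array([[0.3, 0.0],           # K x M coefficients on the predetermined variables
              [0.0, 0.4]])

Pi = -B @ np.linalg.inv(Gamma)      # K x M reduced-form coefficient matrix
print(Pi)
# Identification asks whether the structural (Gamma, B) can be recovered from Pi
# together with the restrictions imposed on the system.
```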

Identification problem: whether numerical estimates of the parameters of a structural equation can be obtained from the estimated reduced-form coefficients.
(1) If it can be done, the equation is called identified;
(2) if it cannot be done, the equation is called unidentified, or underidentified.

Analogy: if $Y=X\beta+e$ with a multicollinearity problem, $\operatorname{rank}(X)\neq K$, then there exists $C\neq 0$ with $XC=0$. Suppose $\beta^*=\beta+C\lambda$, where $\lambda$ is an arbitrary constant; then
$$Y=X\beta^*+e=X\beta+XC\lambda+e=X\beta+e,$$
so $Y=X\beta+e$ and $Y=X\beta^*+e$ are observationally the same → we cannot differentiate between $\beta$ and $\beta^*$ → unidentified.

a. Definitions:
(1) The reduced-form parameters summarize all the relevant information available from the sample data.
* A system of structural equations is identified iff each equation of the system is identified.
* An identity equation is always identified.
(2) Given the reduced-form parameters (the a posteriori information): if there are many different sets of structural-form parameters consistent with that information, the system is not identified.

b. Unidentified system. Ex:
$$Q^D=\alpha_1+\beta_1 p+e_D,\qquad Q^S=\alpha_2+\beta_2 p+e_S,\qquad Q^D=Q^S=Q^*.$$
The reduced form is
$$(Q,\;p)=\left(\frac{\alpha_2\beta_1-\alpha_1\beta_2}{\beta_1-\beta_2},\;\frac{\alpha_2-\alpha_1}{\beta_1-\beta_2}\right)+\left(\frac{\beta_1 e_S-\beta_2 e_D}{\beta_1-\beta_2},\;\frac{e_S-e_D}{\beta_1-\beta_2}\right),$$
so only $\hat{\pi}=(\pi_{11},\pi_{12})$ can be estimated. There are too many structural parameters ($\alpha_1$, $\alpha_2$, $\beta_1$, $\beta_2$) to be solved from these reduced-form values → unidentified.
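A simulation sketch of this unidentified system (parameter values are illustrative assumptions): regressing the equilibrium quantity on price recovers neither the demand slope β₁ nor the supply slope β₂, only a mixture of the two.

```python
# Simulation sketch of the unidentified demand-supply system (illustrative values):
# OLS of equilibrium Q on p identifies neither beta_1 (demand) nor beta_2 (supply).
import numpy as np

rng = np.random.default_rng(4)
T = 100_000
a1, b1 = 10.0, -1.0      # demand:  Q = a1 + b1*p + eD
a2, b2 = 2.0, 1.5        # supply:  Q = a2 + b2*p + eS
eD = rng.normal(size=T)
eS = rng.normal(size=T)

p = (a2 - a1 + eS - eD) / (b1 - b2)                        # reduced form for price
Q = (b1 * a2 - b2 * a1 + b1 * eS - b2 * eD) / (b1 - b2)    # reduced form for quantity

covmat = np.cov(p, Q)
slope = covmat[0, 1] / covmat[0, 0]       # OLS slope of Q on p
print(slope, b1, b2)                      # a mixture of b1 and b2, equal to neither
```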

* The rules for identification

1. Order condition (the necessary condition for identification):
$$K_1^*\;\ge\;M_1-1$$
$K_1^*$: number of predetermined variables not included in the 1st equation;
$M_1$: number of endogenous variables in the 1st equation.
Even if the order condition is satisfied, the rank condition must still be checked; the order condition alone is not sufficient.
If $K_1^*<M_1-1$ → underidentified.
If $K_1^*=M_1-1$ → just identified → use indirect least squares (ILS) to estimate the parameters.
If $K_1^*>M_1-1$ → overidentified → use 2SLS or 3SLS to estimate the parameters.
(2SLS and 3SLS are the methods of two-stage and three-stage least squares.)
(A small sketch applying this classification follows the rank condition below.)

2. Rank condition (the sufficient condition for identification):
This condition tells us whether the equation under consideration is identified or not, whereas the order condition tells us whether it is exactly identified or overidentified. The equation is identified iff the matrix formed from the coefficients, in the other equations of the system, on the variables excluded from that equation has rank $M-1$.
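A minimal helper that applies the order condition and the resulting classification (the function and its arguments are illustrative, not part of the text):

```python
# Sketch: apply the order condition K_i^* >= M_i - 1 to classify one equation.
# The function and its arguments are illustrative, not part of the text.
def order_condition(M_i: int, K_i: int, K: int) -> str:
    """M_i: endogenous vars in the equation, K_i: predetermined vars in the
    equation, K: predetermined vars in the whole system."""
    K_star = K - K_i                # predetermined variables excluded from the equation
    if K_star < M_i - 1:
        return "underidentified"
    if K_star == M_i - 1:
        return "just identified (use ILS)"
    return "overidentified (use 2SLS / 3SLS)"

# Example: a demand equation with 2 endogenous variables (Q, p) that includes
# 1 of the system's 2 predetermined variables:
print(order_condition(M_i=2, K_i=1, K=2))   # just identified (use ILS)
```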

The way to check the rank and order conditions for identification (step by step):

1. Normalize each equation in the system.
Set $\gamma_{11}=-1$ for the 1st structural equation, so that it can be written as
$$y_1=\gamma_{21}Y_2+\cdots+\gamma_{M_1,1}Y_{M_1}+\gamma_{M_1+1,1}Y_{M_1+1}+\cdots+\gamma_{M,1}Y_M+\beta_{11}X_1+\cdots+\beta_{K_1,1}X_{K_1}+\beta_{K_1+1,1}X_{K_1+1}+\cdots+\beta_{K,1}X_K$$
$$=\gamma_1 Y_1+\gamma_1^* Y_1^*+\beta_1 X_1+\beta_1^* X_1^*$$
$Y_1$: the endogenous variables in the 1st equation other than $y_1$;
$Y_1^*$: the endogenous variables not included in the 1st equation;
$X_1$: the predetermined variables in the 1st equation;
$X_1^*$: the predetermined variables not included in the 1st equation;
$M_1^*=M-M_1$, $K_1^*=K-K_1$;
M: number of endogenous variables in the model (system);
$M_1$: number of endogenous variables in the 1st equation;
K: number of predetermined variables in the model;
$K_1$: number of predetermined variables in the 1st equation.

2. List the coefficients of all the structural equations for identifying the 1st equation: arrange the coefficients of the M equations in a table whose columns are grouped as (coefficient on $y_1$, on $Y_1$, on $Y_1^*$, on $X_1$, on $X_1^*$), one row per equation. Denote by $(\Gamma_{11}^*,\;B_{11}^*)$ the submatrix of coefficients that the other equations place on the variables excluded from the 1st equation ($Y_1^*$ and $X_1^*$). Then the 1st equation is identified iff
$$\operatorname{rank}(\Gamma_{11}^*,\;B_{11}^*)=M-1.$$
Here the order condition is
$$M_1^*+K_1^*\ge M-1\;\Longleftrightarrow\;K_1^*\ge M_1-1\qquad(\text{since }M=M_1+M_1^*).$$
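A numpy sketch of this rank-condition check (the coefficient matrix used here is arbitrary and illustrative, not the system analyzed in the example below):

```python
# Sketch of the rank condition check: for equation j, collect the coefficients
# that the OTHER equations place on the variables EXCLUDED from equation j,
# and require that matrix to have rank M - 1.  A is an illustrative M x (M+K)
# array of structural coefficients, one row per equation, columns = [Y's, X's].
import numpy as np

def rank_condition(A: np.ndarray, j: int) -> bool:
    M = A.shape[0]
    excluded = np.isclose(A[j], 0.0)            # variables not appearing in equation j
    others = np.delete(A, j, axis=0)            # the other M-1 equations
    sub = others[:, excluded]                   # their coefficients on the excluded variables
    return np.linalg.matrix_rank(sub) == M - 1

# Illustrative 3-equation system (M=3 endogenous columns + K=2 predetermined columns):
A = np.array([[-1.0, 0.5, 0.0, 0.2, 0.0],
              [ 0.3, -1.0, 0.4, 0.0, 0.1],
              [ 0.0, 0.6, -1.0, 0.0, 0.0]])
print([rank_condition(A, j) for j in range(3)])
```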

Ex: Given the rank and order conditions for identification, consider a three-equation system with M = 3 endogenous variables ($Y_1$, $Y_2$, $Y_3$) and K = 2 predetermined variables ($X_1$, $X_2$), each equation normalized on its own left-hand endogenous variable.

For the 1st equation: $M_1=2$, $K_1=2$, so $M_1^*=M-M_1=3-2=1$ and $K_1^*=K-K_1=2-2=0$. Order condition: $K_1^*=0<M_1-1$ → underidentified. Rank condition: the matrix of coefficients that the other two equations place on the variables excluded from the 1st equation has rank $=1<M-1=2$ → the rank condition is not satisfied.

For the 2nd equation: $M_2=2$, $K_2=1$, so $M_2^*=3-2=1$ and $K_2^*=2-1=1$. Order condition: $K_2^*=1=M_2-1$ → just identified. Rank condition: the corresponding matrix of excluded-variable coefficients has rank $<M-1=2$ → the rank condition is not satisfied.

For the 3rd equation: $M_3=2$, $K_3=0$, so $M_3^*=3-2=1$ and $K_3^*=K-K_3=2-0=2$. Order condition: $K_3^*=2>M_3-1=1$ → overidentified. Rank condition: the corresponding matrix has rank $=2=M-1$ → the rank condition is satisfied.

3. Estimation techniques:

a. OLS: (1) applicable only to equations with no endogenous variables on the right-hand side; (2) if the right-hand side contains endogenous variables, then X'e ≠ 0 and $\hat{\beta}_{OLS}$ is a biased, inconsistent estimator.

b. ILS (indirect least squares) → estimate each reduced-form equation by OLS, then recover the coefficients of the structural form from the coefficients of the reduced form. If X'e ≠ 0, ILS cannot be used (e.g., when the predetermined variables include lagged endogenous variables and the disturbances are autoregressive, AR( )); even GLS cannot correct this.

c. 2SLS → a limited-information method for estimating a single equation → used for over-identified models (a worked sketch follows this list).
1st step → regress each explanatory endogenous variable on all the predetermined variables in the model by OLS; the fitted values of the explanatory endogenous variables serve as the IVs.
2nd step → using the IVs obtained in the 1st step together with the predetermined variables, estimate each structural equation by OLS.
(I) If the disturbances of each equation are well behaved (no serial correlation), then $\hat{\beta}_{2SLS}$ is consistent and asymptotically efficient.
(II) If the predetermined variables include lagged endogenous variables that are correlated with the disturbances, then $\hat{\beta}_{2SLS}$ is biased, inconsistent, and inefficient.
(III) 2SLS is suitable when there are fewer than about 20 predetermined variables and the data are free of problems.

d. Zellner estimator → applicable when the regressors contain no endogenous variables (a full-information method that applies GLSE to the SEM).

e. 3SLS (a full-information method; it estimates all equations of the system simultaneously) → after 2SLS, a third stage using GLSE is added to improve the efficiency of the estimator.
(I) Valid in large samples.
(II) $\hat{\beta}_{3SLS}$ is more efficient than $\hat{\beta}_{2SLS}$.
(III) 3SLS is not necessarily better than 2SLS.
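A minimal sketch of the two 2SLS steps described in item c, done "by hand" with OLS on simulated data (the data-generating process and all names are illustrative assumptions):

```python
# Sketch of 2SLS done "by hand" in two OLS steps, on simulated data with one
# explanatory endogenous variable; the DGP and all names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(5)
T = 20_000
x1 = rng.normal(size=T)                  # predetermined variables
x2 = rng.normal(size=T)
u = rng.normal(size=T)                   # shock making y2 endogenous
y2 = 0.5 * x1 + 1.0 * x2 + u + rng.normal(size=T)   # explanatory endogenous variable
e = u + rng.normal(size=T)
y1 = 1.0 + 2.0 * y2 + 0.5 * x1 + e       # structural equation of interest

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# 1st step: regress the explanatory endogenous variable on ALL predetermined variables
Z = np.column_stack([np.ones(T), x1, x2])
y2_hat = Z @ ols(Z, y2)                  # fitted values serve as the instrument

# 2nd step: OLS of the structural equation with y2 replaced by its fitted value
X2 = np.column_stack([np.ones(T), y2_hat, x1])
print(ols(X2, y1))                       # ~ [1.0, 2.0, 0.5]

# Plain OLS for comparison (inconsistent because Cov(y2, e) != 0):
X_ols = np.column_stack([np.ones(T), y2, x1])
print(ols(X_ols, y1))
```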
