Exogeneity, causality and autonomy Ragnar Nymoen 17 April 2009 Department of Economics, UiO

ECON 4610: Lecture 12
Overview
Many of the topics of this course turn on the concept of
exogeneity:
The choice of estimator: when to use OLS, and when to use 2SLS
or other instrumental variables methods.
The relevant model for evaluating a shock (policy driven or
extraneous) to the economy: derive multipliers from a single
equation, or from a larger model.
The identification issue.
In this lecture we review the econometric concepts of
exogeneity, and also explain their relationship to two other
important concepts in econometric methodology: causality
and autonomy.
Syllabus:
G Ch 4.1
B 2.2, 6N: A
K Ch 10,11
“Classic” exogeneity concepts
Three different concepts are used in linear models with stochastic
regressors. Consider
$$y_t = \beta_1 + \beta_2 x_t + \varepsilon_t, \quad t = 1, 2, \ldots, T,$$
where we use time-series notation because that notation is most
relevant when we extend the discussion to causality. Exogeneity of
$x_t$ can refer to one of the following definitions:
1. $E[\varepsilon_1 \mid x_t] = E[\varepsilon_2 \mid x_t] = \ldots = E[\varepsilon_T \mid x_t] = 0$, see A3 in Table
2.1 in Greene.
2. $E[\varepsilon_t] = 0$ for $t = 1, 2, \ldots, T$ and $\mathrm{Cov}(\varepsilon_j, x_k) = 0$ for
$j, k = 1, 2, \ldots, T$.
3. $E[\varepsilon_t] = 0$ for $t = 1, 2, \ldots, T$, and $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T)'$
stochastically independent of $x = (x_1, x_2, \ldots, x_T)'$.
Relationships between the classic concepts.
#3 $\Rightarrow$ #1 $\Rightarrow$ #2.
The stochastic independence part of #3 is a strong
assumption. It implies that $E[\varepsilon_s \mid x_t] = 0$ for all $s$ and $t$, and
therefore definition #1.
Note also that the implication for the conditional means of
independent variables works “both ways”: #3 implies that
$E[x_s \mid \varepsilon_t] = E[x_s]$ for all $s$ and $t$ as well (which is zero when
$x_t$ is measured as a deviation from its mean).
According to #1, $x_t$ is uncorrelated with all
disturbances, past and future: the covariance part of
definition #2 is implied by #1.
Moreover, $E[\varepsilon_t] = 0$ is implied by the law of iterated (double) expectations.
Most textbooks choose #1, but clearly these concepts are
closely related in practice: Heuristically, the common
assumption is about “unrelatedness” between the explanatory
variable and the disturbances.
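The step from definition #1 to definition #2 can be written out with iterated expectations. The following is a short worked derivation in the notation of the regression equation above, where $\varepsilon_j$ is a disturbance and $x_k$ a regressor observation:

```latex
\begin{align*}
E[\varepsilon_j] &= E\big[\,E[\varepsilon_j \mid x_k]\,\big] = E[0] = 0, \\
\mathrm{Cov}(\varepsilon_j, x_k) &= E[\varepsilon_j x_k] - E[\varepsilon_j]\,E[x_k]
  = E\big[\,x_k\,E[\varepsilon_j \mid x_k]\,\big] - 0 = 0 .
\end{align*}
```

The converse does not hold in general, since zero covariance only rules out a linear relationship. As an illustration (not taken from the lecture), if $x$ is standard normal and $\varepsilon = x^2 - 1$, then $\mathrm{Cov}(\varepsilon, x) = E[x^3] = 0$ although $E[\varepsilon \mid x] = x^2 - 1 \neq 0$.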
An extension of the classic concepts: pre-determinedness
When $x_t$ in
$$y_t = \beta_1 + \beta_2 x_t + \varepsilon_t, \quad t = 1, 2, \ldots, T,$$
is uncorrelated with $\varepsilon_t, \varepsilon_{t+1}, \ldots$, i.e. the current and future
disturbances, but not with the past disturbances $\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots$, we
say that the explanatory variable $x_t$ is pre-determined.
In the discussion of identification, there is no difference
between pre-determined variables and exogenous variables.
The difference between exogeneity concept #1 and
pre-determinedness has to do with estimation properties:
OLS now has a finite sample bias. Remember that
$$b_2 = \beta_2 + \frac{\sum_t (x_t - \bar{x})\,\varepsilon_t}{\sum_t (x_t - \bar{x})^2},$$
and that the expectation of the bias term cannot be shown
to be zero.
Pre-determinedness, small sample bias and consistency
The classic case of pre-determinedness is when the
explanatory variable is $y_{t-1}$ (or higher order lags). Then $y_{t-1}$
is necessarily correlated with the past disturbances:
$$y_{t-1} = \beta_1 (1 + \beta_2 + \beta_2^2 + \ldots) + \varepsilon_{t-1} + \beta_2 \varepsilon_{t-2} + \beta_2^2 \varepsilon_{t-3} + \ldots$$
but not with $\varepsilon_t$ and later disturbances, assuming no
autocorrelation among the disturbances.
Then the OLS estimator $b_2$ is consistent (the “plim” of the
numerator of the bias term is then zero).
What drives the consistency result is that, when there is no
autocorrelation, each new observation of $y_{t-1}$ contains
some new and unique information, and asymptotically this will
dominate and drive the bias term towards zero.
The lagged regressor case: how large is large?
For the model
$$y_t = \beta_2 y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, 1), \quad t = 1, \ldots, T, \qquad (1)$$
the following has been established in the literature:

Function        Asymptotic    Finite sample
$E[b_2]$        $\beta_2$     $\beta_2 - 2\beta_2/(T-1)$
$Var[b_2]$      $0$           $(1 - \beta_2^2)/T$

This can be used to assess the size of the bias:

$\beta_2$    Sample       Bias
0.5          $T = 51$     -0.02
0.5          $T = 101$    -0.01
0.9          $T = 51$     -0.036
0.9          $T = 101$    -0.018
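The finite sample approximation for the mean of the OLS estimator, $-2\beta_2/(T-1)$ for the bias, can be checked by simulation. The sketch below is a minimal Monte Carlo experiment in pure Python; the function names, the number of replications and the seed are illustrative assumptions, and the series is started from its stationary distribution.

```python
import random

def ar1_ols_slope(beta2, T, rng):
    """Simulate y_1..y_T from y_t = beta2*y_{t-1} + eps_t, eps_t ~ N(0,1),
    and return the OLS slope of y_t on y_{t-1} (no intercept)."""
    # Initial value drawn from the stationary distribution N(0, 1/(1 - beta2^2)).
    y_prev = rng.gauss(0.0, 1.0) / (1.0 - beta2 ** 2) ** 0.5
    num = den = 0.0
    for _ in range(T - 1):  # T observations give T-1 (y_t, y_{t-1}) pairs
        y = beta2 * y_prev + rng.gauss(0.0, 1.0)
        num += y * y_prev
        den += y_prev ** 2
        y_prev = y
    return num / den

def simulated_bias(beta2, T, reps=5000, seed=12345):
    """Monte Carlo estimate of E[b2] - beta2."""
    rng = random.Random(seed)
    mean_b2 = sum(ar1_ols_slope(beta2, T, rng) for _ in range(reps)) / reps
    return mean_b2 - beta2

def approx_bias(beta2, T):
    """Approximate finite sample bias, -2*beta2/(T-1)."""
    return -2.0 * beta2 / (T - 1)

for beta2 in (0.5, 0.9):
    for T in (51, 101):
        print(beta2, T, round(simulated_bias(beta2, T), 4), round(approx_bias(beta2, T), 4))
```

The simulated bias should be negative and roughly halve when the sample size goes from 51 to 101, in line with the approximation.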
Use of the classic exogeneity de…nitions
The main use is with regard to the “limitations of OLS”. Exogeneity in
the meaning of definition #1 is violated in the presence of
1. Measurement error in the explanatory variable.
2. Simultaneity.
3. Lagged regressor with autocorrelated disturbances:
$$y_t = \beta_2 y_{t-1} + \varepsilon_t,$$
$$\varepsilon_t = \rho\, \varepsilon_{t-1} + \nu_t.$$
It has been established in the literature that
$$\mathrm{plim}\, b_2 = \frac{\beta_2 + \rho}{1 + \beta_2 \rho} \neq \beta_2 .$$
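The inconsistency in the third case can be illustrated numerically. The sketch below simulates one long series with a lagged dependent variable and first order autocorrelated disturbances, writing `rho` for the autocorrelation coefficient of the disturbances, and compares the OLS slope with $(\beta_2 + \rho)/(1 + \beta_2\rho)$. Parameter values, seed and function names are illustrative assumptions.

```python
import random

def ols_slope_no_intercept(y):
    """OLS slope from regressing y_t on y_{t-1} without an intercept."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den

def simulate(beta2, rho, T, seed=2009, burn_in=200):
    """y_t = beta2*y_{t-1} + eps_t with eps_t = rho*eps_{t-1} + nu_t, nu_t ~ N(0,1)."""
    rng = random.Random(seed)
    y_prev = eps_prev = 0.0
    ys = []
    for t in range(T + burn_in):
        eps = rho * eps_prev + rng.gauss(0.0, 1.0)
        y = beta2 * y_prev + eps
        if t >= burn_in:  # drop the start-up observations
            ys.append(y)
        y_prev, eps_prev = y, eps
    return ys

def plim_b2(beta2, rho):
    """Probability limit of the OLS slope under AR(1) disturbances."""
    return (beta2 + rho) / (1.0 + beta2 * rho)

beta2, rho = 0.5, 0.5
b2 = ols_slope_no_intercept(simulate(beta2, rho, T=20000))
print("OLS estimate:", round(b2, 3), "plim formula:", round(plim_b2(beta2, rho), 3))
```

Setting `rho = 0.0` in the same code gives the pre-determined case, where the estimate should settle near `beta2` itself.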
Response to “lack of exogeneity”
1. IV/2SLS estimation.
2. Specification of a simultaneous equation model and IV/2SLS
estimation, as when the recursive system of Lecture 7 is replaced by a
simultaneous equations model.
3. Estimate by GLS, or re-specify the model by inclusion of
relevant explanatory variables to obtain a model without
residual autocorrelation.
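As an illustration of the first response, the sketch below simulates a regressor measured with error and compares OLS with the simple IV estimator $\mathrm{Cov}(z, y)/\mathrm{Cov}(z, x)$, which coincides with 2SLS when there is one regressor and one instrument. The data generating process, the instrument `z` and all parameter values are hypothetical and chosen only for the example.

```python
import random

def cov(a, b):
    """Sample covariance (divisor n)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def simulate(n=20000, beta2=2.0, seed=4610):
    rng = random.Random(seed)
    x_true = [rng.gauss(0, 1) for _ in range(n)]
    # Observed regressor: true value plus measurement error.
    x_obs = [xs + rng.gauss(0, 1) for xs in x_true]
    # Instrument: related to the true regressor, independent of both errors.
    z = [xs + rng.gauss(0, 1) for xs in x_true]
    y = [1.0 + beta2 * xs + rng.gauss(0, 1) for xs in x_true]
    return y, x_obs, z

y, x_obs, z = simulate()
b_ols = cov(x_obs, y) / cov(x_obs, x_obs)  # attenuated towards zero
b_iv = cov(z, y) / cov(z, x_obs)           # simple IV estimator
print("OLS:", round(b_ols, 3), "IV:", round(b_iv, 3))
```

With equal variances for the true regressor and the measurement error, the OLS slope should be close to half of the true slope of 2, while the IV estimate should be close to 2.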
Weak, strong and super exogeneity
There is another group of exogeneity concepts that relate the
exogeneity status of an explanatory variable to the parameters of
interest, see definition iv) in Biorn’s note.
Consider two variables $x_t$ and $y_t$. In Lecture 11 we defined the
likelihood function for a single logistically distributed
variable. The likelihood function extends to the bivariate and
multivariate case, and to other distributions, the normal distribution
in particular.
Let $\theta$ and $\lambda$ denote the parameters of the conditional PDF of $y_t$
given $x_t$, and of the marginal PDF for $x_t$, respectively, so that the
likelihood function can be decomposed as
$$L_{x_1,\ldots,x_T,y_1,\ldots,y_T}(\theta, \lambda) = L_{y_1,\ldots,y_T \mid x_1,\ldots,x_T}(\theta)\, L_{x_1,\ldots,x_T}(\lambda).$$
In general:
$$\max_{\theta, \lambda} L_{x_1,\ldots,x_T,y_1,\ldots,y_T}(\theta, \lambda) \neq \max_{\theta} L_{y_1,\ldots,y_T \mid x_1,\ldots,x_T}(\theta) \left\{ \max_{\lambda} L_{x_1,\ldots,x_T}(\lambda) \right\}$$
Weak exogeneity
In some important situations we will have the equality
$$\max_{\theta, \lambda} L_{x_1,\ldots,x_T,y_1,\ldots,y_T}(\theta, \lambda) = \max_{\theta} L_{y_1,\ldots,y_T \mid x_1,\ldots,x_T}(\theta) \left\{ \max_{\lambda} L_{x_1,\ldots,x_T}(\lambda) \right\},$$
where $\theta$ collects the parameters of the conditional model and $\lambda$
those of the marginal model. The equality says that the maximum likelihood
estimators for $\theta$ based on the
joint likelihood and the conditional likelihood are identical.
In this case $x_t$ is weakly exogenous for $\theta$.
An important requirement for weak exogeneity is that $\theta$ and $\lambda$ are
“variation free”. This means that the two parameter vectors can
vary freely within their respective logically admissible “spaces”.
Cross restrictions between $\theta$ and $\lambda$ are an example of how variation
freeness may be invalidated.
Another important requirement, for weak exogeneity to be a
relevant concept, is that $\theta$ contains the parameters of interest, for
understanding economic behaviour, or for forecasting economic
variables.
The bivariate normal model
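In Lecture 1, we established that when the joint PDFs of $\{x_t, y_t\}$
are normal and independent, we have
$$y_t = E[y_t \mid x_t] + \varepsilon_t = \beta_1 + \beta_2 x_t + \varepsilon_t \qquad (2)$$
$$x_t = \mu_x + \varepsilon_{x,t} \qquad (3)$$
where $\varepsilon_t \sim N(0, \sigma^2)$ with $\sigma^2 = \sigma_y^2 (1 - \rho_{xy}^2)$, $\varepsilon_{x,t} \sim N(0, \sigma_x^2)$,
$\rho_{xy} = \dfrac{\sigma_{xy}}{\sigma_x \sigma_y}$,
and
$$\beta_1 = \mu_y - \beta_2 \mu_x = \mu_y - \frac{\sigma_{xy}}{\sigma_x^2}\, \mu_x \qquad (4)$$
$$\beta_2 = \frac{\sigma_{xy}}{\sigma_x^2} = \rho_{xy} \frac{\sigma_y}{\sigma_x} \qquad (5)$$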
Exogeneity in the binormal model
$x_t$ is exogenous according to definition #1 above:
$$E[\varepsilon_t \mid x_t] = E[y_t \mid x_t] - E\big[E[y_t \mid x_t] \mid x_t\big] = E[y_t \mid x_t] - E[y_t \mid x_t] = 0,$$
and note that it is not uncommon to refer to this as strict exogeneity.
This holds by construction, since $\varepsilon_t$ only contains the part of $y_t$
that is unexplained by $x_t$. For the same reason, $\varepsilon_t$ and $x_t$ in (2)
and (3) are also uncorrelated. Using (4) and (5):
$$\begin{aligned}
E[\varepsilon_t x_t] &= E[(y_t - \beta_1 - \beta_2 x_t)\, x_t] \\
&= E[((y_t - \mu_y) - \beta_2 (x_t - \mu_x))\, x_t] \\
&= \sigma_{xy} - \frac{\sigma_{xy}}{\sigma_x^2}\, \sigma_x^2 = 0.
\end{aligned}$$
This suggests that in terms of maximum likelihood estimation,
which involves minimization of the sum of squared residuals,
nothing is lost by maximizing the conditional and marginal
likelihoods separately.
Exogeneity in the binormal model, cont’d
Hence we are in the situation that
$$\max_{\theta, \lambda} L_{x_1,\ldots,x_T,y_1,\ldots,y_T}(\theta, \lambda) = \max_{\theta} L_{y_1,\ldots,y_T \mid x_1,\ldots,x_T}(\theta) \left\{ \max_{\lambda} L_{x_1,\ldots,x_T}(\lambda) \right\},$$
meaning that $x_t$ is weakly exogenous for the parameters of interest
when these are contained in $\theta$, the parameters of the conditional model.
In the binormal case, there is only a subtle difference between
the “classic” exogeneity and weak exogeneity.
But weak exogeneity brings out that in econometrics, a
variable is not exogenous in itself, but relative to a statistical
model, and relative to the parameters of interest. This turns
out to be a big advantage in more complicated modelling
settings.
The discussion can be extended to a dynamic setting: we can
then retrieve the assumption that we started with above, of
independent $x_t$'s and $y_t$'s, by first conditioning on the lagged $x$'s
and $y$'s.
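The equality of the joint and conditional maximum likelihood estimators in the binormal case can be checked numerically. The sketch below computes the conditional-model parameters in two ways: from the joint ML estimates of the means, variances and covariance (using the expressions for the intercept, slope and residual variance of the binormal model), and by least squares on the conditional model alone. The simulated data and function names are illustrative assumptions.

```python
import random

def moments(x, y):
    """Joint (binormal) ML estimates: means, variances and covariance (divisor n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, sxx, syy, sxy

def from_joint(x, y):
    """Conditional-model parameters implied by the joint ML estimates."""
    mx, my, sxx, syy, sxy = moments(x, y)
    beta2 = sxy / sxx
    beta1 = my - beta2 * mx
    sigma2 = syy * (1.0 - sxy ** 2 / (sxx * syy))  # sigma_y^2 * (1 - rho_xy^2)
    return beta1, beta2, sigma2

def from_conditional(x, y):
    """Least squares on the conditional model; sigma2 is the ML estimate RSS/n."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta2 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    beta1 = my - beta2 * mx
    sigma2 = sum((b - beta1 - beta2 * a) ** 2 for a, b in zip(x, y)) / n
    return beta1, beta2, sigma2

rng = random.Random(11)
x = [rng.gauss(1.0, 2.0) for _ in range(500)]
y = [0.5 + 0.75 * a + rng.gauss(0.0, 1.0) for a in x]
print(from_joint(x, y))
print(from_conditional(x, y))
```

The two printed triples should agree up to rounding error, which is the numerical counterpart of saying that nothing is lost by working with the conditional model alone.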
Strong exogeneity and Granger non-causality
Consider again the bivariate case. $x_t$ is strongly exogenous if it is
1. weakly exogenous, and
2. not Granger-caused by $y_t$.
Granger causality can be discussed with reference to the following
dynamic equation (first order dynamics for simplicity):
$$x_t = \alpha_{10} + \alpha_{11} x_{t-1} + \alpha_{12} y_{t-1} + \varepsilon_{x,t}$$
Granger causality means $\alpha_{12} \neq 0$, so strong exogeneity requires
$\alpha_{12} = 0$ (Granger non-causality).
Strong exogeneity is required for valid forecasting from the
conditional model
$$y_t = E[y_t \mid x_t, x_{t-1}, y_{t-1}] + \varepsilon_t,$$
i.e. if we make forecasts for $y$ on the false assumption that there is
no feed-back effect of $y$ on $x$, the forecasts will not be optimal and
the prediction intervals will be misleading.
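A test of Granger non-causality in the first order case amounts to a t-test on the coefficient of $y_{t-1}$ in the marginal equation for $x_t$. The sketch below is a minimal pure Python version: it simulates a hypothetical bivariate system in which `a12` is the feedback coefficient from lagged $y$ to $x$, estimates the marginal equation by OLS, and returns the t-ratio. The system, its parameter values and all function names are assumptions made for the example.

```python
import random

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def ols_t_stats(X, y):
    """OLS coefficients and t-ratios for y = X b + e (X is a list of rows)."""
    n, k = len(X), len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    b = solve(XtX, Xty)
    rss = sum((yi - sum(bi * xi for bi, xi in zip(b, row))) ** 2 for row, yi in zip(X, y))
    s2 = rss / (n - k)
    t = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        var_i = s2 * solve(XtX, unit)[i]  # i-th diagonal element of s2 * (X'X)^{-1}
        t.append(b[i] / var_i ** 0.5)
    return b, t

def granger_t(a12, T=400, seed=7):
    """Simulate the system and return the t-ratio on y_{t-1} in the equation for x_t."""
    rng = random.Random(seed)
    x, y = [0.0], [0.0]
    for _ in range(T):
        x_new = 0.1 + 0.5 * x[-1] + a12 * y[-1] + rng.gauss(0, 1)
        y_new = 0.5 * x_new + 0.3 * y[-1] + rng.gauss(0, 1)
        x.append(x_new)
        y.append(y_new)
    X = [[1.0, x[t - 1], y[t - 1]] for t in range(1, T + 1)]
    _, t = ols_t_stats(X, x[1:])
    return t[2]

print("feedback present (a12 = 0.4):", round(granger_t(0.4), 2))
print("no feedback (a12 = 0.0):     ", round(granger_t(0.0), 2))
```

With more than one lag of $y$ in the marginal equation, the t-test would be replaced by an F-test on all the lagged $y$ terms jointly.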
Invariance (to structural changes)
The parameters (of interest) in the conditional model are
invariant if they are unaffected by a change in the parameters
of the marginal model.
The binormal model is again a useful reference: for example,
$\beta_2$ is invariant if a change in $\mu_x$ or $\sigma_x$ does not change $\beta_2$.
Since $\beta_2 = \rho_{xy}\, \sigma_y / \sigma_x$, $\beta_2$ is invariant to changes in $\mu_x$ (a change
in the level of the explanatory variable). With respect to a
change in the standard deviation of $x$, i.e. $\sigma_x$, $\beta_2$ may or may
not be invariant, depending on how $\sigma_y$ is affected by the
structural change to $\sigma_x$.
Invariance is important for the validity of policy analysis based
on a regression model (the structural change is then brought
about by a change in a policy instrument, or in legislation).
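The two possibilities for a change in the standard deviation of $x$ can be made concrete with a small calculation based on the slope formula $\rho_{xy}\sigma_y/\sigma_x$. In case A the conditional relationship is assumed to be the stable one, so the standard deviation of $y$ and the correlation adjust when the standard deviation of $x$ doubles. In case B they are assumed to stay fixed. All numbers are hypothetical.

```python
def beta2(rho_xy, sigma_y, sigma_x):
    """Slope of the conditional model in the binormal case: rho_xy * sigma_y / sigma_x."""
    return rho_xy * sigma_y / sigma_x

def implied_moments(b2, sigma_eps, sigma_x):
    """sigma_y and rho_xy implied by y = b1 + b2*x + eps when b2 and sigma_eps stay fixed."""
    sigma_y = (b2 ** 2 * sigma_x ** 2 + sigma_eps ** 2) ** 0.5
    rho_xy = b2 * sigma_x / sigma_y
    return sigma_y, rho_xy

# Case A: sigma_y and rho_xy adjust to the new sigma_x, so the slope is invariant.
for sigma_x in (1.0, 2.0):
    sigma_y, rho_xy = implied_moments(0.8, 1.0, sigma_x)
    print("A", sigma_x, round(beta2(rho_xy, sigma_y, sigma_x), 3))

# Case B: sigma_y and rho_xy stay put when sigma_x doubles, so the slope is halved.
for sigma_x in (1.0, 2.0):
    print("B", sigma_x, round(beta2(0.6, 1.5, sigma_x), 3))
```

Which of the two cases is relevant is an empirical question, and it is the reason why invariance has to be tested rather than assumed.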
The Lucas critique
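Consider a bivariate model (without intercept for simplicity)
$$y_t = \beta_2 x_t^e + \varepsilon_t \qquad (6)$$
$$x_t = \alpha_{11} x_{t-1} + \varepsilon_{x,t} \qquad (7)$$
where $x_t^e$ denotes expectations. If expectations are formed in
accordance with (7), so that $x_t^e = \alpha_{11} x_{t-1}$, it is
straightforward to show that if we regress $y_t$ on $x_t$ we obtain
$$\mathrm{plim}\, b_2 = \alpha_{11}^2\, \beta_2,$$
so that $b_2$ is a biased (inconsistent) estimator of $\beta_2$, and not
invariant to changes in the marginal model for $x_t$.
This is the famous Lucas critique: policy analysis based on
conditional (regression) models is not valid when there are
changes in the expectations formation process, represented by (7)
above.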
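The lack of invariance can be seen in a small simulation. The sketch below assumes that expectations are formed from the marginal model, so that the expected value of $x_t$ is the autoregressive coefficient times $x_{t-1}$, and regresses $y_t$ on the realised $x_t$ for two values of that coefficient (called `alpha11` in the code). Parameter values, seed and function names are illustrative assumptions.

```python
import random

def plim_formula(alpha11, beta2):
    """Probability limit of the OLS slope of y_t on x_t: alpha11^2 * beta2."""
    return alpha11 ** 2 * beta2

def regress_y_on_x(alpha11, beta2=1.0, T=20000, seed=1976):
    """Simulate the two-equation model and return the no-intercept OLS slope of y on x."""
    rng = random.Random(seed)
    x_prev = rng.gauss(0, 1) / (1 - alpha11 ** 2) ** 0.5  # stationary start
    sxy = sxx = 0.0
    for _ in range(T):
        x_exp = alpha11 * x_prev             # expectation formed from the marginal model
        x = x_exp + rng.gauss(0, 1)          # realised x_t
        y = beta2 * x_exp + rng.gauss(0, 1)  # behaviour depends on the expectation
        sxy += x * y
        sxx += x * x
        x_prev = x
    return sxy / sxx

for alpha11 in (0.5, 0.9):
    print(alpha11, round(regress_y_on_x(alpha11), 3), round(plim_formula(alpha11, 1.0), 3))
```

Although the behavioural parameter is the same in both runs, the regression slope moves with the parameter of the marginal process, which is exactly what makes the estimated conditional model unreliable for policy analysis.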
Super exogeneity
An explanatory variable $x_t$ is super exogenous if
1. it is weakly exogenous, and
2. the parameters (of interest) of the conditional model are
invariant to (structural) changes in the marginal model of $x_t$.
As mentioned: super exogeneity secures the validity of policy
analysis with a conditional model. The parameters of interest
are then the slope coefficients (the derivatives or elasticities).
In a wider interpretation, we seek econometric models that are
invariant to a wide range of potential structural breaks.
Haavelmo (1944) coined the term autonomous relationships,
and pointed to the dangers of not paying enough attention to
potential sources of structural breaks and lack of invariance.
Hence the Lucas critique is a special case of the
Haavelmo critique.
Testing exogeneity
Weak exogeneity: The Wu-Hausman test of Lecture 8.
Strong exogeneity: Specify an econometric equation for the
marginal model of $x_t$ and test the significance of $y_{t-1}$ (or
higher order lags).
Invariance: If structural breaks in the marginal model for $x_t$
can be established (use e.g. the tests in Lecture 2), and the
dummies that represent these breaks are insignificant in the
conditional model, then the Lucas/Haavelmo critique does not
apply, and invariance with respect to these breaks is
maintained.
Testing exogeneity via inverted regressions
Under the property of super exogeneity, the results for a regression
model are not invariant to re-normalization. From OLS algebra, we
have:
$$b_2\, b_2^{*} = r_{yx}^2$$
where $b_2$ is the OLS estimate of the slope coefficient when $y_t$ is
the dependent variable, and $b_2^{*}$ is the estimate when $x_t$ is the
dependent variable (the inverse regression).
If there are structural changes in the sample period, $r_{yx}^2$ will have
changed.
Consequently, $b_2$ and $b_2^{*}$ cannot both be stable over the sample
period. But either one of them can.
Investigate this by recursive estimation of both models.
This extends to any number of variables, with $r_{yx}^2$ then interpreted
as partial correlation coefficients.
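The OLS identity behind this argument, that the product of the direct and inverse regression slopes equals the squared correlation coefficient, is easy to verify numerically. The sketch below uses simulated data; the data generating process and function names are illustrative assumptions.

```python
import random

def slope(dep, reg):
    """OLS slope (regression with intercept) of dep on reg."""
    n = len(dep)
    md, mr = sum(dep) / n, sum(reg) / n
    return sum((d - md) * (r - mr) for d, r in zip(dep, reg)) / sum((r - mr) ** 2 for r in reg)

def r_squared(a, b):
    """Squared sample correlation between a and b."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return sab ** 2 / (sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

rng = random.Random(3)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [1.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]

b2 = slope(y, x)       # y_t regressed on x_t
b2_star = slope(x, y)  # inverse regression: x_t regressed on y_t
print(round(b2 * b2_star, 6), round(r_squared(y, x), 6))
```

Because the identity holds in every sample, a shift in the squared correlation over time must show up in at least one of the two recursively estimated slopes.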
Example: Conventional Phillips curve versus Lucas’ supply curve.
A typical wage Phillips curve:
$$\Delta w_t = \beta_1 + \beta_2 u_t + \beta_3 \Delta p_{t-1} + \ldots + \varepsilon_t$$
where $w_t$ is the log of the wage rate, $u_t$ is the log of the rate of
unemployment and $p_t$ is the log of the price level.
Lucas’s supply curve entails a negative relationship between $u_t$
and inflation (as measured by e.g. $\Delta p_t$ and $\Delta w_t$), which is due
to short-term misperceptions of changes in relative prices.
It implies that the inversion of the conventional Phillips curve may
be more stable than the conventional Phillips curve.
We investigate this for Norwegian data.
Stability of a Phillips curve for Norway
[Figure: recursive coefficient estimates ($\beta$ with $\pm 2\sigma$ bands) for the Phillips curve, with panels including the intercept, $\Delta p_{t-1}$, $\Delta q_t$ and IP, together with 1-step residuals ($\pm 2$ s.e.), 1-period-ahead Chow statistics and break-point Chow statistics, each with its 1% critical value. Horizontal axes run from 1980 to 2000.]
Instability of the inverted Phillips curve
[Figure: recursive coefficient estimates ($\beta$ with $\pm 2\sigma$ bands) for the inverted Phillips curve, with panels including the intercept, $\Delta p_{t-1}$, $\Delta q_t$, $\Delta q_{t-1}$, $\Delta wc_t - \Delta p_{t-1}$ and IP, together with 1-step residuals ($\pm 2$ s.e.), 1-period-ahead Chow statistics and break-point Chow statistics, each with its 1% critical value. Horizontal axes run from 1980 to 2000.]