Econ107 Applied Econometrics
Topic 3: Classical Model
(Studenmund, Chapter 4)
I. Classical Assumptions
We have defined OLS and studied some algebraic properties of the OLS estimators. In this topic
we will study their statistical properties, which help justify the use of OLS and also
support statistical inference. To study the statistical properties of OLS estimators, we
need to impose a set of assumptions, most of them on the error term. When the first five
assumptions hold, we call the error term a classical error term. If Assumption 7 is added
as well, it is called a classical normal error term.
Assumption 1: No specification error in the model. That is, the regression model
is linear in the coefficients, is correctly specified (has the correct variables and
functional forms), has no measurement error, and has an additive error term.
$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_K X_{Ki} + \epsilon_i$
Assumption 2: Disturbances have zero mean. The expected value of the
disturbance term is zero.
E ( i )  0
Assumption 3: All the independent variables are uncorrelated with the error term
(we say independent variables are exogenous). That is, for all i=1,…,n, j=1,…,K,
$\mathrm{Cov}(\epsilon_i, X_{ji}) = E[(\epsilon_i - E(\epsilon_i))(X_{ji} - E(X_{ji}))]$
$= E[\epsilon_i (X_{ji} - E(X_{ji}))]$
$= E(\epsilon_i X_{ji}) - E(\epsilon_i)E(X_{ji}) = E(\epsilon_i X_{ji}) = 0$
Assumption 4: No autocorrelation between the errors. Given any two values $X_i$
and $X_j$ (where $i \neq j$), the correlation between $\epsilon_i$ and $\epsilon_j$ is zero.
$\mathrm{Cov}(\epsilon_i, \epsilon_j \mid X_i, X_j) = E(\epsilon_i \epsilon_j \mid X_i, X_j) = 0 \quad \text{for all } i \neq j$
If this holds for time series data, we say the errors are serially uncorrelated.
Cross-sectional data suffer less from error correlation, but there are exceptions.
Assumption 5: Errors are homoskedastic. Given the value of Xi, the variance of
the disturbance is the same for all observations.
$\mathrm{Var}(\epsilon_i \mid X_i) = E[\epsilon_i - E(\epsilon_i \mid X_i)]^2 = E(\epsilon_i^2 \mid X_i) = \sigma^2$
When this assumption does not hold, the error is said to be heteroskedastic.
Heteroskedasticity is traditionally believed to be an issue for cross-sectional data.
However, it may well be a problem in the time series context.
Assumption 6: No 'perfect multicollinearity' between independent variables. That
is, no explanatory variable can be written as a linear function of other explanatory
variables, e.g., the following equation cannot hold
$X_{1i} = \alpha_2 X_{2i} + \cdots + \alpha_K X_{Ki}$
If the above equation holds, there is perfect multicollinearity in explanatory
variables. When K=2, we say X1 and X2 are perfectly collinear.
Suppose we have a 3-variable regression model,
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i$
If there is an exact linear relationship between the independent variables:
$X_{1i} = \lambda X_{2i}$ for some parameter $\lambda$
Substituting it into the original regression:
$Y_i = \beta_0 + \beta_1 \lambda X_{2i} + \beta_2 X_{2i} + \epsilon_i$
$= \beta_0 + (\lambda \beta_1 + \beta_2) X_{2i} + \epsilon_i$
$= \beta_0 + \gamma X_{2i} + \epsilon_i$
This reduces to a two-variable regression with a single slope coefficient
$\gamma = \lambda \beta_1 + \beta_2$. We cannot identify the separate effects of the two
independent variables.
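The following sketch (made-up numbers, using numpy) illustrates the point: when $X_1$ is an exact multiple of $X_2$, the regressor matrix loses rank and the normal equations have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x2 = rng.normal(size=n)
x1 = 2.0 * x2                      # perfect collinearity: X1 is a multiple of X2
y = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
print(np.linalg.matrix_rank(X))    # 2, not 3: the columns are linearly dependent
# np.linalg.solve(X.T @ X, X.T @ y) would raise LinAlgError: singular matrix
```

Only the combination $2\beta_1 + \beta_2$ can be recovered from these data, matching the derivation above.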
Assumption 7: Errors are normally distributed. Combined with Assumptions 2,
4 & 5, we have $\epsilon_i \sim N(0, \sigma^2)$. This assumption is important when the
sample is small; in large samples it matters less.
II. Further Details about Assumptions
Assumption 1: No specification error.
1. Linearity may be in disguise. Consider the following model
$Y = e^{\beta_0} X^{\beta_1} e^{\epsilon}$
where $e$ is the exponential. This model looks nonlinear, but it can be transformed
into a linear form by taking logs of both sides:
$\ln(Y) = \beta_0 + \beta_1 \ln(X) + \epsilon$
which is linear in the $\beta$'s with $\ln(Y)$ and $\ln(X)$ as dependent and independent
variables, respectively. Note that the following models are all linear:
$\ln(Y) = \beta_0 + \beta_1 X + \epsilon$
$Y = \beta_0 + \beta_1 \ln(X) + \epsilon$
$\dfrac{Y}{X} = \beta_0 + \beta_1 \dfrac{1}{X} + \epsilon$
$Y = \beta_1 + \beta_2 \dfrac{1}{X} + \epsilon$
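As a sketch (simulated data with made-up parameter values, using numpy), the double-log model above can be estimated by running OLS on the transformed variables:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1.0, 10.0, size=n)
eps = rng.normal(scale=0.1, size=n)
y = np.exp(0.5) * x**0.8 * np.exp(eps)     # Y = e^{b0} X^{b1} e^{eps}, b0=0.5, b1=0.8

# OLS on the linearized form: ln(Y) = b0 + b1 ln(X) + eps
Z = np.column_stack([np.ones(n), np.log(x)])
b = np.linalg.lstsq(Z, np.log(y), rcond=None)[0]
print(b)                                    # close to [0.5, 0.8]
```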
2. Correct specification also requires all relevant explanatory variables to be taken
into account. If the true model is
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i$
but you estimate the model
$Y_i = \beta_0 + \beta_1 X_{1i} + \epsilon_i$
then your model is misspecified. We call this the missing relevant variable
problem.
However, if you estimate the model
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \epsilon_i$
then your model is also misspecified. We call this the including irrelevant variable
problem.
3. If you estimate the model with a different functional form, e.g.,
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}^2 + \epsilon_i$
you also commit a specification error.
Assumption 2: Error has zero mean. This is a rather weak assumption.
What would happen if you estimated the following model?
$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \quad E(\epsilon_i) = \mu \neq 0$
We can introduce another error term $\epsilon_i^*$ so that
$\epsilon_i^* = \epsilon_i - \mu$
As a result, the new error term has zero mean and the model becomes
$Y_i = (\beta_0 + \mu) + \beta_1 X_i + \epsilon_i^*$
This implies that the estimated parameters are $\beta_0 + \mu$ and $\beta_1$. In other words, if the
parameter of interest is $\beta_1$ and a constant intercept is included in the model,
Assumption 2 is automatically satisfied.
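A small simulation sketch (made-up values, numpy assumed) confirms this: the non-zero error mean $\mu$ shows up in the intercept, while the slope estimate is unaffected.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
eps = rng.normal(loc=0.5, size=n)          # E(eps) = mu = 0.5, not zero
y = 1.0 + 2.0 * x + eps                    # true b0 = 1.0, b1 = 2.0

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(b)   # intercept near b0 + mu = 1.5; slope still near 2.0
```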
Assumption 3: Explanatory variables are exogenous.
What might cause a violation of the assumption that $E(\epsilon_i X_{ki}) = 0$ for some $k$?
We call this violation of exogeneity "endogeneity" or
"simultaneity". Exogeneity holds when the explanatory variable is determined
independently of the error term, that is, outside of the model. This assumption is
automatically satisfied if $X$ is non-stochastic. However, if both the independent and
dependent variables are simultaneously determined in the model, we have an
endogeneity problem. Let's use the following example to illustrate how the
exogeneity assumption is violated. (Note that the example used in the textbook is
not very clear.)
$Q^d = \alpha P + \epsilon_1$
where $Q^d$ is the quantity demanded of a good and $P$ is the price. This model is a
demand function, and in this model $P$ cannot be exogenous. This is because $P$ is not
determined outside of the model, i.e., $Q^d$ and $P$ are simultaneously
determined within the model. To see this, we have to examine how the price is
determined:
$Q^s = \beta P + \epsilon_2$
$Q^d = Q^s = Q$
The first of these two equations is the supply function. The second is the
equilibrium condition. Solving all three equations for $P$, we have
$P = \dfrac{\epsilon_2 - \epsilon_1}{\alpha - \beta}$
Since $P$ is a function of $\epsilon_1$, $P$ must be correlated with $\epsilon_1$. Hence the exogeneity
assumption is violated.
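A simulation sketch of this system (the slope values $\alpha = -1$ and $\beta = 1$ are made up for illustration; numpy assumed) shows that the equilibrium price is correlated with the demand shock, so OLS on the demand equation is badly biased:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
e1 = rng.normal(size=n)            # demand shock
e2 = rng.normal(size=n)            # supply shock
alpha, beta = -1.0, 1.0            # made-up demand and supply slopes

p = (e2 - e1) / (alpha - beta)     # equilibrium price: P = (e2 - e1)/(alpha - beta)
q = alpha * p + e1                 # equilibrium quantity from the demand equation

print(np.corrcoef(p, e1)[0, 1])    # clearly non-zero: P is endogenous
print((p @ q) / (p @ p))           # OLS slope of Q on P: near 0, far from alpha = -1
```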
Assumptions 1, 4, 5, and 6 will be discussed in subsequent Topics.
Assumption 7: Normality. This assumption is often justified by appeal to the
Central Limit Theorem.
Central Limit Theorem: The mean (or sum) of a large number of independent and
identically distributed (iid) random variables tends toward a normal distribution,
regardless of their underlying distribution, provided the number of such variables is
large enough.
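As a quick illustration (not in the textbook; the sample size and replication count are arbitrary choices), the sketch below draws sample means from a skewed exponential distribution; their distribution is already close to normal:

```python
import numpy as np

rng = np.random.default_rng(4)
# 10,000 sample means, each from n = 50 iid exponential draws (a skewed distribution)
means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)
print(means.mean(), means.std())   # close to 1 and 1/sqrt(50) = 0.141, as the CLT predicts
```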
III. Sampling Distribution of $\hat{\beta}$
We need to know something about the 'precision' of our least-squares estimators.
The coefficient estimates are a function of the sample data, so they will vary from
sample to sample. We care about the 'reliability' of our estimates of the
coefficients.
For the SLR model, under Assumptions 1-6, the OLS estimators are unbiased, that is,
$E(\hat{\beta}_0) = \beta_0, \quad E(\hat{\beta}_1) = \beta_1$.
Also we have formulae for the variances and standard deviations of the OLS
estimators (proofs are available in more advanced econometrics textbooks). In these
formulae, lowercase $x_i = X_i - \bar{X}$ denotes the deviation of $X_i$ from its sample
mean. For the estimated intercept:
$\mathrm{Var}(\hat{\beta}_0) = \dfrac{\sum X_i^2}{n \sum x_i^2}\,\sigma^2, \qquad \sigma(\hat{\beta}_0) = \sqrt{\dfrac{\sum X_i^2}{n \sum x_i^2}}\;\sigma$
For the estimated slope coefficient:
$\mathrm{Var}(\hat{\beta}_1) = \dfrac{\sigma^2}{\sum x_i^2}, \qquad \sigma(\hat{\beta}_1) = \dfrac{\sigma}{\sqrt{\sum x_i^2}}$
The variance of the error term can be estimated with
$\hat{\sigma}^2 = \dfrac{\sum e_i^2}{n-2}$
where $n-2$ is the number of degrees of freedom (df).
The square root is known as the Standard Error of the Estimate (or of the Regression).
It is a common 'summary statistic' from regression analysis.
$\hat{\sigma} = \sqrt{\dfrac{\sum e_i^2}{n-2}}$
Finally, note the non-zero covariance between the two coefficient estimates.
Since the variance is always positive, the covariance takes the opposite sign to the
mean of $X$:
$\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\bar{X}\,\mathrm{Var}(\hat{\beta}_1) = -\bar{X}\left(\dfrac{\sigma^2}{\sum x_i^2}\right)$
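These formulae can be checked numerically. Below is a minimal sketch in Python (numpy assumed; the true parameter values are made up) that computes the OLS estimates and the estimated variances from the formulas above:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
X = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 2.0 * X + rng.normal(scale=2.0, size=n)   # true b0=1, b1=2, sigma=2

xbar = X.mean()
x = X - xbar                               # deviations from the sample mean
b1 = (x @ (y - y.mean())) / (x @ x)        # OLS slope
b0 = y.mean() - b1 * xbar                  # OLS intercept

e = y - b0 - b1 * X                        # residuals
sigma2_hat = (e @ e) / (n - 2)             # estimated error variance, df = n-2

var_b1 = sigma2_hat / (x @ x)
var_b0 = sigma2_hat * (X**2).sum() / (n * (x @ x))
cov_b0_b1 = -xbar * var_b1
print(np.sqrt(sigma2_hat), np.sqrt(var_b0), np.sqrt(var_b1), cov_b0_b1)
```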
If, in addition, Assumption 7 is satisfied, we have the sampling distributions
for $\hat{\beta}_0$ and $\hat{\beta}_1$:
$\hat{\beta}_0 \sim N\!\left(\beta_0, \mathrm{Var}(\hat{\beta}_0)\right)$
$\hat{\beta}_1 \sim N\!\left(\beta_1, \mathrm{Var}(\hat{\beta}_1)\right)$
The same ideas apply to MLR models. However, the expressions are more
complicated using non-matrix notation.
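For intuition, the following Monte Carlo sketch (made-up design, numpy assumed) draws repeated samples with normal errors and compares the empirical spread of $\hat{\beta}_1$ with the theoretical $\sigma(\hat{\beta}_1)$:

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, sigma = 30, 5_000, 1.0
X = rng.uniform(0.0, 10.0, size=n)         # fixed regressors across replications
x = X - X.mean()

b1_draws = np.empty(reps)
for r in range(reps):
    y = 1.0 + 2.0 * X + rng.normal(scale=sigma, size=n)
    b1_draws[r] = (x @ y) / (x @ x)        # OLS slope for this sample

print(b1_draws.mean())                          # close to beta1 = 2
print(b1_draws.std(), sigma / np.sqrt(x @ x))   # empirical vs theoretical sd
```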
IV. Statistical Properties of OLS Estimators: Gauss-Markov Theorem
Gauss-Markov Theorem: Under Assumptions 1-6, the OLS estimators, in the
class of unbiased linear estimators, have minimum variance (i.e., they are the best
linear unbiased estimators (BLUE)).
Won’t prove this proposition. The estimated coefficients and variance of the
disturbances are unbiased:
E( ˆ0 ) =  0 E( ˆ1 ) =  1 E( ˆ 2 ) =  2
On average, we’ll get the estimated coefficient and variance of the disturbances
‘right’ in repeated sampling. The variance of the estimators of ̂ 0 and ˆ1 are
smaller than those from any other linear unbiased estimator. They have minimum
variance.
Remarks:
1) The Gauss-Markov theorem does not say that OLS estimates are normally
distributed (and it does not depend on Assumption 7).
2) But if Assumption 7 is met, the OLS estimates are normally distributed.
3) If the errors are not normal, the OLS estimates are approximately normally
distributed provided the sample size is large enough. This is due to the Central Limit Theorem.
Consistency: The OLS estimator is also consistent. That is, as $n$ goes to infinity,
$\hat{\beta}_0 \to \beta_0$ and $\hat{\beta}_1 \to \beta_1$ in probability. Why?
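One way to see this is that the estimators are unbiased and $\mathrm{Var}(\hat{\beta}_1) = \sigma^2/\sum x_i^2$ shrinks toward zero as $n$ grows. The sketch below (made-up values, numpy assumed) illustrates this numerically:

```python
import numpy as np

rng = np.random.default_rng(7)
for n in [10, 100, 1_000, 10_000]:
    X = rng.uniform(0.0, 10.0, size=n)
    y = 1.0 + 2.0 * X + rng.normal(size=n)
    x = X - X.mean()
    b1 = (x @ y) / (x @ x)
    print(n, b1)          # estimates collapse toward the true slope 2 as n grows
```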
V. Discussion Questions: Q4.3, Q4.4, Q4.7
VI. Learning the Notations in Table 4.1.
VII. Computing Exercises: Monte Carlo