3.4 The Components of the OLS Variances: Multicollinearity
We see in (3.51) that the variance of $\hat\beta_j$,
$$Var(\hat\beta_j) = \frac{\sigma^2}{SST_j\,(1 - R_j^2)},$$
depends on three factors: $\sigma^2$, $SST_j$ and $R_j^2$:

1) The error variance, $\sigma^2$
Larger error variance = larger OLS variance
- more "noise" in the equation makes it more difficult to accurately estimate the partial effects of the variables
- one can reduce the error variance by adding (valid) variables to the equation

2) The total sample variation in $x_j$, $SST_j$
Larger variation in $x_j$ = smaller variance of $\hat\beta_j$
- increasing the sample size keeps increasing $SST_j$, since $SST_j = \sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2$
- this still assumes that we have a random sample

3) Linear relationships among the x variables: $R_j^2$
Larger correlation among the x's = larger variance of $\hat\beta_j$
- $R_j^2$ is the most difficult component to understand
- $R_j^2$ differs from the usual $R^2$ in that it measures the goodness of fit of the regression of $x_j$ on the other independent variables:
$$\hat{x}_{ij} = \hat\delta_0 + \hat\delta_1 x_{i1} + \dots + \hat\delta_{j-1} x_{i,j-1} + \hat\delta_{j+1} x_{i,j+1} + \dots + \hat\delta_k x_{ik}$$
- where $x_j$ itself does not appear as an explanatory variable
- in general, $R_j^2$ is the proportion of the total variation in $x_j$ that is explained by the other independent variables
- if $R_j^2 = 1$, MLR.3 (and OLS) fails due to perfect multicollinearity ($x_j$ is a perfect linear combination of the other x's)
- note that $Var(\hat\beta_j) \to \infty$ as $R_j^2 \to 1$
- high (but not perfect) correlation between independent variables is MULTICOLLINEARITY

3.4 Multicollinearity
- note that an $R_j^2$ close to 1 DOES NOT violate MLR.3
- unfortunately, the "problem" of multicollinearity is hard to define
- no value of $R_j^2$ is accepted as being "too high"
- a high $R_j^2$ can always be offset by a large $SST_j$ or a small $\sigma^2$
- ultimately, the question is: how big is $\hat\beta_j$ relative to its standard error?
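To make the three components concrete, here is a minimal sketch in Python. Everything in it is an assumption made for the illustration (the simulated regressors x1–x3, the sample size n, the error variance sigma2, and the helper variance_component are not from the text): it computes $SST_j$ and $R_j^2$ for each regressor in a simulated design and checks that $\sigma^2/(SST_j(1-R_j^2))$ from (3.51) matches the corresponding diagonal element of $\sigma^2(X'X)^{-1}$, the matrix form of the OLS sampling variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 500, 4.0                        # assumed sample size and error variance

# Simulated regressors: x1 and x2 are deliberately correlated, x3 is not
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])   # design matrix with intercept

def variance_component(X, j, sigma2):
    """Var(beta_j_hat) = sigma^2 / (SST_j * (1 - R_j^2)), as in (3.51)."""
    xj = X[:, j]
    others = np.delete(X, j, axis=1)            # all other columns, incl. intercept
    coef, *_ = np.linalg.lstsq(others, xj, rcond=None)  # regress x_j on the others
    resid = xj - others @ coef
    sst_j = np.sum((xj - xj.mean()) ** 2)
    r2_j = 1.0 - resid @ resid / sst_j          # R_j^2 from that auxiliary regression
    return sst_j, r2_j, sigma2 / (sst_j * (1.0 - r2_j))

# Conditional-on-X variances from the matrix formula sigma^2 * (X'X)^{-1}
full_var = sigma2 * np.linalg.inv(X.T @ X)

for j, name in [(1, "x1"), (2, "x2"), (3, "x3")]:
    sst_j, r2_j, v = variance_component(X, j, sigma2)
    print(f"{name}: SST_j={sst_j:8.1f}  R_j^2={r2_j:.3f}  "
          f"formula={v:.6f}  matrix={full_var[j, j]:.6f}")
```

Because x1 and x2 are simulated to be correlated, their $R_j^2$ values are high and their variances inflate accordingly, while the uncorrelated x3 keeps $R_3^2$ near zero.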
3.4 Multicollinearity
- ceteris paribus, it is best to have little correlation between $x_j$ and all the other independent variables
- dropping independent variables will reduce multicollinearity
- but if these variables belong in the model, dropping them creates bias
- multicollinearity can always be fought by collecting more data
- sometimes multicollinearity is due to over-specifying the independent variables:

3.4 Multicollinearity Example
- in a study of heart disease, our economic model is:
heart disease = f(fast food, junk food, other)
- unfortunately, $R^2_{\text{fast food}}$ is high, showing a high correlation between fast food and the other x variables (especially junk food)
- since fast food and junk food are so correlated, they should be examined together; their separate effects are difficult to estimate
- breaking up variables that could be added together can often cause multicollinearity

3.4 Multicollinearity
- it is important to note that multicollinearity need not affect ALL of the OLS estimates
- take the following equation:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$$
- if $x_2$ and $x_3$ are highly correlated, $Var(\hat\beta_2)$ and $Var(\hat\beta_3)$ will be large (due to multicollinearity)
- HOWEVER, from (3.51), if $x_1$ is uncorrelated with $x_2$ and $x_3$, then $R_1^2 = 0$ and
$$Var(\hat\beta_1) = \frac{\sigma^2}{SST_1}$$

3.4 Including Variables
- whether or not to include an independent variable is a balance between bias and variance
- take the following equation, where both variables, $x_1$ and $x_2$, are included:
$$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 \quad (A)$$
- compare it to the following equation with $x_2$ omitted:
$$\tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1 \quad (B)$$
- if the true $\beta_2 \neq 0$ and $x_1$ and $x_2$ have ANY correlation, $\tilde\beta_1$ is biased
- focusing on bias alone, $\hat\beta_1$ is preferred

3.4 Including Variables
- considering variance complicates things
- from (3.51), we know that:
$$Var(\hat\beta_1) = \frac{\sigma^2}{SST_1(1 - R_1^2)} \quad (A')$$
- modifying a proof from Chapter 2, we know that:
$$Var(\tilde\beta_1) = \frac{\sigma^2}{SST_1} \quad (B')$$
- it is evident that unless $x_1$ and $x_2$ are uncorrelated in the sample, $Var(\tilde\beta_1)$ is always smaller than $Var(\hat\beta_1)$

3.4 Including Variables
- obviously, if $x_1$ and $x_2$ aren't correlated, we have no bias and no multicollinearity
- if $x_1$ and $x_2$ are correlated, there are two cases (a simulation sketch at the end of this subsection illustrates both):
1) if $\beta_2 \neq 0$: $\tilde\beta_1$ is biased, $\hat\beta_1$ is unbiased, and $Var(\tilde\beta_1) < Var(\hat\beta_1)$
2) if $\beta_2 = 0$: $\tilde\beta_1$ is unbiased, $\hat\beta_1$ is unbiased, and $Var(\tilde\beta_1) < Var(\hat\beta_1)$
- obviously, in the second situation omit $x_2$: if it has no real impact on y, including it only causes multicollinearity and reduces OLS's efficiency
- never include irrelevant variables

3.4 Including Variables
- in the first case ($\beta_2 \neq 0$), leaving $x_2$ out of the model results in a biased estimator of $\beta_1$
- if the bias is small compared to the variance advantage, traditional econometricians would omit $x_2$
- however, two points argue for including $x_2$:
1) bias doesn't shrink with n, but variance does
2) the error variance increases when relevant variables are omitted

3.4 Including Variables
1) Sample size, bias and variance
- from the discussion of (3.45), the bias roughly does not depend on the sample size
- from (3.51), increasing the sample size increases $SST_j$ and therefore decreases the variance, since
$$SST_j = \sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2$$
- one can therefore avoid bias by including $x_2$ and fight the resulting multicollinearity by increasing the sample size

3.4 Including Variables
2) Error variance and omitted variables
- when $x_2$ is omitted and $\beta_2 \neq 0$, the variance expression in (3.55) understates the error variance
- without $x_2$ in the model, the variation due to $\beta_2 x_2$ is added to the error variance
- a higher error variance increases the variance of $\hat\beta_j$
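The two cases listed above can be made concrete with a small Monte Carlo experiment. This is a hedged sketch under assumed values (the correlation between x1 and x2, the coefficients, the sample size, and the helper simulate are all invented for the illustration): for $\beta_2 \neq 0$ and $\beta_2 = 0$ it compares $\tilde\beta_1$ from the short regression (B) with $\hat\beta_1$ from the full regression (A) across repeated samples.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 5000                             # assumed sample size and replications
beta0, beta1 = 1.0, 2.0                         # assumed true intercept and slope

def simulate(beta2):
    """Monte Carlo draws of beta1_tilde (x2 omitted) and beta1_hat (x2 included)."""
    tilde, hat = np.empty(reps), np.empty(reps)
    for r in range(reps):
        x1 = rng.normal(size=n)
        x2 = 0.7 * x1 + rng.normal(scale=0.7, size=n)   # x1 and x2 are correlated
        y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
        # (B): simple regression of y on x1 only
        Xs = np.column_stack([np.ones(n), x1])
        tilde[r] = np.linalg.lstsq(Xs, y, rcond=None)[0][1]
        # (A): multiple regression of y on x1 and x2
        Xm = np.column_stack([np.ones(n), x1, x2])
        hat[r] = np.linalg.lstsq(Xm, y, rcond=None)[0][1]
    return tilde, hat

for beta2 in (1.0, 0.0):                        # case 1: beta2 != 0, case 2: beta2 = 0
    tilde, hat = simulate(beta2)
    print(f"beta2 = {beta2}:")
    print(f"  mean(beta1_tilde) = {tilde.mean():.3f}  var = {tilde.var():.5f}")
    print(f"  mean(beta1_hat)   = {hat.mean():.3f}  var = {hat.var():.5f}")
```

In the output, $\tilde\beta_1$ is centered away from $\beta_1 = 2$ when $\beta_2 \neq 0$ but has the smaller variance in both cases, which is exactly the bias-variance tradeoff described above.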
3.4 Estimating σ²
- in order to obtain unbiased estimators of $Var(\hat\beta_j)$, we must first find an unbiased estimator of $\sigma^2$
- since we know that $\sigma^2 = E(u^2)$, an unbiased estimator of $\sigma^2$ would be:
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} u_i^2$$
- unfortunately, this is not a true estimator, as we do not observe the errors $u_i$

3.4 Estimating σ²
- we know that the errors and the residuals can be written as:
$$u_i = y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \dots - \beta_k x_{ik}$$
$$\hat{u}_i = y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \dots - \hat\beta_k x_{ik}$$
- therefore a natural estimator of $\sigma^2$ would replace the $u_i$ with the $\hat{u}_i$
- however, as seen in the bivariate case, this leads to bias, and we had to divide by n − 2 to obtain an unbiased estimator

3.4 Estimating σ²
- to make our estimator of $\sigma^2$ unbiased, we divide by the degrees of freedom, n − k − 1:
$$\hat\sigma^2 = \frac{\sum_{i=1}^{n}\hat{u}_i^2}{n-k-1} = \frac{SSR}{n-k-1} \quad (3.56)$$
where k is the number of independent variables
- notice that in the bivariate case k = 1 and the denominator is n − 2; also note that
$$df = n - (k+1) = (\text{number of observations}) - (\text{number of estimated parameters})$$

3.4 Estimating σ²
- technically, n − k − 1 comes from the fact that $E(SSR) = (n-k-1)\sigma^2$
- intuitively, the OLS first order conditions impose
$$\sum_{i=1}^{n}\hat{u}_i = 0 \quad \text{and} \quad \sum_{i=1}^{n} x_{ij}\hat{u}_i = 0, \quad j = 1, 2, \dots, k$$
- there are therefore k + 1 restrictions on the OLS residuals
- given any n − (k + 1) of the residuals, we can use these restrictions to recover the remaining residuals

Theorem 3.3 (Unbiased Estimation of σ²)
Under the Gauss-Markov assumptions MLR.1 through MLR.5,
$$E(\hat\sigma^2) = \sigma^2$$
Note: the proof requires matrix algebra and is found in Appendix E.

Theorem 3.3 Notes
- the positive square root of $\hat\sigma^2$, $\hat\sigma$, is called the STANDARD ERROR OF THE REGRESSION (SER), or the STANDARD ERROR OF THE ESTIMATE
- SER is an estimator of the standard deviation of the error term
- when another independent variable is added to the equation, both SSR and the degrees of freedom fall
- therefore an additional variable may increase or decrease the SER

Theorem 3.3 Notes
In order to construct confidence intervals and perform hypothesis tests, we need the STANDARD DEVIATION OF $\hat\beta_j$:
$$sd(\hat\beta_j) = \frac{\sigma}{\sqrt{SST_j(1-R_j^2)}}$$
Since σ is unknown, we replace it with its estimator, $\hat\sigma$, to give us the STANDARD ERROR OF $\hat\beta_j$:
$$se(\hat\beta_j) = \frac{\hat\sigma}{\sqrt{SST_j(1-R_j^2)}} \quad (3.58)$$

3.4 Standard Error Notes
- since the standard error depends on $\hat\sigma$, it has a sampling distribution
- furthermore, the standard error comes from the variance formula, which relies on homoskedasticity (MLR.5)
- while heteroskedasticity does not cause bias in $\hat\beta_j$, it does affect its variance and therefore causes bias in its standard errors
- Chapter 8 covers how to correct for heteroskedasticity

3.5 Efficiency of OLS - BLUE
- MLR.1 through MLR.4 show that OLS is unbiased, but many unbiased estimators exist
- HOWEVER, under MLR.1 through MLR.5, the OLS estimator $\hat\beta_j$ of $\beta_j$ is BLUE: the Best Linear Unbiased Estimator

3.5 Efficiency of OLS - BLUE
Estimator
- OLS is an estimator, as "it is a rule that can be applied to any sample of data to produce an estimate"
Unbiased
- since the OLS estimator has the property
$$E(\hat\beta_j) = \beta_j, \quad j = 0, 1, \dots, k,$$
OLS is unbiased

3.5 Efficiency of OLS - BLUE
Linear
- the OLS estimators are linear, since $\hat\beta_j$ can be expressed as a linear function of the data on the dependent variable:
$$\hat\beta_j = \sum_{i=1}^{n} w_{ij}\, y_i \quad (3.59)$$
where each $w_{ij}$ is a function of the independent variables only
- this is evident from equation (3.22)

3.5 Efficiency of OLS - BLUE
Best
- OLS is best since it has the smallest variance of all linear unbiased estimators
- the Gauss-Markov theorem states that, given assumptions MLR.1 through MLR.5, for any other estimator $\tilde\beta_j$ that is linear and unbiased,
$$Var(\hat\beta_j) \leq Var(\tilde\beta_j),$$
and this inequality is usually strict

Theorem 3.4 (Gauss-Markov Theorem)
Under Assumptions MLR.1 through MLR.5, $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$ are, respectively, the best linear unbiased estimators (BLUEs) of $\beta_0, \beta_1, \dots, \beta_k$.
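The theorem can be illustrated numerically. The sketch below is an assumption-laden toy example (a simple regression with fixed x values, homoskedastic errors, and made-up parameter values, none of it from the text): it compares the OLS slope with another estimator that is also linear in y and unbiased, namely the difference of group means of y divided by the difference of group means of x.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, beta0, beta1 = 100, 20000, 1.0, 0.5   # assumed illustrative values

# Keep x fixed across replications: the Gauss-Markov comparison is conditional on X
x = rng.uniform(0, 10, size=n)
hi = x > np.median(x)                           # grouping used by the alternative estimator

ols = np.empty(reps)
grouped = np.empty(reps)
for r in range(reps):
    u = rng.normal(size=n)                      # homoskedastic errors: MLR.5 holds
    y = beta0 + beta1 * x + u
    # OLS slope
    ols[r] = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)
    # Alternative linear unbiased estimator: difference of group means of y
    # divided by difference of group means of x (weights depend only on x)
    grouped[r] = (y[hi].mean() - y[~hi].mean()) / (x[hi].mean() - x[~hi].mean())

print(f"OLS:     mean = {ols.mean():.4f}  variance = {ols.var():.6f}")
print(f"grouped: mean = {grouped.mean():.4f}  variance = {grouped.var():.6f}")
```

Both estimators average out to the true slope, but the grouped estimator's Monte Carlo variance is larger, exactly as Theorem 3.4 predicts for any linear unbiased competitor to OLS.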
Theorem 3.4 Notes
- if our assumptions hold, no linear unbiased estimator will be a better choice than OLS
- if we find any other linear unbiased estimator, its variance will be at least as big as OLS's
- if MLR.4 fails, OLS is biased and Theorem 3.4 fails
- if MLR.5 (homoskedasticity) fails, OLS is still unbiased but no longer has the smallest variance; it is only LUE (a linear unbiased estimator)
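The last point can also be seen in a simulation. The following sketch is illustrative only (an assumed heteroskedastic design in which the error standard deviation is proportional to x, with the variance function treated as known): OLS remains unbiased, but a weighted least squares estimator, which is also linear in y and unbiased, now has the smaller variance.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, beta0, beta1 = 200, 20000, 1.0, 0.5   # assumed illustrative values

x = rng.uniform(1, 10, size=n)                  # fixed regressor values
sd_u = 0.5 * x                                  # Var(u|x) grows with x: MLR.5 fails

# Weighted least squares ingredients; the weights 1/Var(u|x) are known by assumption
w = 1.0 / sd_u**2
X = np.column_stack([np.ones(n), x])
Xw = X * w[:, None]
XtWX = X.T @ Xw

ols = np.empty(reps)
wls = np.empty(reps)
for r in range(reps):
    u = rng.normal(scale=sd_u, size=n)          # heteroskedastic errors
    y = beta0 + beta1 * x + u
    # OLS slope: still linear and unbiased
    ols[r] = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)
    # WLS slope: another linear unbiased estimator, solving (X'WX) b = X'Wy
    wls[r] = np.linalg.solve(XtWX, Xw.T @ y)[1]

print(f"OLS: mean = {ols.mean():.4f}  variance = {ols.var():.6f}")
print(f"WLS: mean = {wls.mean():.4f}  variance = {wls.var():.6f}")
```

Both estimators are centered at the true slope, but with heteroskedastic errors the weighted estimator has the smaller variance, so OLS is no longer "best". In practice the variance function is not known; the point here is only that when MLR.5 fails, OLS stops being the minimum-variance linear unbiased estimator.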