3.4 The Components of the OLS Variances: Multicollinearity
We see in (3.51) that the variance of $\hat\beta_j$,
$$Var(\hat\beta_j) = \frac{\sigma^2}{SST_j\,(1 - R_j^2)},$$
depends on three factors: $\sigma^2$, $SST_j$ and $R_j^2$:

1) The error variance, $\sigma^2$
Larger error variance = larger OLS variance
- more "noise" in the equation makes it more difficult to accurately estimate the partial effects of the variables
- one can reduce the error variance by adding (valid) variables to the equation

2) The total sample variation in $x_j$, $SST_j$
Larger variation in $x_j$ = smaller variance of $\hat\beta_j$
- increasing the sample size keeps increasing $SST_j$, since $SST_j = \sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2$
- this still assumes that we have a random sample

3) Linear relationships among the x variables: $R_j^2$
Larger correlation among the x's = larger variance of $\hat\beta_j$
- $R_j^2$ is the most difficult component to understand
- $R_j^2$ differs from the usual $R^2$ in that it measures the goodness of fit of the regression of $x_j$ on the other independent variables:
$$\hat{x}_{ij} = \hat\delta_0 + \hat\delta_1 x_{i1} + \dots + \hat\delta_{j-1} x_{i,j-1} + \hat\delta_{j+1} x_{i,j+1} + \dots + \hat\delta_k x_{ik}$$
- where $x_j$ itself does not appear as an explanatory variable
- in general, $R_j^2$ is the proportion of the total variation in $x_j$ that is explained by the other independent variables
- if $R_j^2 = 1$, MLR.3 (and OLS) fails due to perfect multicollinearity ($x_j$ is a perfect linear combination of the other x's)
- note that $Var(\hat\beta_j) \to \infty$ as $R_j^2 \to 1$
- high (but not perfect) correlation between independent variables is MULTICOLLINEARITY

3.4 Multicollinearity
- note that an $R_j^2$ close to 1 DOES NOT violate MLR.3
- unfortunately, the "problem" of multicollinearity is hard to define
- no value of $R_j^2$ is accepted as being "too high"
- a high $R_j^2$ can always be offset by a large $SST_j$ or a small $\sigma^2$
- ultimately, the question is: how big is $\hat\beta_j$ relative to its standard error?
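To make the three components concrete, here is a minimal sketch in Python. Everything in it is an assumption made for the illustration (the simulated regressors x1–x3, the sample size n, the error variance sigma2, and the helper variance_component are not from the text): it computes $SST_j$ and $R_j^2$ for each regressor in a simulated design and checks that $\sigma^2/(SST_j(1-R_j^2))$ from (3.51) matches the corresponding diagonal element of $\sigma^2(X'X)^{-1}$, the matrix form of the OLS sampling variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 500, 4.0                        # assumed sample size and error variance

# Simulated regressors: x1 and x2 are deliberately correlated, x3 is not
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])   # design matrix with intercept

def variance_component(X, j, sigma2):
    """Var(beta_j_hat) = sigma^2 / (SST_j * (1 - R_j^2)), as in (3.51)."""
    xj = X[:, j]
    others = np.delete(X, j, axis=1)            # all other columns, incl. intercept
    coef, *_ = np.linalg.lstsq(others, xj, rcond=None)  # regress x_j on the others
    resid = xj - others @ coef
    sst_j = np.sum((xj - xj.mean()) ** 2)
    r2_j = 1.0 - resid @ resid / sst_j          # R_j^2 from that auxiliary regression
    return sst_j, r2_j, sigma2 / (sst_j * (1.0 - r2_j))

# Conditional-on-X variances from the matrix formula sigma^2 * (X'X)^{-1}
full_var = sigma2 * np.linalg.inv(X.T @ X)

for j, name in [(1, "x1"), (2, "x2"), (3, "x3")]:
    sst_j, r2_j, v = variance_component(X, j, sigma2)
    print(f"{name}: SST_j={sst_j:8.1f}  R_j^2={r2_j:.3f}  "
          f"formula={v:.6f}  matrix={full_var[j, j]:.6f}")
```

Because x1 and x2 are simulated to be correlated, their $R_j^2$ values are high and their variances inflate accordingly, while the uncorrelated x3 keeps $R_3^2$ near zero.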
3.4 Multicollinearity
- ceteris paribus, it is best to have little correlation between $x_j$ and all the other independent variables
- dropping independent variables will reduce multicollinearity
- but if these variables belong in the model, dropping them creates bias
- multicollinearity can always be fought by collecting more data
- sometimes multicollinearity is due to over-specifying the independent variables:

3.4 Multicollinearity Example
- in a study of heart disease, our economic model is:
heart disease = f(fast food, junk food, other)
- unfortunately, $R^2_{\text{fast food}}$ is high, showing a high correlation between fast food and the other x variables (especially junk food)
- since fast food and junk food are so correlated, they should be examined together; their separate effects are difficult to estimate
- breaking up variables that could be added together can often cause multicollinearity

3.4 Multicollinearity
- it is important to note that multicollinearity need not affect ALL of the OLS estimates
- take the following equation:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$$
- if $x_2$ and $x_3$ are highly correlated, $Var(\hat\beta_2)$ and $Var(\hat\beta_3)$ will be large (due to multicollinearity)
- HOWEVER, from (3.51), if $x_1$ is uncorrelated with $x_2$ and $x_3$, then $R_1^2 = 0$ and
$$Var(\hat\beta_1) = \frac{\sigma^2}{SST_1}$$

3.4 Including Variables
- whether or not to include an independent variable is a balance between bias and variance
- take the following equation, where both variables, $x_1$ and $x_2$, are included:
$$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 \quad (A)$$
- compare it to the following equation with $x_2$ omitted:
$$\tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1 \quad (B)$$
- if the true $\beta_2 \neq 0$ and $x_1$ and $x_2$ have ANY correlation, $\tilde\beta_1$ is biased
- focusing on bias alone, $\hat\beta_1$ is preferred

3.4 Including Variables
- considering variance complicates things
- from (3.51), we know that:
$$Var(\hat\beta_1) = \frac{\sigma^2}{SST_1(1 - R_1^2)} \quad (A')$$
- modifying a proof from Chapter 2, we know that:
$$Var(\tilde\beta_1) = \frac{\sigma^2}{SST_1} \quad (B')$$
- it is evident that unless $x_1$ and $x_2$ are uncorrelated in the sample, $Var(\tilde\beta_1)$ is always smaller than $Var(\hat\beta_1)$

3.4 Including Variables
- obviously, if $x_1$ and $x_2$ aren't correlated, we have no bias and no multicollinearity
- if $x_1$ and $x_2$ are correlated, there are two cases (a simulation sketch at the end of this subsection illustrates both):
1) if $\beta_2 \neq 0$: $\tilde\beta_1$ is biased, $\hat\beta_1$ is unbiased, and $Var(\tilde\beta_1) < Var(\hat\beta_1)$
2) if $\beta_2 = 0$: $\tilde\beta_1$ is unbiased, $\hat\beta_1$ is unbiased, and $Var(\tilde\beta_1) < Var(\hat\beta_1)$
- obviously, in the second situation omit $x_2$: if it has no real impact on y, including it only causes multicollinearity and reduces OLS's efficiency
- never include irrelevant variables

3.4 Including Variables
- in the first case ($\beta_2 \neq 0$), leaving $x_2$ out of the model results in a biased estimator of $\beta_1$
- if the bias is small compared to the variance advantage, traditional econometricians would omit $x_2$
- however, two points argue for including $x_2$:
1) bias doesn't shrink with n, but variance does
2) the error variance increases when relevant variables are omitted

3.4 Including Variables
1) Sample size, bias and variance
- from the discussion of (3.45), the bias roughly does not depend on the sample size
- from (3.51), increasing the sample size increases $SST_j$ and therefore decreases the variance, since
$$SST_j = \sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2$$
- one can therefore avoid bias by including $x_2$ and fight the resulting multicollinearity by increasing the sample size

3.4 Including Variables
2) Error variance and omitted variables
- when $x_2$ is omitted and $\beta_2 \neq 0$, the variance expression in (3.55) understates the error variance
- without $x_2$ in the model, the variation due to $\beta_2 x_2$ is added to the error variance
- a higher error variance increases the variance of $\hat\beta_j$
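The two cases listed above can be made concrete with a small Monte Carlo experiment. This is a hedged sketch under assumed values (the correlation between x1 and x2, the coefficients, the sample size, and the helper simulate are all invented for the illustration): for $\beta_2 \neq 0$ and $\beta_2 = 0$ it compares $\tilde\beta_1$ from the short regression (B) with $\hat\beta_1$ from the full regression (A) across repeated samples.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 5000                             # assumed sample size and replications
beta0, beta1 = 1.0, 2.0                         # assumed true intercept and slope

def simulate(beta2):
    """Monte Carlo draws of beta1_tilde (x2 omitted) and beta1_hat (x2 included)."""
    tilde, hat = np.empty(reps), np.empty(reps)
    for r in range(reps):
        x1 = rng.normal(size=n)
        x2 = 0.7 * x1 + rng.normal(scale=0.7, size=n)   # x1 and x2 are correlated
        y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
        # (B): simple regression of y on x1 only
        Xs = np.column_stack([np.ones(n), x1])
        tilde[r] = np.linalg.lstsq(Xs, y, rcond=None)[0][1]
        # (A): multiple regression of y on x1 and x2
        Xm = np.column_stack([np.ones(n), x1, x2])
        hat[r] = np.linalg.lstsq(Xm, y, rcond=None)[0][1]
    return tilde, hat

for beta2 in (1.0, 0.0):                        # case 1: beta2 != 0, case 2: beta2 = 0
    tilde, hat = simulate(beta2)
    print(f"beta2 = {beta2}:")
    print(f"  mean(beta1_tilde) = {tilde.mean():.3f}  var = {tilde.var():.5f}")
    print(f"  mean(beta1_hat)   = {hat.mean():.3f}  var = {hat.var():.5f}")
```

In the output, $\tilde\beta_1$ is centered away from $\beta_1 = 2$ when $\beta_2 \neq 0$ but has the smaller variance in both cases, which is exactly the bias-variance tradeoff described above.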
3.4 Estimating σ²
- in order to obtain unbiased estimators of $Var(\hat\beta_j)$, we must first find an unbiased estimator of $\sigma^2$
- since we know that $\sigma^2 = E(u^2)$, an unbiased estimator of $\sigma^2$ would be:
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} u_i^2$$
- unfortunately, this is not a true estimator, as we do not observe the errors $u_i$

3.4 Estimating σ²
- we know that the errors and the residuals can be written as:
$$u_i = y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \dots - \beta_k x_{ik}$$
$$\hat{u}_i = y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \dots - \hat\beta_k x_{ik}$$
- therefore a natural estimator of $\sigma^2$ would replace the $u_i$ with the $\hat{u}_i$
- however, as seen in the bivariate case, this leads to bias, and we had to divide by n − 2 to obtain an unbiased estimator

3.4 Estimating σ²
- to make our estimator of $\sigma^2$ unbiased, we divide by the degrees of freedom, n − k − 1:
$$\hat\sigma^2 = \frac{\sum_{i=1}^{n}\hat{u}_i^2}{n-k-1} = \frac{SSR}{n-k-1} \quad (3.56)$$
where k is the number of independent variables
- notice that in the bivariate case k = 1 and the denominator is n − 2; also note that
$$df = n - (k+1) = (\text{number of observations}) - (\text{number of estimated parameters})$$

3.4 Estimating σ²
- technically, n − k − 1 comes from the fact that $E(SSR) = (n-k-1)\sigma^2$
- intuitively, the OLS first order conditions impose
$$\sum_{i=1}^{n}\hat{u}_i = 0 \quad \text{and} \quad \sum_{i=1}^{n} x_{ij}\hat{u}_i = 0, \quad j = 1, 2, \dots, k$$
- there are therefore k + 1 restrictions on the OLS residuals
- given any n − (k + 1) of the residuals, we can use these restrictions to recover the remaining residuals

Theorem 3.3 (Unbiased Estimation of σ²)
Under the Gauss-Markov assumptions MLR.1 through MLR.5,
$$E(\hat\sigma^2) = \sigma^2$$
Note: the proof requires matrix algebra and is found in Appendix E.

Theorem 3.3 Notes
- the positive square root of $\hat\sigma^2$, $\hat\sigma$, is called the STANDARD ERROR OF THE REGRESSION (SER), or the STANDARD ERROR OF THE ESTIMATE
- SER is an estimator of the standard deviation of the error term
- when another independent variable is added to the equation, both SSR and the degrees of freedom fall
- therefore an additional variable may increase or decrease the SER

Theorem 3.3 Notes
In order to construct confidence intervals and perform hypothesis tests, we need the STANDARD DEVIATION OF $\hat\beta_j$:
$$sd(\hat\beta_j) = \frac{\sigma}{\sqrt{SST_j(1-R_j^2)}}$$
Since σ is unknown, we replace it with its estimator, $\hat\sigma$, to give us the STANDARD ERROR OF $\hat\beta_j$:
$$se(\hat\beta_j) = \frac{\hat\sigma}{\sqrt{SST_j(1-R_j^2)}} \quad (3.58)$$

3.4 Standard Error Notes
- since the standard error depends on $\hat\sigma$, it has a sampling distribution
- furthermore, the standard error comes from the variance formula, which relies on homoskedasticity (MLR.5)
- while heteroskedasticity does not cause bias in $\hat\beta_j$, it does affect its variance and therefore causes bias in its standard errors
- Chapter 8 covers how to correct for heteroskedasticity

3.5 Efficiency of OLS - BLUE
- MLR.1 through MLR.4 show that OLS is unbiased, but many unbiased estimators exist
- HOWEVER, under MLR.1 through MLR.5, the OLS estimator $\hat\beta_j$ of $\beta_j$ is BLUE: the Best Linear Unbiased Estimator

3.5 Efficiency of OLS - BLUE
Estimator
- OLS is an estimator, as "it is a rule that can be applied to any sample of data to produce an estimate"
Unbiased
- since the OLS estimator has the property
$$E(\hat\beta_j) = \beta_j, \quad j = 0, 1, \dots, k,$$
OLS is unbiased

3.5 Efficiency of OLS - BLUE
Linear
- the OLS estimators are linear, since $\hat\beta_j$ can be expressed as a linear function of the data on the dependent variable:
$$\hat\beta_j = \sum_{i=1}^{n} w_{ij}\, y_i \quad (3.59)$$
where each $w_{ij}$ is a function of the independent variables only
- this is evident from equation (3.22)

3.5 Efficiency of OLS - BLUE
Best
- OLS is best since it has the smallest variance of all linear unbiased estimators
- the Gauss-Markov theorem states that, given assumptions MLR.1 through MLR.5, for any other estimator $\tilde\beta_j$ that is linear and unbiased,
$$Var(\hat\beta_j) \leq Var(\tilde\beta_j),$$
and this inequality is usually strict

Theorem 3.4 (Gauss-Markov Theorem)
Under Assumptions MLR.1 through MLR.5, $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$ are, respectively, the best linear unbiased estimators (BLUEs) of $\beta_0, \beta_1, \dots, \beta_k$.
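The theorem can be illustrated numerically. The sketch below is an assumption-laden toy example (a simple regression with fixed x values, homoskedastic errors, and made-up parameter values, none of it from the text): it compares the OLS slope with another estimator that is also linear in y and unbiased, namely the difference of group means of y divided by the difference of group means of x.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, beta0, beta1 = 100, 20000, 1.0, 0.5   # assumed illustrative values

# Keep x fixed across replications: the Gauss-Markov comparison is conditional on X
x = rng.uniform(0, 10, size=n)
hi = x > np.median(x)                           # grouping used by the alternative estimator

ols = np.empty(reps)
grouped = np.empty(reps)
for r in range(reps):
    u = rng.normal(size=n)                      # homoskedastic errors: MLR.5 holds
    y = beta0 + beta1 * x + u
    # OLS slope
    ols[r] = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)
    # Alternative linear unbiased estimator: difference of group means of y
    # divided by difference of group means of x (weights depend only on x)
    grouped[r] = (y[hi].mean() - y[~hi].mean()) / (x[hi].mean() - x[~hi].mean())

print(f"OLS:     mean = {ols.mean():.4f}  variance = {ols.var():.6f}")
print(f"grouped: mean = {grouped.mean():.4f}  variance = {grouped.var():.6f}")
```

Both estimators average out to the true slope, but the grouped estimator's Monte Carlo variance is larger, exactly as Theorem 3.4 predicts for any linear unbiased competitor to OLS.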
Theorem 3.4 Notes
- if our assumptions hold, no linear unbiased estimator will be a better choice than OLS
- if we find any other linear unbiased estimator, its variance will be at least as big as OLS's
- if MLR.4 fails, OLS is biased and Theorem 3.4 fails
- if MLR.5 (homoskedasticity) fails, OLS is still unbiased but no longer has the smallest variance; it is only LUE (a linear unbiased estimator)
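The last point can also be seen in a simulation. The following sketch is illustrative only (an assumed heteroskedastic design in which the error standard deviation is proportional to x, with the variance function treated as known): OLS remains unbiased, but a weighted least squares estimator, which is also linear in y and unbiased, now has the smaller variance.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, beta0, beta1 = 200, 20000, 1.0, 0.5   # assumed illustrative values

x = rng.uniform(1, 10, size=n)                  # fixed regressor values
sd_u = 0.5 * x                                  # Var(u|x) grows with x: MLR.5 fails

# Weighted least squares ingredients; the weights 1/Var(u|x) are known by assumption
w = 1.0 / sd_u**2
X = np.column_stack([np.ones(n), x])
Xw = X * w[:, None]
XtWX = X.T @ Xw

ols = np.empty(reps)
wls = np.empty(reps)
for r in range(reps):
    u = rng.normal(scale=sd_u, size=n)          # heteroskedastic errors
    y = beta0 + beta1 * x + u
    # OLS slope: still linear and unbiased
    ols[r] = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)
    # WLS slope: another linear unbiased estimator, solving (X'WX) b = X'Wy
    wls[r] = np.linalg.solve(XtWX, Xw.T @ y)[1]

print(f"OLS: mean = {ols.mean():.4f}  variance = {ols.var():.6f}")
print(f"WLS: mean = {wls.mean():.4f}  variance = {wls.var():.6f}")
```

Both estimators are centered at the true slope, but with heteroskedastic errors the weighted estimator has the smaller variance, so OLS is no longer "best". In practice the variance function is not known; the point here is only that when MLR.5 fails, OLS stops being the minimum-variance linear unbiased estimator.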