Econ 301.02 Econometrics TASKIN
Notes on the violation of the Gauss-Markov assumptions

Heteroskedasticity:

One of the important assumptions of the classical regression model is that the variance of the error term is constant, i.e. $\mathrm{Var}(u_i) = \sigma^2$. The violation of this assumption is known as the heteroskedasticity problem in regression analysis. When the error term does not have a constant variance, the problem can be stated as:

$\mathrm{Var}(u_i) = \sigma_i^2$

What might be the causes of heteroskedasticity:

- In some cases the dependent variable shows larger variability at different levels of the explanatory variable. Examples of this phenomenon can be observed in the analysis of consumption or saving behavior. In most cases there is little variability in desired consumption at low levels of disposable income, but larger variability in desired consumption at higher disposable income levels. A similar observation may be made for saving behavior: at high income levels, spending on necessities makes up a lower percentage of total spending, and hence the amount of spending beyond that level can differ from one high-income family to another. The same pattern appears in the analysis of firm dividends explained by the level of profits: there is low variability in the dividends distributed by firms with low profits, but high variability in the dividends of high-profit firms.
- Heteroskedasticity may be the result of outlier observations.
- Other forms of misspecification can also produce heteroskedasticity. For example, if a variable is omitted, the resulting error terms may exhibit a pattern similar to the omitted variable. If you are estimating a demand function and you omit the price of another good that is either a substitute or a complement, the errors in the misspecified model will behave like the omitted price variable. As long as the omitted variable is not linearly correlated with the explanatory variables included in the regression, the estimates will still be unbiased.
- Another form of misspecification that looks like heteroskedastic errors is the choice of the wrong functional form. If the relationship studied is quadratic but the squared variable is not included in the estimation, then the model will exhibit heteroskedastic errors.
- Systematic measurement errors in one direction in the variables may also lead to heteroskedastic errors.

Consequences of heteroskedasticity:

- The unbiasedness property of the OLS estimators does not change: OLS estimates are still unbiased, but they are not efficient. Therefore B.L.U.E. does not hold; OLS estimators may not have the minimum variance among all linear unbiased estimators. For example, observations more distant from the true line carry a larger weight in the determination of the slope coefficients and hence tend to have an unbalanced influence on the result (this is precisely the defect that weighted least squares estimation targets to correct).
- Standard errors and variances are biased and have a different formula than the one used in OLS with well-behaved error terms. In the simple model $y_i = \beta_0 + \beta_1 x_i + u_i$, the variance of the slope estimator under the standard assumptions is

$\mathrm{Var}(\hat{\beta}_1) = \dfrac{\sigma^2}{\sum (x_i - \bar{x})^2}$

However, when the errors are heteroskedastic, the variance becomes

$\mathrm{Var}(\hat{\beta}_1) = \dfrac{\sum (x_i - \bar{x})^2 \sigma_i^2}{\left(\sum (x_i - \bar{x})^2\right)^2}$

Hence the reported OLS results without any correction use the first formula, when in fact the correct one is the second.
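The gap between the two variance formulas can be illustrated with a short simulation. The sketch below is my own addition, not part of the original notes; it assumes Python with numpy and statsmodels. It generates errors whose standard deviation grows with $x_i$ and compares the conventional OLS standard errors (first formula) with White's heteroskedasticity-robust ones, which consistently estimate the second formula.

```python
# Sketch: heteroskedastic errors leave the OLS slope unbiased but make
# the conventional standard-error formula wrong.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)
u = rng.normal(0, 1, n) * x          # Var(u_i) = sigma^2 * x_i^2: heteroskedastic
y = 2.0 + 0.5 * x + u

X = sm.add_constant(x)
conventional = sm.OLS(y, X).fit()            # conventional (incorrect) std. errors
robust = sm.OLS(y, X).fit(cov_type="HC1")    # White/heteroskedasticity-robust

print(conventional.bse)   # misstates the true sampling variability
print(robust.bse)         # consistent under heteroskedasticity
```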
The reported t statistics are therefore also incorrect, and any test that uses these variance estimates, such as the conventionally computed t or F statistics or confidence interval estimates, is also wrong. The standard estimate of $\sigma^2$ is also wrong, hence $E(\hat{\sigma}^2) = \sigma^2$ does not hold.

Detection of the presence of the heteroskedasticity problem:

Does the problem of heteroskedasticity exist in the data that you are working with?
- Visual tests
- Goldfeld-Quandt test
- White's heteroskedasticity test
- Breusch-Pagan test

Correction of the heteroskedasticity problem:

I. Generalized Least Squares estimation (Weighted Least Squares estimation):

Here the objective is to reduce the weight of the large-variance observations and increase the weight of the small-variance observations. If you have information on $\sigma_i^2$, then in the corrected estimation each observation is divided by the true error standard deviation $\sigma_i$ (equivalently, weighted by the inverse of the error variance). The corrected weighted least squares estimation will be:

$\dfrac{y_i}{\sigma_i} = \beta_0 \dfrac{1}{\sigma_i} + \beta_1 \dfrac{x_i}{\sigma_i} + \dfrac{u_i}{\sigma_i}$

However, the information on $\sigma_i^2$ is impossible to obtain. Hence, the GLS method can be used if there is a proxy for $\sigma_i^2$. There may be a relationship between the error variance $\mathrm{Var}(u_i) = \sigma_i^2$ and $y_i$, or $x_{ij}$, or any other variable $z_i$ which may not be in the model. The starting point for finding this relationship will be the detection stage, using either visual methods or methods that examine the residuals of the initial OLS estimation against the possible set of variables.

e.g. (1) If you think that the form of heteroskedasticity is $\mathrm{Var}(u_i) = \sigma^2 x_i$, which essentially says that the variance is proportional to the variable $x_i$ by a constant factor of $\sigma^2$, the weighted relationship will be:

$\dfrac{y_i}{\sqrt{x_i}} = \beta_0 \dfrac{1}{\sqrt{x_i}} + \beta_1 \dfrac{x_i}{\sqrt{x_i}} + \dfrac{u_i}{\sqrt{x_i}}$

e.g. (2) If your judgment indicates that the form of heteroskedasticity is $\mathrm{Var}(u_i) = \sigma^2 x_i^2$, which essentially says that the variance is proportional to the square of the variable $x_i$ by a constant factor of $\sigma^2$, the weighted relationship will be:

$\dfrac{y_i}{x_i} = \beta_0 \dfrac{1}{x_i} + \beta_1 \dfrac{x_i}{x_i} + \dfrac{u_i}{x_i}$

If you had been able to know the true value of $\sigma_i^2$, then the results of the weighted least squares would have been B.L.U.E. Since this is not possible, the estimation with a proxy for $\sigma_i^2$ yields estimators that are consistent.

Another example of generalized least squares estimation is when you have information that $\sigma_i^2$ takes two different values. e.g. (3) You know, or were able to observe, that in your sample covering the period 1971-2010, the first subperiod 1971-1990 has a low error variance but the subperiod 1991-2010 has a larger error variance. It is possible to do weighted least squares here. Suppose that in the first subperiod the variance of the error term is $\sigma_1^2$ and that it is lower than the variance of the error term in the second subperiod, $\sigma_2^2$. The correction should follow these steps:

i. Estimate the equation with OLS and obtain the residuals.
ii. Compute $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ by using the residual values separately for each subperiod.
iii. Estimate the GLS (WLS) with the following equation:

$\dfrac{y_t}{\hat{\sigma}_1} = \beta_0 \dfrac{1}{\hat{\sigma}_1} + \beta_1 \dfrac{x_t}{\hat{\sigma}_1} + \dfrac{u_t}{\hat{\sigma}_1}$ for $t = 1971, \ldots, 1990$,

$\dfrac{y_t}{\hat{\sigma}_2} = \beta_0 \dfrac{1}{\hat{\sigma}_2} + \beta_1 \dfrac{x_t}{\hat{\sigma}_2} + \dfrac{u_t}{\hat{\sigma}_2}$ for $t = 1991, \ldots, 2010$.
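The notes list the detection tests without worked details, so here is a minimal sketch of one detection-then-correction cycle. It is my own construction, assuming Python with statsmodels: it applies the Breusch-Pagan test to the initial OLS residuals and then the weighted least squares correction of example (2), where $\mathrm{Var}(u_i) = \sigma^2 x_i^2$, via weights $1/x_i^2$.

```python
# Sketch: Breusch-Pagan detection followed by the WLS correction of
# example (2), assuming Var(u_i) = sigma^2 * x_i^2.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(1, 10, n)
y = 1.0 + 0.8 * x + rng.normal(0, 1, n) * x   # error sd proportional to x

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()

# Breusch-Pagan: regress squared residuals on the regressors and test
# their joint significance (LM statistic ~ chi-squared).
lm_stat, lm_pvalue, _, _ = het_breuschpagan(ols.resid, X)
print(f"BP LM stat = {lm_stat:.2f}, p-value = {lm_pvalue:.4f}")

# WLS with weights proportional to 1 / Var(u_i) = 1 / x_i^2; this is the
# same as dividing the whole equation through by x_i.
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()
print(wls.params, wls.bse)
```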
Autocorrelation (Serial Correlation):

Autocorrelation arises when error terms from different time periods are correlated. This problem usually occurs in time series data, and most of the time the errors in adjacent time periods are correlated. This violates the assumption of unrelated disturbance terms belonging to different periods, i.e. $\mathrm{Cov}(u_i, u_j) = E(u_i u_j) = 0$. With autocorrelation this becomes $\mathrm{Cov}(u_i, u_j) \neq 0$. Since we are using time series data, it is common to use a t subscript rather than i, and the presence of autocorrelation can be depicted as $\mathrm{Cov}(u_t, u_{t+s}) \neq 0$ where $s \neq 0$.

Since many different forms of correlation between the error terms are possible under this too-general form, it is customary to specify a mechanism that generates autocorrelated error terms. One such mechanism is the following:

$u_t = \rho u_{t-1} + e_t, \quad -1 < \rho < 1$

The $\rho$ is known as the coefficient of autocorrelation. The $e_t$ is a white-noise error term with all the desired properties. The process that creates $u_t$ is known as the first-order autoregressive scheme, AR(1). The properties of such error terms are:

$\mathrm{Var}(u_t) = \dfrac{\sigma_e^2}{1-\rho^2}, \quad \mathrm{Cov}(u_t, u_{t-s}) = \rho^s \dfrac{\sigma_e^2}{1-\rho^2}, \quad \mathrm{Corr}(u_t, u_{t-s}) = \rho^s$

What might be the causes of autocorrelation:

- Inertia in economic series, especially macroeconomic series: the sluggish adjustment of the series creates purely autocorrelated error terms.
- An excluded explanatory variable: the effect of this variable will appear as a systematic factor in the error term. Usually this type of autocorrelation problem is corrected by including the omitted variable.
- There can be higher-order processes if the error term today is related not just to last period's error but also to the period before that. An equation such as $u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + e_t$ describes such a phenomenon.
- An incorrect functional form also gives autocorrelated error terms; correcting the functional form in the estimation corrects the problem.
- A lagged dependent variable may be an explanatory factor, especially for slowly adjusting variables. If this lagged dependent variable is omitted from the model, the error will contain its effect and the errors will be correlated.
- Nonstationarity of the series will also create correlated error terms.
- Data transformations may also create correlation in the error terms, even if the original errors are uncorrelated.

Consequences of autocorrelation:

- The unbiasedness property of the OLS estimators does not change: OLS estimates are still unbiased, but they are not efficient. Therefore B.L.U.E. does not hold; OLS estimators may not have the minimum variance among all linear unbiased estimators. The estimators remain linear, unbiased, consistent and asymptotically normally distributed, but they are not efficient.
- The formula for $\mathrm{Var}(\hat{\beta}_1)$ is no longer the standard formula $\mathrm{Var}(\hat{\beta}_1) = \sigma^2 / \sum (x_i - \bar{x})^2$. Hence if we continue to use the standard variance for statistical tests such as t or F, the results will be incorrect.
- Even if the correctly calculated formula for the variance, $\mathrm{Var}(\hat{\beta}_1)_{AR}$, is used, the confidence intervals will still be wider than the confidence intervals that may be computed with an alternative estimator (such as GLS).
- The error variance $\hat{\sigma}^2$ estimated with the standard formula is likely to underestimate the true value of $\sigma^2$.
- We are likely to overestimate $R^2$.
- Even if $\sigma^2$ is correctly estimated, the conventionally computed $\mathrm{Var}(\hat{\beta}_1)$ will still underestimate $\mathrm{Var}(\hat{\beta}_1)_{AR}$. Therefore the usual t and F tests are no longer valid and are likely to give misleading results (this follows from the relationship between $\hat{\sigma}^2$ and the true $\sigma^2$).
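The AR(1) moment formulas stated above can be checked numerically. The following sketch is my own addition (plain numpy, not part of the notes): it simulates a long AR(1) error series and compares the sample variance and autocorrelations with $\sigma_e^2/(1-\rho^2)$ and $\rho^s$.

```python
# Sketch: simulate u_t = rho*u_{t-1} + e_t and check the stated moments
# Var(u_t) = sigma_e^2 / (1 - rho^2) and Corr(u_t, u_{t-s}) = rho^s.
import numpy as np

rng = np.random.default_rng(2)
rho, sigma_e, T = 0.7, 1.0, 100_000

u = np.empty(T)
u[0] = rng.normal(0, sigma_e / np.sqrt(1 - rho**2))  # draw from the stationary dist.
for t in range(1, T):
    u[t] = rho * u[t - 1] + rng.normal(0, sigma_e)

print(u.var(), sigma_e**2 / (1 - rho**2))            # should be close
for s in (1, 2, 3):
    corr = np.corrcoef(u[s:], u[:-s])[0, 1]
    print(s, corr, rho**s)                           # sample autocorr vs. rho^s
```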
Detection of the presence of the autocorrelation problem:

Does the problem exist or not?

- Visual test: Run the OLS estimate of the initial model $y_t = \beta_0 + \beta_1 x_t + u_t$, obtain the residuals $\hat{u}_t$, and plot these residuals to see if the same sign tends to follow itself (+ residuals following the previous + residuals and - residuals following the previous - residuals).

- Durbin-Watson test. Conditions that are necessary for the use of Durbin-Watson:
1. DW can only test for first-order autocorrelation, i.e. AR(1),
2. $u_t$ should have a normal distribution,
3. $x_t$ should be nonstochastic (fixed in repeated samples),
4. the model should have an intercept term,
5. the model should not include the lagged dependent variable as an explanatory variable, since this creates the problem of endogeneity if there are autocorrelated error terms.

The null hypothesis is $H_0: \rho = 0$ against $H_A: \rho > 0$, and if DW is less than the $d_{lower}$ critical value, then we can reject the null hypothesis.

- t test for the AR(1) term: estimate the equation $\hat{u}_t = \rho \hat{u}_{t-1} + e_t$ by using the residuals of the OLS estimate of the original model, and test the significance of $\hat{\rho}$ with a t test.

- Breusch-Godfrey test: allows for higher-order autocorrelation and for regressors such as the lagged dependent variable.
Step 1: Run the OLS estimate of the initial model $y_t = \beta_0 + \beta_1 x_t + u_t$ and obtain the residuals $\hat{u}_t$.
Step 2: Run the following regression:

$\hat{u}_t = \alpha_0 + \alpha_1 x_t + \hat{\rho}_1 \hat{u}_{t-1} + \hat{\rho}_2 \hat{u}_{t-2} + \hat{\rho}_3 \hat{u}_{t-3} + \ldots + \hat{\rho}_p \hat{u}_{t-p} + e_t$

and then test the joint significance of the coefficients of the p autocorrelation terms with the following $\chi^2$ distribution: $(n-p) R_{\hat{u}}^2 \sim \chi_p^2$ under the null hypothesis that the coefficients of all p autocorrelation terms are zero.

Correction of the autocorrelation problem:

I. Generalized Least Squares estimation (Weighted Least Squares estimation):

When there is a pure autocorrelation problem, it is possible to transform the model to get rid of the dependency of the error terms. This is one version of the Generalized Least Squares estimation method. Consider a simple regression model $y_t = \beta_0 + \beta_1 x_t + u_t$ with the error term $u_t = \rho u_{t-1} + e_t$ and $-1 < \rho < 1$. The same equation also holds for time period t-1; multiplying it by $\rho$ gives:

$\rho y_{t-1} = \rho \beta_0 + \rho \beta_1 x_{t-1} + \rho u_{t-1}$

Subtracting this last equation from the first one gives the following equation:

$y_t - \rho y_{t-1} = \beta_0 (1-\rho) + \beta_1 (x_t - \rho x_{t-1}) + (u_t - \rho u_{t-1})$

Since $u_t - \rho u_{t-1} = e_t$, this equation has uncorrelated error terms satisfying the $\mathrm{Cov}(e_t, e_{t-1}) = 0$ condition. Hence the above transformation performs the necessary correction for the first-order autocorrelation problem. The new equation is in transformed variables, and the application of OLS to these new variables is known as Generalized Least Squares estimation. The transformed equation is:

$y_t^* = \beta_0 x_{0t}^* + \beta_1 x_t^* + e_t$

where $y_t^* = y_t - \rho y_{t-1}$, $x_t^* = x_t - \rho x_{t-1}$, and $x_{0t}^* = 1 - \rho$.

In order to perform this estimation, we need a value for $\rho$. However, the true value of $\rho$ is unknown; it is only possible to come up with an estimate of $\rho$. There are several methods we can use: (1) use the DW statistic together with the approximation $DW \approx 2(1-\hat{\rho})$, i.e. $\hat{\rho} = 1 - DW/2$; (2) estimate the equation $\hat{u}_t = \rho \hat{u}_{t-1} + e_t$ by using the residuals from the initial OLS estimate and use the estimated $\hat{\rho}$. The GLS estimation conducted with the estimated $\hat{\rho}$ values provides consistent estimates $\hat{\beta}_0^{GLS}$ and $\hat{\beta}_1^{GLS}$. (If the true value of $\rho$ had been known, the GLS estimates would have been unbiased.)
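The detection and correction steps above can be put together in a short sketch. This is my own illustration, assuming Python with statsmodels: it computes the Durbin-Watson statistic and the Breusch-Godfrey test on the initial OLS residuals, estimates $\hat{\rho}$ from the residual regression of method (2), and then runs OLS on the quasi-differenced data.

```python
# Sketch: detect AR(1) with Durbin-Watson / Breusch-Godfrey, then apply
# the quasi-differencing GLS correction described above.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(3)
T, rho = 200, 0.6

e = rng.normal(0, 1, T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + e[t]          # AR(1) error process
x = rng.uniform(0, 10, T)
y = 1.0 + 0.5 * x + u

ols = sm.OLS(y, sm.add_constant(x)).fit()
print("DW =", durbin_watson(ols.resid))                  # well below 2 => positive rho
print("BG p =", acorr_breusch_godfrey(ols, nlags=2)[1])  # LM test p-value

# Method (2): estimate rho from u_hat_t = rho*u_hat_{t-1} + e_t, then
# quasi-difference: y*_t = y_t - rho_hat*y_{t-1}, and similarly for x.
r = ols.resid
rho_hat = (r[1:] @ r[:-1]) / (r[:-1] @ r[:-1])
y_star = y[1:] - rho_hat * y[:-1]
X_star = np.column_stack([np.full(T - 1, 1.0 - rho_hat),  # x0* = 1 - rho_hat
                          x[1:] - rho_hat * x[:-1]])      # x*  = x_t - rho_hat*x_{t-1}
gls = sm.OLS(y_star, X_star).fit()
print(gls.params)   # coefficient on the first column estimates beta0 directly
```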
This estimation can only be done with (n-1) observations, since one observation is lost in the transformation. To avoid any inefficiency due to the loss of an observation, especially in cases where n is not large, another transformation can be conducted for the first observation. This transformation is as follows: $y_1^* = \sqrt{1-\hat{\rho}^2}\, y_1$ is used for the first observation of $y^*$, and $x_1^* = \sqrt{1-\hat{\rho}^2}\, x_1$ is used for the first observation of $x^*$.

Endogeneity Problem:

Another important assumption about the error term is that it is uncorrelated with the explanatory variables. When we assume that the explanatory variables are fixed in repeated samples (the $x_{ij}$'s are not stochastic), then $\mathrm{Cov}(u_i, x_i) = 0$. With this assumption we can state the Gauss-Markov assumptions as:

$E(u_i) = 0, \quad \mathrm{Var}(u_i) = \sigma^2, \quad \mathrm{Cov}(u_i, u_j) = 0$

However, if the assumption of fixed explanatory variables is violated and the explanatory variables are also random variables drawn from a distribution, then the same Gauss-Markov assumptions have to be stated conditionally:

$E(u_i \mid x_i) = 0, \quad \mathrm{Var}(u_i \mid x_i) = \sigma^2, \quad \mathrm{Cov}(u_i, u_j \mid x_i, x_j) = 0$

This is to say that no matter what explanatory variable values are observed, the assumptions still hold given the values of the x's. Unbiasedness can then be stated as $E(\hat{\beta}_1 \mid x_i) = \beta_1$.

However, there can be cases where the explanatory variable is contemporaneously correlated with the error term. The cases that lead to this correlation are as follows:
1) Errors in measurement of the explanatory variables,
2) Omitted variables (correlated with the included variable),
3) Jointly determined variables (simultaneity),
4) Lagged dependent variables with autocorrelated error terms.

These create the endogeneity of the explanatory variables. Under these conditions the OLS estimator will be biased and inconsistent, which is known as the endogeneity problem.

The proof of this inconsistency: the OLS estimator of the slope coefficient in the simple linear regression model is given as

$\hat{\beta}_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \beta_1 + \dfrac{\sum (x_i - \bar{x}) u_i}{\sum (x_i - \bar{x})^2}$

$\mathrm{plim}\, \hat{\beta}_1 = \beta_1 + \dfrac{\mathrm{plim}\left((1/n) \sum (x_i - \bar{x}) u_i\right)}{\mathrm{plim}\left((1/n) \sum (x_i - \bar{x})^2\right)} = \beta_1 + \dfrac{\mathrm{Cov}(x_i, u_i)}{\mathrm{Var}(x_i)}$

Since $\mathrm{Cov}(x_i, u_i) \neq 0$, we have $\mathrm{plim}\, \hat{\beta}_1 \neq \beta_1$ and hence the estimator is inconsistent.

The solution is to use the instrumental variable estimation technique, i.e. the IV estimator. You have to find another variable $z_i$ which can be used as an instrument: it must be uncorrelated with $u_i$ and closely correlated with $x_i$. Then the formula for the instrumental variable estimator is:

$\hat{\beta}_1^{IV} = \dfrac{\sum (z_i - \bar{z})(y_i - \bar{y})}{\sum (z_i - \bar{z})(x_i - \bar{x})}$

This estimator is going to be a consistent estimator.
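The inconsistency of OLS and the consistency of the IV estimator can be verified numerically. The sketch below is my own construction (plain numpy, not part of the notes): it builds an endogenous regressor that shares a component with the error term, so $\mathrm{Cov}(x_i, u_i) \neq 0$, and compares the OLS slope with the IV formula above.

```python
# Sketch: OLS bias under endogeneity vs. the simple IV estimator
# beta_IV = sum((z - zbar)(y - ybar)) / sum((z - zbar)(x - xbar)).
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
z = rng.normal(0, 1, n)             # instrument: correlated with x, not with u
v = rng.normal(0, 1, n)
u = rng.normal(0, 1, n) + 0.8 * v   # error shares the v component with x
x = 1.5 * z + v                     # x endogenous: Cov(x, u) = 0.8 != 0
y = 2.0 + 0.5 * x + u

xd, yd, zd = x - x.mean(), y - y.mean(), z - z.mean()
beta_ols = (xd @ yd) / (xd @ xd)    # converges to 0.5 + Cov(x,u)/Var(x) ~ 0.746
beta_iv = (zd @ yd) / (zd @ xd)     # consistent for the true slope 0.5
print(beta_ols, beta_iv)
```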