ECONOMETRICS
A Teaching Material for Distance Students Majoring in Economics
Module II

Prepared by: Bedru Babulo and Seid Hassen
Department of Economics, Faculty of Business and Economics
Mekelle University
Mekelle, August 2005

Introduction to the Module

The principal objective of the course "Introduction to Econometrics" is to provide an elementary but comprehensive introduction to the art and science of econometrics. It enables students to see how economic theory and statistical and mathematical methods are combined in the analysis of economic data, with the purpose of giving empirical content to economic theories and verifying or refuting them.

Module II of the course is a continuation of Module I. In the first module, the first three chapters - the introductory chapter, the simple classical regression model, and the multiple regression model - were presented with a fairly detailed treatment. In the two chapters of Module I on classical linear regression models, students were introduced to the basic logic, concepts, assumptions, estimation methods, and interpretation of the classical linear regression models and their applications in economic science. The ordinary least squares (OLS) estimation method discussed in Module I possesses the desirable properties of estimators provided that the basic classical assumptions are satisfied. But in many real-world instances, the classical assumptions of linear regression models may be violated. Therefore, Module II pays due attention to violations of these assumptions, their consequences, and the remedial measures. Specifically, the autocorrelation, heteroscedasticity, and multicollinearity problems are given much focus. Besides the discussion of violations of the classical assumptions, three more chapters, viz.
Regression on Dummy Variables; Dynamic Econometric Models; and an Introduction to Simultaneous Equation Models, are also included in Module II.

Chapters of Module II in Brief
Chapter Four: Violations of the Assumptions of Classical Linear Regression Models
 4.1 Heteroscedasticity
 4.2 Autocorrelation
 4.3 Multicollinearity
Chapter Five: Regression on Dummy Variables
Chapter Six: Dynamic Econometric Models
Chapter Seven: An Introduction to Simultaneous Equation Models

Enjoy the reading!

Chapter Four
Violations of Basic Classical Assumptions

4.0 Introduction
In both the simple and multiple regression models, we made important assumptions about the distribution of Yₜ and the random error term uₜ. We assumed that uₜ is a random variable with mean zero and var(uₜ) = σ², and that the errors corresponding to different observations are uncorrelated: cov(uₜ, uₛ) = 0 for t ≠ s. In multiple regression we also assumed that there is no perfect correlation between the independent variables. In this chapter we address the following 'what if' questions. What if the error variance is not constant over all observations? What if the different errors are correlated? What if the explanatory variables are correlated? We need to ask whether and when such violations of the basic classical assumptions are likely to occur. What types of data are likely to lead to heteroscedasticity (non-constant error variance)? What types of data are likely to lead to autocorrelation (correlated errors)? What types of data are likely to lead to multicollinearity? What are the consequences of such violations for the least squares estimators? How do we detect the presence of autocorrelation, heteroscedasticity, or multicollinearity? What are the remedial measures? How do we build an alternative model and an alternative set of assumptions when these violations exist? Do we need to develop new estimation procedures to tackle the problems?
In the subsequent sections (4.1, 4.2, and 4.3), we attempt to answer these questions.

4.1 Heteroscedasticity
4.1.1 The Nature of Heteroscedasticity
In the classical linear regression model, one of the basic assumptions is that the probability distribution of the disturbance term remains the same over all observations of X; i.e. the variance of each uᵢ is the same for all values of the explanatory variable. Symbolically,

 var(uᵢ) = Ε[uᵢ − Ε(uᵢ)]² = Ε(uᵢ²) = σᵤ², a constant value.

This feature of homogeneity of variance (or constant variance) is known as homoscedasticity. It may be the case, however, that the disturbance terms do not all have the same variance. This condition of non-constant variance, or non-homogeneity of variance, is known as heteroscedasticity. Thus, we say that the u's are heteroscedastic when:

 var(uᵢ) ≠ σᵤ² (a constant) but var(uᵢ) = σᵤᵢ² (a value that varies with i).

4.1.2 Graphical Representation of Heteroscedasticity and Homoscedasticity
The assumption of homoscedasticity states that the variation of each uᵢ around its zero mean does not depend on the value of the explanatory variable. The variance of each uᵢ remains the same irrespective of small or large values of X, the explanatory variable. Mathematically, σᵤ² is not a function of X; i.e. σᵤ² ≠ f(Xᵢ).

[Fig. (a): Homoscedastic error variance. Fig. (b): Heteroscedastic error variance.]

If the variance of u were the same at every point, or for all values of X, a definite restriction would be placed on the scatter of Y against X; when plotted in three dimensions, we should observe something approximating the pattern of fig. (a). In contrast, fig. (b) shows that the conditional variance of Yᵢ (which in fact is that of uᵢ) increases as X increases. If σᵤ² is not constant but its value depends on the value of X, then σᵤᵢ² = f(Xᵢ).
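The contrast between σᵤ² ≠ f(Xᵢ) and σᵤᵢ² = f(Xᵢ) can be illustrated numerically. The following is a minimal simulation sketch, not part of the module itself; the sample size, seed, and the specific form "standard deviation proportional to X" are our own illustrative choices:

```python
import numpy as np

# Simulate disturbances for a regression over a range of X values:
# one homoscedastic series (constant variance) and one heteroscedastic
# series whose spread grows with X, as in fig. (b).
rng = np.random.default_rng(0)
n = 500
X = np.linspace(1, 10, n)

u_homo = rng.normal(0, 1.0, n)      # var(u_i) = 1 for every observation
u_hetero = rng.normal(0, 0.3 * X)   # sd = 0.3*X_i, so var(u_i) = f(X_i)

# Split the sample at the median of X and compare error variances.
low, high = X < np.median(X), X >= np.median(X)
print(u_homo[low].var(), u_homo[high].var())      # roughly equal
print(u_hetero[low].var(), u_hetero[high].var())  # second much larger
```

Comparing the two halves of the X range makes the dependence visible without any graph: the homoscedastic variances agree up to sampling noise, while the heteroscedastic ones differ by a large factor.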
Such dependency is depicted diagrammatically in the following figures, which show three cases of heteroscedasticity, all revealed by increasing or decreasing dispersion of the observations about the regression line. In panel (a), σᵤ² seems to increase with X. In panel (b), the error variance appears greater in X's middle range, tapering off toward the extremes. Finally, in panel (c), the variance of the error term is greater for low values of X, declining and leveling off rapidly as X increases. The pattern of heteroscedasticity depends on the signs and values of the coefficients of the relationship σᵤᵢ² = f(Xᵢ), but the uᵢ's are not observable. As such, in applied research we make convenient assumptions that the heteroscedasticity is of one of the forms:

 i. σᵤᵢ² = k²Xᵢ²
 ii. σᵤᵢ² = k²Xᵢ
 iii. σᵤᵢ² = k²/Xᵢ, etc.

4.1.3 Matrix Notation of Heteroscedasticity
The variance-covariance matrix developed in Chapter 2 is:

 Ε(UU') = [ Ε(u₁²)  Ε(u₁u₂) …… Ε(u₁uₙ)
   Ε(u₁u₂) Ε(u₂²)  …… Ε(u₂uₙ)
   ⋮    ⋮      ⋮
   Ε(u₁uₙ) Ε(u₂uₙ) …… Ε(uₙ²) ]

If Ε(uᵢ²) ≠ σᵤ² (a constant value), then Ε(UU') ≠ σ²Iₙ, given that Ε(uᵢuⱼ) = 0. If Ε(uᵢ²) = σᵤᵢ² (a value that varies), then

 Ε(UU') = [ λ₁ 0 …… 0
   0 λ₂ …… 0
   ⋮ ⋮    ⋮
   0 0 …… λₙ ] …………… (3.10)

where λᵢ = Ε(uᵢ²). In other words, the variance-covariance matrix in this case is a diagonal matrix with unequal elements on the diagonal.

4.1.4 Examples of Heteroscedastic Functions
i. Consumption function: Suppose we are to study consumption expenditure from a given cross-section sample of family budgets:

 Cᵢ = α + βYᵢ + uᵢ; where:
Cᵢ = consumption expenditure of the ith household; Yᵢ = disposable income of the ith household. At low levels of income, average consumption is low, and variation around this level is limited: consumption cannot fall too far below the average because this might mean starvation, and it cannot rise too far above it because money income does not permit it. Such constraints may not be found at higher income levels. Thus, consumption patterns are more regular at lower income levels than at higher levels. This implies that at high incomes the variance of the u's will be high, while at low incomes the variance of the u's will be small. The assumption of constant variance of the u's therefore does not hold when estimating the consumption function from a cross-section of family budgets.

ii. Production function: Suppose we are required to estimate the production function X = f(K, L) of the sugar industry from a cross-section random sample of firms of the industry. The disturbance term in the production function stands for many factors, such as entrepreneurship, technological differences, selling and purchasing procedures, and differences in organization, other than the inputs labor (L) and capital (K) considered explicitly in the production function. These factors show considerably more variance in large firms than in small ones. This leads to a breakdown of our assumption of homogeneity of the variance of the disturbance terms. It should be noted that the problem of heteroscedasticity is likely to be more common in cross-sectional data than in time-series data. In cross-sectional data, one deals with members of a population at a given point in time, such as individual consumers or their families, firms, or industries. These members may be of different sizes, such as small, medium, or large firms, or of low, medium, or high income.
In time-series data, on the other hand, the variables tend to be of similar orders of magnitude because one generally collects data for the same entity over a period of time.

4.1.5 Reasons for Heteroscedasticity
There are several reasons why the variances of uᵢ may be variable. Some of these are:
1. Error-learning models: as people learn, their errors of behavior become smaller over time. In this case σᵢ² is expected to decrease. Example: as the number of hours of typing practice increases, the average number of typing errors, as well as their variance, decreases.
2. As data collection techniques improve, σᵤᵢ² is likely to decrease. Thus banks that have sophisticated data processing equipment are likely to commit fewer errors in the monthly or quarterly statements of their customers than banks without such facilities.
3. Heteroscedasticity can also arise as a result of the presence of outliers. An outlier is an observation that is much different (either very small or very large) in relation to the other observations in the sample.

4.1.6 Consequences of Heteroscedasticity for the Least Squares Estimates
What happens when we apply the ordinary least squares procedure to a model with heteroscedastic disturbance terms?

1. The OLS estimators will have no bias:

 β̂ = Σxy/Σx² = β + Σxᵢuᵢ/Σx²
 Ε(β̂) = β + ΣxᵢΕ(uᵢ)/Σx² = β

Similarly,

 α̂ = Ȳ − β̂X̄ = (α + βX̄ + Ū) − β̂X̄
 Ε(α̂) = α + βX̄ + Ε(Ū) − Ε(β̂)X̄ = α

i.e., the least squares estimators are unbiased even under the condition of heteroscedasticity, because we make no use of the assumption of homoscedasticity here.
2. The variances of the OLS coefficients will be incorrect. Under homoscedasticity, var(β̂) = σ²Σkᵢ² = σ²/Σx², but under the heteroscedastic assumption we shall have:

 var(β̂) = Σkᵢ²Ε(uᵢ²) = Σkᵢ²σᵤᵢ² ≠ σ²Σkᵢ²

σᵤᵢ² is no longer a finite constant figure; rather, it tends to change with an increasing range of values of X and hence cannot be taken outside the summation notation.

3. The OLS estimators will be inefficient: in other words, the OLS estimators do not have the smallest variance in the class of unbiased estimators and, therefore, they are not efficient in either small or large samples. Under the heteroscedastic assumption,

 var(β̂) = Σkᵢ²σᵤᵢ² = Σxᵢ²σᵤᵢ²/(Σxᵢ²)² ……… (3.11)

Under homoscedasticity,

 var(β̂) = σ²/Σx² ……… (3.12)

These two variances are different. This implies that under the heteroscedastic assumption, although the OLS estimator is unbiased, it is inefficient: its variance is larger than necessary. To see the consequence of using (3.12) instead of (3.11), let us assume that:

 σᵤᵢ² = kᵢσ²

where the kᵢ are some non-stochastic constant weights. This assumption merely states that the heteroscedastic variances are proportional to the kᵢ, with σ² being the factor of proportionality. Substituting this value of σᵤᵢ² in (3.11), we obtain:

 var(β̂) = σ²Σkᵢxᵢ²/(Σxᵢ²)² = (σ²/Σxᵢ²)·(Σkᵢxᵢ²/Σxᵢ²) = var(β̂)homo · (Σkᵢxᵢ²/Σxᵢ²) ……… (3.13)

That is to say, if xᵢ² and kᵢ are positively correlated, so that the second term of (3.13) is greater than 1, then var(β̂) under heteroscedasticity will be greater than its variance under homoscedasticity. As a result, the true standard error of β̂ will be underestimated. The t-value associated with it will consequently be overestimated, which might lead to the conclusion that in a specific case at hand β̂ is statistically significant (which in fact may not be true).
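The point that the conventional formula (3.12) misstates the true sampling variance of β̂ can be checked by simulation. The sketch below is our own illustration, not from the module: it generates many heteroscedastic samples with kᵢ positively related to xᵢ², compares the Monte Carlo variance of β̂ with the average of the textbook formula, and finds the textbook figure too small:

```python
import numpy as np

# Monte Carlo sketch: with sd(u_i) proportional to X_i, the usual OLS
# variance formula s²/Σx² understates the true sampling variance of β̂.
rng = np.random.default_rng(1)
n, reps = 50, 2000
X = np.linspace(1, 10, n)
x = X - X.mean()                            # deviations from the mean
beta = 0.5

betas, conventional = [], []
for _ in range(reps):
    u = rng.normal(0, 0.5 * X)              # heteroscedastic disturbances
    Y = 2.0 + beta * X + u
    b = (x * (Y - Y.mean())).sum() / (x**2).sum()
    betas.append(b)
    e = (Y - Y.mean()) - b * x              # OLS residuals
    s2 = (e**2).sum() / (n - 2)
    conventional.append(s2 / (x**2).sum())  # textbook var(β̂) formula

true_var = np.var(betas)                    # actual sampling variance
avg_conventional = np.mean(conventional)
print(true_var, avg_conventional)           # true variance is the larger
```

Because the error variance here grows with X (kᵢ and xᵢ² positively correlated), the simulated sampling variance exceeds the average conventional estimate, exactly the direction of bias described after equation (3.13).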
Moreover, if we proceed with our model under the false belief of homoscedasticity of the error variance, our inferences and predictions about the population coefficients will be incorrect.

4.1.7 Detecting Heteroscedasticity
We have observed that the consequences of heteroscedasticity for OLS estimates are serious. As such, it is desirable to examine whether or not the regression model is in fact homoscedastic. There are two kinds of methods for testing or detecting heteroscedasticity:
 i. informal methods
 ii. formal methods

i. Informal method
This method is called informal because it does not use formal testing procedures such as the t-test, the F-test, and the like. It is a test based on the nature of a graph. In this method, to check whether given data exhibit heteroscedasticity, we look at whether there is a systematic relation between the squared residuals eᵢ² and the fitted values Ŷ, or the Xᵢ. Suppose eᵢ² is plotted against Ŷ or Xᵢ. If, as in fig. (a), there is no systematic pattern between the two variables, this suggests that perhaps no heteroscedasticity is present in the data. Figures (b) to (e), however, exhibit definite patterns: for instance, (c) suggests a linear relationship, whereas (d) and (e) indicate a quadratic relationship between eᵢ² and Ŷᵢ.

ii. Formal methods
There are several formal methods of testing for heteroscedasticity, based on the formal testing procedures mentioned earlier. In what follows, we consider some of the major ways of detecting heteroscedasticity.

a. Park test
Park formalizes the graphical method by suggesting that the variance of the random disturbance, σᵢ², is some function of the explanatory variable Xᵢ. The functional form he suggested was:

 σᵢ² = σ²Xᵢ^β e^(vᵢ), or
 ln σᵢ² = ln σ² + β ln Xᵢ + vᵢ ……… (3.14)

where vᵢ is a stochastic disturbance term.
Since σᵢ² is generally not known, Park suggests using eᵢ² as a proxy and running the regression:

 ln eᵢ² = ln σ² + β ln Xᵢ + vᵢ
 ln eᵢ² = α + β ln Xᵢ + vᵢ ……… (3.15)

since σ² is constant. Equation (3.15) indicates how to test for heteroscedasticity: by testing whether there is a significant relation between Xᵢ and eᵢ². Let

 H₀: β = 0 against H₁: β ≠ 0

If β turns out to be statistically significant, this suggests that heteroscedasticity is present in the data; if it turns out to be insignificant, we may accept the assumption of homoscedasticity. The Park test is thus a two-stage test procedure: in the first stage, we run the OLS regression disregarding the heteroscedasticity question and obtain the eᵢ from this regression; in the second stage, we run the regression in equation (3.15).

Example: Suppose that from a sample of size n = 100 we estimate the relation between compensation (Y) and productivity (X):

 Ŷ = 1992.342 + 0.2329Xᵢ ……… (3.16)
 SE = (936.479) (0.0098)
 t = (2.1275) (2.333)  R² = 0.4375

The results reveal that the estimated slope coefficient is significant at the 5% level of significance on the basis of a one-tail t-test. The equation shows that as labour productivity increases by, say, one birr, labour compensation on average increases by about 23 cents. The residuals obtained from regression (3.16) were regressed on Xᵢ as suggested by equation (3.15), giving the following result:

 ln êᵢ² = 35.817 − 2.8099 ln Xᵢ ……… (3.17)
 SE = (38.319) (4.216)
 t = (0.934) (−0.667)  R² = 0.0595

This result reveals that the slope coefficient is statistically insignificant, implying that there is no statistically significant relationship between the two variables. Following the Park test, one may conclude that there is no heteroscedasticity in the error variance. Although empirically appealing, the Park test has some problems.
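The two-stage procedure can be sketched in a few lines of code. The example below is our own illustration on simulated data (the sample, seed, and true error structure are assumptions, not the compensation-productivity data of the example above):

```python
import numpy as np

# Sketch of the two-stage Park test with plain numpy.
def ols(x, y):
    """Return intercept, slope and residuals of a simple OLS fit."""
    xd = x - x.mean()
    b = (xd * (y - y.mean())).sum() / (xd**2).sum()
    a = y.mean() - b * x.mean()
    return a, b, y - a - b * x

rng = np.random.default_rng(2)
X = np.linspace(1, 10, 200)
Y = 5 + 2 * X + rng.normal(0, 0.5 * X)   # heteroscedastic disturbances

# Stage 1: run OLS disregarding heteroscedasticity; keep residuals e_i.
_, _, e = ols(X, Y)

# Stage 2: regress ln e_i² on ln X_i and t-test the slope (H0: β = 0).
lx, ly = np.log(X), np.log(e**2)
a2, b2, v = ols(lx, ly)
s2 = (v**2).sum() / (len(v) - 2)
se_b2 = np.sqrt(s2 / ((lx - lx.mean())**2).sum())
t_stat = b2 / se_b2
print(b2, t_stat)   # a large |t| suggests heteroscedasticity
```

Here the data are built with sd(uᵢ) proportional to Xᵢ, so the second-stage slope comes out positive with a large t-statistic; on homoscedastic data the slope would be insignificant, as in the compensation example.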
Goldfeld and Quandt have argued that the error term vᵢ entering ln eᵢ² = α + β ln Xᵢ + vᵢ may not satisfy the OLS assumptions and may itself be heteroscedastic. Nonetheless, as a strictly exploratory method, one may use the Park test.

b. Glejser test
The Glejser test is similar in spirit to the Park test. After obtaining the residuals eᵢ from the OLS regression, Glejser suggests regressing the absolute values |eᵢ| on the Xᵢ variable that is thought to be closely associated with σᵢ². In his experiments, Glejser used the following functional forms:

 |eᵢ| = α + βXᵢ + vᵢ
 |eᵢ| = α + β√Xᵢ + vᵢ
 |eᵢ| = α + β(1/Xᵢ) + vᵢ
 |eᵢ| = α + β(1/√Xᵢ) + vᵢ
 |eᵢ| = √(α + βXᵢ) + vᵢ
 |eᵢ| = √(α + βXᵢ²) + vᵢ

where vᵢ is an error term. Goldfeld and Quandt point out that the error term vᵢ has some problems: its expected value is non-zero, it is serially correlated, and, ironically, it is itself heteroscedastic. An additional difficulty with the Glejser method is that models such as |eᵢ| = √(α + βXᵢ) + vᵢ and |eᵢ| = √(α + βXᵢ²) + vᵢ are non-linear in the parameters and therefore cannot be estimated with the usual OLS procedure. Glejser found that for large samples the first four of the preceding models give generally satisfactory results in detecting heteroscedasticity. As a practical matter, therefore, the Glejser technique may be used for large samples, and in small samples strictly as a qualitative device to learn something about heteroscedasticity.

c. Goldfeld-Quandt test
This popular method is applicable if one assumes that the heteroscedastic variance σᵢ² is positively related to one of the explanatory variables in the regression model. For simplicity, consider the usual two-variable model:

 Yᵢ = α + βXᵢ + uᵢ

Suppose σᵢ² is positively related to Xᵢ as:

 σᵢ² = σ²Xᵢ² ……… (3.18), where σ² is a constant.
If the above equation is appropriate, it would mean that σᵢ² is larger for larger values of Xᵢ. If that turns out to be the case, heteroscedasticity is most likely to be present in the model. To test this explicitly, Goldfeld and Quandt suggest the following steps:

Step 1: Order or rank the observations according to the values of Xᵢ, beginning with the lowest X value.
Step 2: Omit c central observations, where c is specified a priori, and divide the remaining (n − c) observations into two groups of (n − c)/2 observations each.
Step 3: Fit separate OLS regressions to the first (n − c)/2 observations and to the last (n − c)/2 observations, and obtain the respective residual sums of squares RSS₁ and RSS₂, RSS₁ representing the RSS from the regression corresponding to the smaller Xᵢ values (the small-variance group) and RSS₂ that from the larger Xᵢ values (the large-variance group). These RSS each have (n − c)/2 − K = (n − c − 2K)/2 degrees of freedom (df), where K is the number of parameters to be estimated, including the intercept term.
Step 4: Compute

 λ = (RSS₂/df)/(RSS₁/df)

If the uᵢ are assumed to be normally distributed (which we usually do assume), and if the assumption of homoscedasticity is valid, then it can be shown that λ follows the F distribution with numerator and denominator df each equal to (n − c − 2K)/2:

 λ = [RSS₂/((n − c − 2K)/2)] / [RSS₁/((n − c − 2K)/2)] ~ F((n−c)/2 − K, (n−c)/2 − K)

If in an application the computed λ (= F) is greater than the critical F at the chosen level of significance, we can reject the hypothesis of homoscedasticity, i.e. we can say that heteroscedasticity is very likely.

Example: To illustrate the Goldfeld-Quandt test, we present in Table 3.1 data on consumption expenditure in relation to income for a cross-section of 30 families. Suppose we postulate that consumption expenditure is linearly related to income, but that heteroscedasticity is present in the data.
We further postulate that the nature of the heteroscedasticity is as given in equation (3.18) above. The necessary reordering of the data for the application of the test is also presented in Table 3.1.

Table 3.1 Hypothetical data on consumption expenditure Y ($) and income X ($)

 Original data   Data ranked by X values
 Y    X     Y    X
 55   80    55   80
 65   100   70   85
 70   85    75   90
 80   110   65   100
 79   120   74   105
 84   115   80   110
 98   130   84   115
 95   140   79   120
 90   125   90   125
 75   90    98   130
 74   105   95   140
 110  160   108  145
 113  150   113  150
 125  165   110  160
 108  145   125  165
 115  180   115  180
 140  225   130  185
 120  200   135  190
 145  240   120  200
 130  185   140  205
 152  220   144  210
 144  210   152  220
 175  245   140  225
 180  260   137  230
 135  190   145  240
 140  205   175  245
 178  265   189  250
 191  270   180  260
 137  230   178  265
 189  250   191  270

Dropping the middle four observations, the OLS regressions based on the first 13 and the last 13 observations and their associated residual sums of squares are shown next (standard errors in parentheses).

Regression based on the first 13 observations:
 Ŷᵢ = 3.4094 + 0.6968Xᵢ
  (8.7049) (0.0744)  R² = 0.8887, RSS₁ = 377.17, df = 11

Regression based on the last 13 observations:
 Ŷᵢ = −28.0272 + 0.7941Xᵢ
  (30.6421) (0.1319)  R² = 0.7681, RSS₂ = 1536.8, df = 11

From these results we obtain:

 λ = (RSS₂/df)/(RSS₁/df) = (1536.8/11)/(377.17/11) = 4.07

The critical F-value for 11 numerator and 11 denominator df at the 5% level is 2.82. Since the estimated F (= λ) value exceeds the critical value, we may conclude that there is heteroscedasticity in the error variance. However, if the level of significance is fixed at 1%, we may not reject the assumption of homoscedasticity (why?). Note that the p-value of the observed λ is 0.014.

There are also other tests of heteroscedasticity, such as Spearman's rank correlation test, the Breusch-Pagan-Godfrey test, and White's general heteroscedasticity test. But at this introductory level the above tests are enough.
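The four Goldfeld-Quandt steps can be sketched directly on the 30-family data of Table 3.1. The helper function below is our own minimal implementation of the procedure (equal group sizes mean the df cancel in the ratio):

```python
import numpy as np

# Goldfeld-Quandt sketch on the Table 3.1 data (Y = consumption, X = income),
# dropping c = 4 central observations after ranking by X.
def rss(x, y):
    """Residual sum of squares from a simple OLS fit of y on x."""
    xd = x - x.mean()
    b = (xd * (y - y.mean())).sum() / (xd**2).sum()
    a = y.mean() - b * x.mean()
    return ((y - a - b * x)**2).sum()

Y = np.array([55, 65, 70, 80, 79, 84, 98, 95, 90, 75, 74, 110, 113, 125,
              108, 115, 140, 120, 145, 130, 152, 144, 175, 180, 135, 140,
              178, 191, 137, 189], dtype=float)
X = np.array([80, 100, 85, 110, 120, 115, 130, 140, 125, 90, 105, 160,
              150, 165, 145, 180, 225, 200, 240, 185, 220, 210, 245, 260,
              190, 205, 265, 270, 230, 250], dtype=float)

order = np.argsort(X)                # Step 1: rank observations by X
Xs, Ys = X[order], Y[order]
n, c = len(X), 4                     # Step 2: omit 4 central observations
m = (n - c) // 2                     # 13 observations per group
rss1 = rss(Xs[:m], Ys[:m])           # Step 3: small-variance group
rss2 = rss(Xs[-m:], Ys[-m:])         #         large-variance group
lam = rss2 / rss1                    # Step 4: equal df cancel in the ratio
print(rss1, rss2, lam)               # the module reports λ ≈ 4.07
```

Comparing the computed λ with the 5% critical value F(11, 11) = 2.82 reproduces the conclusion reached above: homoscedasticity is rejected at the 5% level.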
4.1.8 Remedial Measures for the Problem of Heteroscedasticity
As we have seen, heteroscedasticity does not destroy the unbiasedness and consistency properties of the OLS estimators, but the estimators are no longer efficient. This lack of efficiency makes the usual hypothesis-testing procedures of dubious value. Remedial measures therefore concentrate on the variance of the error term. Consider the model:

 Yᵢ = α + βXᵢ + uᵢ, with var(uᵢ) = σᵢ², Ε(uᵢ) = 0, Ε(uᵢuⱼ) = 0 for i ≠ j.

If we apply OLS to this model, the result will be inefficient parameter estimates, since var(uᵢ) is not constant. The remedial measure is to transform the model so that the transformed model satisfies all the assumptions of the classical regression model, including homoscedasticity. Applying OLS to the transformed variables is known as the method of generalized least squares (GLS). In short, GLS is OLS on transformed variables that satisfy the standard least squares assumptions. The estimators thus obtained are known as GLS estimators, and it is these estimators that are BLUE.

4.1.8.1 The Method of Generalized (Weighted) Least Squares
Assume that our original model is:

 Yᵢ = α + βXᵢ + uᵢ

where uᵢ satisfies all the assumptions except that it is heteroscedastic: Ε(uᵢ²) = σᵢ² = f(kᵢ). If we apply OLS to this model, the estimators are no longer BLUE. To make them BLUE we have to transform the model. Let us consider heteroscedastic structures under two conditions: when the population variance σᵢ² is known, and when σᵢ² is not known.

a. Assume σᵢ² is known: Ε(uᵢ²) = σᵢ²
Given the model Yᵢ = α + βXᵢ + uᵢ, the transforming variable is √(σᵢ²) = σᵢ, so that the variance of the transformed error term is constant. Dividing the model through by σᵢ:

 Yᵢ/σᵢ = α/σᵢ + βXᵢ/σᵢ + uᵢ/σᵢ ……… (3.19)

The variance of the transformed error term is constant, i.e.
 var(uᵢ/σᵢ) = Ε(uᵢ/σᵢ)² = (1/σᵢ²)Ε(uᵢ²) = σᵢ²/σᵢ² = 1, a constant.

We can now apply OLS to the transformed model (3.19); the resulting estimators are BLUE, because all the assumptions, including homoscedasticity, are satisfied. The method of GLS (or weighted least squares, WLS) minimizes the weighted residual sum of squares:

 Σwᵢûᵢ² = Σwᵢ(Yᵢ − α̂ − β̂Xᵢ)², where wᵢ = 1/σᵢ².

1. ∂(Σwᵢûᵢ²)/∂α̂ = −2Σwᵢ(Yᵢ − α̂ − β̂Xᵢ) = 0
 Σwᵢ(Yᵢ − α̂ − β̂Xᵢ) = 0
 ΣwᵢYᵢ − α̂Σwᵢ − β̂ΣwᵢXᵢ = 0
 ⇒ α̂ = ΣwᵢYᵢ/Σwᵢ − β̂ΣwᵢXᵢ/Σwᵢ = Y* − β̂X*

where Y* and X* are the weighted means, which are different from the ordinary means discussed in 2.1 and 2.2.

2. ∂(Σwᵢûᵢ²)/∂β̂ = −2Σwᵢ(Yᵢ − α̂ − β̂Xᵢ)Xᵢ = 0
 Σwᵢ(Yᵢ − α̂ − β̂Xᵢ)Xᵢ = 0
 ΣwᵢYᵢXᵢ − α̂ΣwᵢXᵢ − β̂ΣwᵢXᵢ² = 0
 ⇒ ΣwᵢYᵢXᵢ = α̂ΣwᵢXᵢ + β̂ΣwᵢXᵢ²

Substituting α̂ = Y* − β̂X* into this equation, we get:

 ΣwᵢYᵢXᵢ = (Y* − β̂X*)ΣwᵢXᵢ + β̂ΣwᵢXᵢ²
 ΣwᵢYᵢXᵢ − Y*ΣwᵢXᵢ = β̂(ΣwᵢXᵢ² − X*ΣwᵢXᵢ)
 ΣwᵢYᵢXᵢ − Y*X*Σwᵢ = β̂(ΣwᵢXᵢ² − X*²Σwᵢ)

 β̂ = (ΣwᵢYᵢXᵢ − Y*X*Σwᵢ)/(ΣwᵢXᵢ² − X*²Σwᵢ) = Σx*y*/Σx*²

where x* and y* are weighted deviations. These estimators are now BLUE.

b. Now assume that σᵢ² is not known
Let us assume that Ε(uᵢ²) = σᵢ² = k²f(Xᵢ); the transformed version of the model may then be obtained by dividing the original model through by √f(Xᵢ).

Case a. Suppose the heteroscedasticity is of the form Ε(uᵢ²) = σᵢ² = k²Xᵢ², in the model Yᵢ = α + βXᵢ + uᵢ with var(uᵢ) = σᵢ² = k²Xᵢ².
The transforming variable is √(Xᵢ²) = Xᵢ. The transformed model is:

 Yᵢ/Xᵢ = α/Xᵢ + βXᵢ/Xᵢ + uᵢ/Xᵢ = α(1/Xᵢ) + β + uᵢ/Xᵢ

 Ε(uᵢ/Xᵢ)² = (1/Xᵢ²)Ε(uᵢ²) = k²Xᵢ²/Xᵢ² = k², a constant

which proves that the new random term in the model has a finite constant variance (= k²). We can, therefore, apply OLS to the transformed version of the model. Note that in this transformation the positions of the coefficients have changed: the parameter of the variable 1/Xᵢ in the transformed model is the constant intercept of the original model, while the constant term of the transformed model is the parameter of the explanatory variable X in the original model. Therefore, to get back to the original model, we shall have to multiply the estimated regression through by Xᵢ.

Case b. Suppose the heteroscedasticity is of the form Ε(uᵢ²) = σᵢ² = k²Xᵢ. The transforming variable is √Xᵢ. The transformed model is:

 Yᵢ/√Xᵢ = α/√Xᵢ + βXᵢ/√Xᵢ + uᵢ/√Xᵢ = α(1/√Xᵢ) + β√Xᵢ + uᵢ/√Xᵢ

 Ε(uᵢ/√Xᵢ)² = (1/Xᵢ)Ε(uᵢ²) = k²Xᵢ/Xᵢ = k² ⇒ constant variance

thus we can apply OLS to the transformed model. There is no intercept term in the transformed model; therefore, one will have to use the 'regression through the origin' model to estimate α and β. In this case, to get back to the original model, we shall have to multiply the estimated regression through by √Xᵢ.

Case c. Suppose the heteroscedasticity is of the form Ε(uᵢ²) = σᵢ² = k²[Ε(Yᵢ)]². The transforming variable is Ε(Yᵢ) = α + βXᵢ:

 Yᵢ/(α + βXᵢ) = α/(α + βXᵢ) + βXᵢ/(α + βXᵢ) + uᵢ/(α + βXᵢ) ……… (i)

 Ε[uᵢ/(α + βXᵢ)]² = (1/[Ε(Yᵢ)]²)Ε(uᵢ²) = k²[Ε(Yᵢ)]²/[Ε(Yᵢ)]² = k²

The transformed model described in (i) above is, however, not operational as it stands, because the values of α and β are not known. But since we can obtain Ŷᵢ = α̂ + β̂Xᵢ, the transformation can be made through the following two steps.
First, we run the usual OLS regression disregarding the heteroscedasticity problem in the data and obtain Ŷ. Second, using the estimated Ŷ, we transform the model as follows:

 Yᵢ/Ŷᵢ = α(1/Ŷᵢ) + β(Xᵢ/Ŷᵢ) + uᵢ/Ŷᵢ

It should therefore be clear that in order to adopt the necessary corrective measure (which is to transform the original data in such a way as to obtain a form in which the transformed disturbance term possesses constant variance) we must have information on the form of the heteroscedasticity. Also, since the transformed data no longer possess heteroscedasticity, it can be shown that the estimates from the transformed model are more efficient (i.e. they possess smaller variances) than the estimates obtained from the application of OLS to the original data. Let us assume that a test reveals that the original data possess heteroscedasticity, and that heteroscedasticity of the form σᵢ² = k²Xᵢ² is being assumed. Our original model is therefore:

 Yᵢ = α + βXᵢ + uᵢ, with Ε(uᵢ²) = σᵢ² = k²Xᵢ²

Applying OLS to this heteroscedastic model gives:

 β̂ = β + Σxᵢuᵢ/Σxᵢ²
 var(β̂) = Ε(β̂ − β)² = Ε(Σxᵢuᵢ/Σxᵢ²)² = Σxᵢ²Ε(uᵢ²)/(Σxᵢ²)² = k²Σxᵢ²Xᵢ²/(Σxᵢ²)²

On transforming the original model we obtain Yᵢ/Xᵢ = β + α(1/Xᵢ) + uᵢ/Xᵢ, whose disturbance has the constant variance k². Applying OLS to this transformed model, β is now the intercept, and its estimator has variance

 var(β̂̂) = k²Σ(1/Xᵢ)² / [nΣ(1/Xᵢ − (1/X)*)²]

where (1/X)* denotes the mean of the 1/Xᵢ. It can be shown that this variance is smaller than the var(β̂) obtained by applying OLS to the original heteroscedastic model, confirming that the transformed (GLS) estimator is the more efficient one.

4.2 Autocorrelation
4.2.1 The Nature of Autocorrelation
In our discussion of the simple and multiple regression models, one of the assumptions of the classicalists is that cov(uᵢuⱼ) = Ε(uᵢuⱼ) = 0, which implies that successive values of the disturbance term u are temporally independent, i.e. that a disturbance occurring at one point of observation is not related to any other disturbance.
This means that when observations are made over time, the effect of a disturbance occurring at one period does not carry over into another period. If the above assumption is not satisfied, that is, if the value of u in any particular period is correlated with its own preceding value(s), we say there is autocorrelation of the random variables. Hence, autocorrelation is defined as 'correlation' between members of a series of observations ordered in time or space. There is a difference between 'correlation' and 'autocorrelation'. Autocorrelation is a special case of correlation which refers to the relationship between successive values of the same variable, while correlation may also refer to the relationship between two or more different variables. Autocorrelation is also sometimes called serial correlation, although some economists distinguish between the two terms. According to G. Tintner, autocorrelation is the lag correlation of a given series with itself, lagged by a number of time units, whereas serial correlation is "lag correlation between two different series." Thus, correlation between two series such as u₁, u₂, …, u₁₀ and u₂, u₃, …, u₁₁, where the former is the latter series lagged by one time period, is autocorrelation; whereas correlation between series such as u₁, u₂, …, u₁₀ and v₂, v₃, …, v₁₁, where u and v are two different series, is called serial correlation. Although the distinction between the two terms may be useful, we shall treat them synonymously in our subsequent discussion.

4.2.2 Graphical Representation of Autocorrelation
Since autocorrelation is correlation between members of a series of observations ordered in time, we can examine the pattern of the random variable graphically by plotting time horizontally and the random variable uᵢ vertically.
Consider the following figures.

[Figure: five panels (a) to (e) plotting the disturbance Ui against time t.]

Figures (a) to (d) above show systematic patterns among the U's, indicating autocorrelation: (a) shows a cyclical pattern, figures (b) and (c) suggest an upward and a downward linear trend respectively, and (d) indicates a quadratic trend in the disturbance terms. Figure (e) indicates no systematic pattern, supporting the non-autocorrelation assumption of the classical linear regression model. We can also show autocorrelation graphically by plotting successive values of the random disturbance term against each other, with ui on one axis and its preceding value uj on the other.

[Figure: panels (f), (g) and (h) plotting ui against uj.]

The above figures (f) and (g) indicate positive and negative autocorrelation respectively, while (h) indicates no autocorrelation. In general, if the disturbance terms follow a systematic pattern as in (f) and (g) there is autocorrelation or serial correlation, and if there is no systematic pattern, there is no autocorrelation.

4.2.3 Reasons for Autocorrelation

There are several reasons why serial or autocorrelation arises. Some of these are:

a. Cyclical fluctuations

Time series such as GNP, price indices, production, employment and unemployment exhibit business cycles. Starting at the bottom of a recession, when economic recovery begins, most of these series move upward. In this upswing, the value of a series at one point in time is greater than its previous value. Thus there is a momentum built into them, and it continues until something happens (e.g. an increase in interest rates or taxes) to slow them down. Therefore, in regressions involving time series data, successive observations are likely to be interdependent.

b. Specification bias

This arises because of the following:
i. Exclusion of variables from the regression model
ii. Incorrect functional form of the model
iii.
Neglecting lagged terms from the regression model

Let's see one by one how the above specification biases cause autocorrelation.

i. Exclusion of variables: as we discussed in chapter one (module I), there are several sources of the random disturbance term (ui). One of these is the exclusion of variable(s) from the model: the error term will then show a systematic change as the excluded variable changes. For example, suppose the correct demand model is given by:

yt = α + β1x1t + β2x2t + β3x3t + Ut ------------------------3.21

where y = quantity of beef demanded, x1 = price of beef, x2 = consumer income, x3 = price of pork and t = time. Now, suppose we run the following regression in lieu of (3.21):

yt = α + β1x1t + β2x2t + Vt ------------------------3.22

Now, if equation 3.21 is the 'correct' model or true relation, running equation 3.22 is tantamount to letting Vt = β3x3t + Ut. To the extent that the price of pork affects the consumption of beef, the error or disturbance term V will reflect a systematic pattern, thus creating autocorrelation. A simple test of this would be to run both equation 3.21 and equation 3.22 and see whether the autocorrelation, if any, observed in equation 3.22 disappears when equation 3.21 is run. The actual mechanics of detecting autocorrelation will be discussed later.

ii. Incorrect functional form: this is another source of autocorrelation in the error term. Suppose the 'true' or correct model in a cost-output study is as follows:

Marginal costi = α0 + β1outputi + β2outputi² + Ui ------------------------3.23

However, we incorrectly fit the following model:

Marginal costi = α1 + α2outputi + Vi ------------------------3.24

The marginal cost curve corresponding to the 'true' model is shown in the figure below along with the 'incorrect' linear cost curve.
As the figure shows, between points A and B the linear marginal cost curve will consistently overestimate the true marginal cost, whereas outside these points it will consistently underestimate the true marginal cost. This result is to be expected, because the disturbance term Vi is, in fact, equal to β2(output)i² + Ui, and hence will catch the systematic effect of the (output)² term on marginal cost. In this case, Vi will reflect autocorrelation because of the use of an incorrect functional form.

iii. Neglecting lagged terms from the model: if the dependent variable of a regression model is affected by the lagged value of itself or of an explanatory variable, and this lagged term is not included in the model, the error term of the incorrect model will reflect a systematic pattern which indicates autocorrelation. Suppose the correct model for consumption expenditure is:

Ct = α + β1yt + β2yt−1 + Ut ------------------------3.25

but again, for some reason, we incorrectly regress:

Ct = α + β1yt + Vt ------------------------3.26

As in the case of (3.21) and (3.22), Vt = β2yt−1 + Ut. Hence Vt shows a systematic change, reflecting autocorrelation.

4.2.4 Matrix representation of autocorrelation

The variance-covariance matrix of the error terms developed in chapter two (module I) is:

          | E(u1²)    E(u1u2)  ...  E(u1un) |
E(UU') =  | E(u2u1)   E(u2²)   ...  E(u2un) |
          |   :          :             :    |
          | E(unu1)   E(unu2)  ...  E(un²)  |

Under the assumptions of non-autocorrelation and homoscedasticity:

          | σ²  0   ...  0  |        | 1  0  ...  0 |
E(UU') =  | 0   σ²  ...  0  |  = σ²  | 0  1  ...  0 |  = σ²In ------------------------3.27
          | :   :        :  |        | :  :       : |
          | 0   0   ...  σ² |        | 0  0  ...  1 |

The assumption of no autocorrelation is responsible for the appearance of zeros off the diagonal, whereas the assumption of homoscedasticity establishes the equality of the diagonal terms.
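The specification-bias argument in (3.21)-(3.26) above can be illustrated numerically. In the sketch below (pure Python, entirely hypothetical numbers), the true model contains a trending regressor x3; fitting the model without it pushes the trend into the residuals, whose Durbin-Watson statistic (introduced formally in section 4.2.8) then falls far below 2, signalling strong positive autocorrelation.

```python
# Omitted-variable illustration (hypothetical numbers): the true model uses
# both x1 (cyclical) and x3 (trending); fitting y on x1 alone leaves the
# trend in the residuals, which then show strong positive autocorrelation.
n = 50
x1 = [float(i % 5) for i in range(n)]        # included regressor
x3 = [float(i) for i in range(n)]            # omitted trending regressor
y = [1.0 + 0.5 * x1[i] + 0.8 * x3[i] for i in range(n)]   # true relation, no noise

# Misspecified OLS of y on x1 alone (the analogue of equation 3.22)
x1bar = sum(x1) / n
ybar = sum(y) / n
beta = sum((x1[i] - x1bar) * (y[i] - ybar) for i in range(n)) / \
       sum((x1[i] - x1bar) ** 2 for i in range(n))
alpha = ybar - beta * x1bar
e = [y[i] - (alpha + beta * x1[i]) for i in range(n)]

# Durbin-Watson d of the residuals (formally defined in section 4.2.8)
d = sum((e[t] - e[t - 1]) ** 2 for t in range(1, n)) / sum(et ** 2 for et in e)
print(round(d, 3))   # 0.022 -- far below 2: strong positive autocorrelation
```

Here the residuals inherit the systematic effect of the omitted trend, exactly as Vt = β3x3t + Ut does in equation 3.22.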
The following three examples of variance-covariance matrices help in understanding the concepts of autocorrelation and heteroscedasticity. A diagonal matrix with unequal diagonal elements, such as

| 3  0  0 |
| 0  5  0 |
| 0  0  3 |

shows heteroscedasticity with no autocorrelation; a matrix with equal diagonal elements but non-zero off-diagonal elements shows homoscedasticity with autocorrelation; and a matrix with unequal diagonal elements and non-zero off-diagonal elements shows heteroscedasticity with autocorrelation.

4.2.5 The coefficient of autocorrelation

Autocorrelation, as stated earlier, is a kind of lag correlation between successive values of the same variable. Thus we treat autocorrelation in the same way as correlation in general. The simplest case of such linear correlation is termed autocorrelation of the first order. In other words, if the value of U in any particular period depends on its own value in the preceding period alone, we say that the U's follow a first-order autoregressive scheme AR(1) (or first-order Markov scheme), i.e.

ut = f(ut−1) ------------------------3.28

If ut depends on the values of the two previous periods, then:

ut = f(ut−1, ut−2) ------------------------3.29

This form of autocorrelation is called a second-order autoregressive scheme, and so on. Generally, when autocorrelation is present, we assume the simplest first-order form ut = f(ut−1), and moreover that it is linear:

ut = ρut−1 + vt ------------------------3.30

where ρ is the coefficient of autocorrelation and v is a random variable satisfying all the basic assumptions of ordinary least squares:

E(v) = 0, E(v²) = σv², and E(vivj) = 0 for i ≠ j

The above relationship states the simplest possible form of autocorrelation; if we apply OLS to the model given in (3.30) we obtain:

ρ̂ = Σ(t=2 to n) ut ut−1 / Σ(t=2 to n) u²t−1 ------------------------3.31

Given that for large samples Σut² ≈ Σu²t−1, we observe that the coefficient of autocorrelation ρ̂ represents a simple correlation coefficient r.
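The estimator in equation 3.31 and the claim that ρ̂ behaves like a simple correlation coefficient can be checked by simulation. The sketch below (pure Python; the parameters ρ = 0.6 and T = 10000 are hypothetical choices) generates an AR(1) series, computes ρ̂ by equation 3.31, and compares it with the correlation-coefficient form r = Σutut−1/√(Σut²Σu²t−1): for a large sample the two nearly coincide.

```python
import math
import random

# Simulation check of equation 3.31 (hypothetical parameters: rho = 0.6).
random.seed(7)
rho_true, T = 0.6, 10000
u, us = 0.0, []
for _ in range(T):
    u = rho_true * u + random.gauss(0.0, 1.0)   # u_t = rho*u_{t-1} + v_t
    us.append(u)

num = sum(us[t] * us[t - 1] for t in range(1, T))   # sum of u_t * u_{t-1}
den_lag = sum(x ** 2 for x in us[:-1])              # sum of u_{t-1}^2
den_cur = sum(x ** 2 for x in us[1:])               # sum of u_t^2

rho_hat = num / den_lag                              # equation 3.31
r = num / math.sqrt(den_cur * den_lag)               # correlation-coefficient form
print(round(rho_hat, 2), round(r, 2))                # both close to 0.6
```

Since Σut² and Σu²t−1 differ by only one squared term, the two denominators are nearly equal in a long series, which is exactly why ρ̂ inherits the bounds −1 ≤ r ≤ 1 in large samples.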
ρ̂ = Σut ut−1 / Σu²t−1 ≈ Σut ut−1 / √(Σut² Σu²t−1) = r(ut, ut−1) (Why?) ------------------------3.32

⇒ −1 ≤ ρ̂ ≤ 1 since −1 ≤ r ≤ 1 ------------------------3.33

This proves the statement "we can treat autocorrelation in the same way as correlation in general". From our statistics background we know that if the value of r is 1 we call it perfect positive correlation, if r is −1, perfect negative correlation, and if the value of r is 0, there is no correlation. By the same analogy, if the value of ρ̂ is 1 it is called perfect positive autocorrelation; if ρ̂ is −1 it is called perfect negative autocorrelation; and if ρ̂ = 0 there is no autocorrelation, i.e. in ut = ρut−1 + vt the ut are not autocorrelated.

4.2.6 Mean, Variance and Covariance of the Disturbance Terms in an Autocorrelated Model

To examine the consequences of autocorrelation for the ordinary least squares estimators, we must study the properties of U. If the values of U follow the simple Markov process, then:

Ut = ρUt−1 + vt , with |ρ| < 1

and with vt fulfilling all the usual assumptions of a disturbance term. Our objective here is to obtain the value of ut in terms of the autocorrelation coefficient ρ and the random variable vt. The complete form of the first-order autoregressive scheme may be written as follows:

Ut = f(Ut−1) = ρUt−1 + vt
Ut−1 = f(Ut−2) = ρUt−2 + vt−1
Ut−2 = f(Ut−3) = ρUt−3 + vt−2
 ...
Ut−r = f(Ut−(r+1)) = ρUt−(r+1) + vt−r

We make use of the above relations to perform continuous substitution in Ut = ρut−1 + vt as follows.
Ut = ρUt−1 + vt
 = ρ(ρUt−2 + vt−1) + vt , since ut−1 = ρUt−2 + vt−1
 = ρ²Ut−2 + ρvt−1 + vt
 = ρ²(ρUt−3 + vt−2) + ρvt−1 + vt
 = ρ³Ut−3 + ρ²vt−2 + ρvt−1 + vt

In this way, if we continue the substitution process for r periods (taking r very large), we obtain:

Ut = vt + ρvt−1 + ρ²vt−2 + ρ³vt−3 + ... ------------------------3.35

since ρ^r → 0 as r grows, given |ρ| < 1. That is:

ut = Σ(r=0 to ∞) ρ^r vt−r ------------------------3.36

Now, using this value of ut, let's compute its mean, variance and covariance.

1. To obtain the mean:

E(Ut) = E[Σ ρ^r vt−r] = Σ ρ^r E(vt−r) = 0 , since E(vt−r) = 0 ------------------------3.37

In other words, the mean of the autocorrelated U's turns out to be zero.

2. To obtain the variance. By the definition of variance, and noting that the cross-product terms vanish because E(vivj) = 0 for i ≠ j:

E(Ut²) = E[Σ ρ^r vt−r]² = Σ (ρ^r)² E(v²t−r) = Σ ρ^(2r) σv² = σv²(1 + ρ² + ρ⁴ + ρ⁶ + ...)

var(Ut) = σv²/(1 − ρ²) , since |ρ| < 1 ------------------------3.38

Thus the variance of the autocorrelated ut is σv²/(1 − ρ²), which is a constant. From the above, the variance of U depends on the nature of the variance of v: if v is homoscedastic, U is homoscedastic, and if v is heteroscedastic, U is heteroscedastic.

3. To obtain the covariance. By the definition of covariance:

cov(Ut, Ut−1) = E(UtUt−1) ------------------------3.39

Since Ut = vt + ρvt−1 + ρ²vt−2 + ..., we also have Ut−1 = vt−1 + ρvt−2 + ρ²vt−3 + ...

Substituting the above two expansions into equation 3.39, we obtain:

cov(UtUt−1) = E(vt + ρvt−1 + ρ²vt−2 + ...)(vt−1 + ρvt−2 + ρ²vt−3 + ...)
 = E{vt + ρ(vt−1 + ρvt−2 + ...)}(vt−1 + ρvt−2 + ρ²vt−3 + ...)
 = E[vt(vt−1 + ρvt−2 + ...)]
+ E[ρ(vt−1 + ρvt−2 + ...)²] ; since E(vtvt−r) = 0, the first term is zero:
 = 0 + ρE(vt−1 + ρvt−2 + ...)²
 = ρE(v²t−1 + ρ²v²t−2 + ... + cross products)
 = ρ(σv² + ρ²σv² + ρ⁴σv² + ... + 0)
 = ρσv²(1 + ρ² + ρ⁴ + ...)
 = ρσv²/(1 − ρ²) , since |ρ| < 1 ------------------------3.40

∴ cov(Ut, Ut−1) = ρσv²/(1 − ρ²) = ρσu² ------------------------3.41

Similarly:

cov(ut, ut−2) = ρ²σu² ------------------------3.42
cov(Ut, Ut−3) = ρ³σu² ------------------------3.43

and, generalizing, cov(Ut, Ut−s) = ρ^s σu² (for s ≠ 0).

Summarizing, on the basis of the preceding discussion we find that when the ut's are autocorrelated:

Ut ~ N(0, σv²/(1 − ρ²)) and E(UtUt−r) ≠ 0 ------------------------3.44

4.2.7 Effect of Autocorrelation on OLS Estimators

We have seen that the ordinary least squares technique rests on a set of basic assumptions, some of which concern the mean, variance and covariance of the disturbance term. Naturally, therefore, if these assumptions do not hold, the estimators derived by the OLS procedure may not be efficient. Now we are in a position to examine the effect of autocorrelation on the OLS estimators. The following are the effects on the estimators if the OLS method is applied in the presence of autocorrelation in the data.

1. OLS estimates are unbiased.

We know that β̂ = β + Σkiui, so E(β̂) = β + ΣkiE(ui). We proved E(ui) = 0 in (3.37). Therefore E(β̂) = β.

2. The variance of the OLS estimates is inefficient.

The variance of the estimate β̂ in the simple regression model will be biased downwards (i.e. underestimated) when the u's are autocorrelated. This can be shown as follows.

β̂ = β + Σkiui ⇒ β̂ − β = Σkiui

We know that:

Var(β̂) = E(β̂ − β)² = E(Σkiui)²
 = E(k1u1 + k2u2 + ...
+ knun)²
 = E(k1²u1² + k2²u2² + ... + kn²un² + 2k1k2u1u2 + ... + 2kn−1knun−1un)
 = Σki²E(ui²) + 2ΣΣ(i<j) kikjE(uiuj)

If E(uiuj) = 0, which means there is no autocorrelation, the last term disappears, so that:

var(β̂) = σu²Σki² = σu²/Σx²

However, we proved that E(utut−s) ≠ 0; it equals ρ^s σu². Therefore:

Var(β̂) = σu²/Σx² + 2σu² ΣΣ xixj ρ^s /(Σxi²)² ------------------------3.45

where the double summation runs over pairs of observations s periods apart. In the absence of autocorrelation:

Var(β̂) = σu²/Σx²

But in the presence of autocorrelation:

var(β̂)auto = var(β̂)nonauto + 2σu² ΣΣ xixj ρ^s /(Σxi²)² ------------------------3.46

If ρ is positive and x is positively correlated over time, Var(β̂)auto > Var(β̂)nonauto. The implication is that if we wrongly use Var(β̂) = σu²/Σx² while the data are autocorrelated, var(β̂) is underestimated, because the true variance is then the larger expression in (3.46), not σu²/Σx². In the case where the explanatory variable X of the model is random, the covariance of its successive values is zero (Σxixj = 0); under such circumstances the bias in var(β̂) will not be serious even though u is autocorrelated.

3. Wrong testing procedure.

If var(β̂) is underestimated, SE(β̂) is also underestimated, and this makes the t-ratio large. This large t-ratio may make β̂ appear statistically significant when in fact it is not.

4. A wrong testing procedure will, in turn, lead to wrong prediction and inference about the characteristics of the population.

4.2.8 Detection (Testing) of Autocorrelation

There are two methods that are commonly used to detect the existence or absence of autocorrelation in the disturbance terms. These are:

1. Graphic method

Dear distance student, recall from section 4.2.2 that autocorrelation can be presented graphically in two ways. Detection of autocorrelation using graphs is based on these two ways.
Given data on economic variables, autocorrelation can be detected graphically using the following two procedures.

a. Apply OLS to the given data, whether autocorrelated or not, and obtain the error terms. Plot et horizontally and et−1 vertically, i.e. plot the observations (e1, e2), (e2, e3), (e3, e4), ..., (en−1, en). If, on plotting, it is found that most of the points fall in quadrants I and III, as shown in fig. (a) below, we say that the given data are autocorrelated, and the type of autocorrelation is positive autocorrelation. If most of the points fall in quadrants II and IV, as shown in fig. (b) below, the autocorrelation is said to be negative. But if the points are scattered equally in all the quadrants, as shown in fig. (c) below, then we say there is no autocorrelation in the given data.

b. Plot the residuals et against time, as in section 4.2.2: a systematic pattern (cyclical, or an upward or downward trend) suggests autocorrelation, while a patternless scatter supports its absence.

2. Formal testing method

This method is called formal because the test is based on the formal testing procedures you have seen in your statistics course. It is based on the z-test, t-test, F-test or χ²-test; if a test applies any of these, it is called a formal testing method. Different econometricians and statisticians suggest different testing methods, but the methods most frequently and widely used by researchers are the following.

A. Run test: Before going into the detailed analysis of this method, let us define what a run is in this context. A run is an uninterrupted sequence of identical signs of the error terms, arranged in order according to the values of the explanatory variable, as in:

++++++++-------------++++++++------------++++++

By examining how runs behave in a strictly random sequence of observations, one can derive a test of the randomness of runs. We ask this question: are the observed runs too many or too few compared with the number of runs expected?
If there are too many runs, it would mean the û's change sign frequently, thus indicating negative serial correlation. Similarly, if there are too few runs, this may suggest positive autocorrelation. Now let:

n = total number of observations = n1 + n2
n1 = number of + symbols
n2 = number of − symbols
k = number of runs

Under the null hypothesis that successive outcomes (here, residuals) are independent, and assuming that n1 > 10 and n2 > 10, the number of runs is (asymptotically) normally distributed with:

Mean: E(k) = 2n1n2/(n1 + n2) + 1

Variance: σk² = 2n1n2(2n1n2 − n1 − n2) / [(n1 + n2)²(n1 + n2 − 1)]

Decision rule: do not reject the null hypothesis of randomness or independence with 95% confidence if

E(k) − 1.96σk ≤ k ≤ E(k) + 1.96σk ;

reject the null hypothesis if the estimated k lies outside these limits.

In a hypothetical example with n1 = 14, n2 = 18 and k = 5, we obtain E(k) = 16.75 and σk² = 7.49395, so σk = 2.7375. Hence the 95% confidence interval is 16.75 ± 1.96(2.7375) = [11.3845, 22.1155]. Since k = 5 clearly falls outside this interval, we can reject the hypothesis that the observed sequence of residuals is random (or independent) with 95% confidence.

B. The Durbin-Watson d test: The most celebrated test for detecting serial correlation is the one developed by the statisticians Durbin and Watson. It is popularly known as the Durbin-Watson d statistic, which is defined as:

d = Σ(t=2 to n)(et − et−1)² / Σ(t=1 to n) et² ------------------------3.47

Note that in the numerator of the d statistic the number of observations is n − 1, because one observation is lost in taking successive differences. It is important to note the assumptions underlying the d statistic:

1. The regression model includes an intercept term.
If such a term is not present, as in the case of regression through the origin, it is essential to rerun the regression including the intercept term to obtain the RSS.

2. The explanatory variables, the X's, are non-stochastic, or fixed in repeated sampling.

3. The disturbances Ut are generated by the first-order autoregressive scheme: Ut = ρUt−1 + εt.

4. The regression model does not include lagged value(s) of the dependent variable Y among the explanatory variables. Thus the test is inapplicable to models of the following type:

yt = β1 + β2X2t + β3X3t + ... + βkXkt + γyt−1 + Ut

where yt−1 is the one-period lagged value of y; such models are known as autoregressive models. If the d test is mistakenly applied to them, the value of d will often be around 2, which is the value of d in the absence of first-order autocorrelation. Durbin developed the so-called h-statistic to test for serial correlation in such autoregressive models.

5. There are no missing observations in the data.

In using the Durbin-Watson test it is therefore important to note that it cannot be applied if any of the above five assumptions is violated.

Dear distance student, from equation 3.47 the value of d is

d = Σ(t=2 to n)(et − et−1)² / Σet²

Expanding the numerator of the above equation, we obtain:

d = [Σ(t=2 to n) et² + Σ(t=2 to n) e²t−1 − 2Σet et−1] / Σet² ------------------------3.48

However, for large samples Σ(t=2 to n) et² ≅ Σ(t=2 to n) e²t−1 ≅ Σ(t=1 to n) et², because in each case only one observation is lost. Thus:

d ≈ 2[1 − Σet et−1/Σet²]

but ρ̂ = Σet et−1/Σe²t−1 from equation 3.31, so that:

d ≈ 2(1 − ρ̂)

From the above relation, therefore:

if ρ̂ = 0, d ≅ 2
if ρ̂ = 1, d ≅ 0
if ρ̂ = −1, d ≅ 4

Thus we obtain two important conclusions:

i. The values of d lie between 0 and 4.
ii.
If there is no autocorrelation (ρ̂ = 0), then d ≈ 2.

Whenever, therefore, the calculated value of d turns out to be sufficiently close to 2, we accept the null hypothesis, and if it is close to zero or four, we reject the null hypothesis that there is no autocorrelation. However, because the exact sampling distribution of d is not known, there exist ranges of values within which we can either accept or reject the null hypothesis. We do not have a unique critical value of the d statistic; instead we have a lower bound dL and an upper bound dU for the critical values of d used to accept or reject the null hypothesis. For the two-tailed Durbin-Watson test, we set five regions for the values of d, as depicted in the figure below.

[Figure: the five regions of d between 0 and 4, delimited by dL, dU, 4 − dU and 4 − dL.]

The mechanics of the D.W. test are as follows, assuming that the assumptions underlying the test are fulfilled. Run the OLS regression and obtain the residuals. Obtain the computed value of d using the formula given in equation 3.47. For the given sample size and given number of explanatory variables, find the critical dL and dU values. Then follow the decision rules given below.

1. If d is less than dL or greater than (4 − dL), we reject the null hypothesis of no autocorrelation in favour of the alternative, which implies the existence of autocorrelation.
2. If d lies between dU and (4 − dU), accept the null hypothesis of no autocorrelation.
3. If, however, the value of d lies between dL and dU, or between (4 − dU) and (4 − dL), the D.W. test is inconclusive.

Example 1. Suppose for a hypothetical model Y = α + βX + Ui we find d = 0.1380, with dL = 1.37 and dU = 1.50. Based on these values, test for autocorrelation.

Solution: first compute (4 − dL) and (4 − dU), and compare the computed value of d with dL, dU, (4 − dL) and (4 − dU):

(4 − dL) = 4 − 1.37 = 2.63
(4 − dU) = 4 − 1.50 = 2.50

Since d is less than dL, we reject the null hypothesis of no autocorrelation.

Example 2.
Consider the model Yt = α + βXt + Ut with the following observations on X and Y:

X: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Y: 2 2 2 1 3 5 6 6 10 10 10 12 15 10 11

Test for autocorrelation using the Durbin-Watson method.

Solution: regress Y on X, i.e. Yt = α + βXt + Ut. From the table above we can compute the following values:

Σxy = 255, Ȳ = 7, Σ(et − et−1)² = 60.213
Σx² = 280, X̄ = 8, Σet² = 41.767
Σy² = 274

β̂ = Σxy/Σx² = 255/280 = 0.91
α̂ = Ȳ − β̂X̄ = 7 − 0.91(8) = −0.29

Ŷ = −0.29 + 0.91X , R² = 0.85

d = Σ(et − et−1)²/Σet² = 60.213/41.767 = 1.442

The values of dL and dU at the 5% level of significance, with n = 15 and one explanatory variable, are dL = 1.08 and dU = 1.36, so that (4 − dU) = 2.64. The computed d* = 1.442 lies between dU and (4 − dU), i.e. within the interval (1.36, 2.64), so we accept H0. This implies that the data show no evidence of autocorrelation.

Although the D.W. test is extremely popular, it has one great drawback: if d falls in the inconclusive zone or region, one cannot conclude whether autocorrelation does or does not exist. Several authors have proposed modifications of the D.W. test. In many situations, however, it has been found that the upper limit dU is approximately the true significance limit. Thus, the modified D.W. test is based on dU: in case the estimated d value lies in the inconclusive zone, one can use the following modified d test procedure. Given the level of significance α:

1. H0: ρ = 0 versus H1: ρ > 0. If the estimated d < dU, reject H0 at level α; that is, there is statistically significant positive autocorrelation.
2. H0: ρ = 0 versus H1: ρ ≠ 0. If the estimated d < dU or (4 − d) < dU, reject H0 at level 2α; statistically, there is significant evidence of autocorrelation, positive or negative.
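Example 2 can be replicated directly. The sketch below (pure Python) recomputes β̂, α̂ and the Durbin-Watson d from the 15 observations; only rounding differs from the hand computation above.

```python
# Replication of Example 2: OLS fit and Durbin-Watson d for the 15 observations.
X = list(range(1, 16))
Y = [2, 2, 2, 1, 3, 5, 6, 6, 10, 10, 10, 12, 15, 10, 11]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
beta = sum((X[i] - xbar) * (Y[i] - ybar) for i in range(n)) / \
       sum((X[i] - xbar) ** 2 for i in range(n))       # = 255/280
alpha = ybar - beta * xbar                             # = 7 - (255/280)*8
e = [Y[i] - (alpha + beta * X[i]) for i in range(n)]   # OLS residuals
d = sum((e[t] - e[t - 1]) ** 2 for t in range(1, n)) / sum(et ** 2 for et in e)
print(round(beta, 2), round(alpha, 2), round(d, 3))    # 0.91 -0.29 1.442
```

Since 1.36 = dU < 1.442 < 4 − dU = 2.64, the computed d lands in the acceptance region, confirming the conclusion of no autocorrelation.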
4.2.9 Remedial Measures for the Problem of Autocorrelation

Since in the presence of serial correlation the OLS estimators are inefficient, it is essential to seek remedial measures. The remedy, however, depends on what knowledge one has about the nature of the interdependence among the disturbances; this means the remedy depends on whether the coefficient of autocorrelation is known or not.

A. When ρ is known. When the structure of the autocorrelation is known, i.e. ρ is known, the appropriate corrective procedure is to transform the original model or data so that the error term of the transformed model is non-autocorrelated. When we transform, we are wiping out the effect of ρ. Suppose that our model is:

Yt = α + βXt + Ut ------------------------3.49

and Ut = ρUt−1 + Vt , |ρ| < 1 ------------------------3.50

Equation (3.50) indicates the existence of autocorrelation. If ρ is known, we can transform equation (3.49) into one that is not autocorrelated. The transformation procedure is as follows. Take the lagged form of equation (3.49) and multiply through by ρ:

ρYt−1 = ρα + ρβXt−1 + ρUt−1 ------------------------3.51

Subtracting (3.51) from (3.49), we have:

Yt − ρYt−1 = (α − ρα) + (βXt − ρβXt−1) + (Ut − ρUt−1) ------------------------3.52

By rearranging the terms in (3.50), we have Vt = Ut − ρUt−1, which, on substituting for the last term of (3.52), gives:

Yt − ρYt−1 = α(1 − ρ) + β(Xt − ρXt−1) + vt ------------------------3.53

Let: Yt* = Yt − ρYt−1 ; a = α − ρα = α(1 − ρ) ; Xt* = Xt − ρXt−1

Equation (3.53) may then be written as:

Yt* = a + βXt* + vt ------------------------3.54

It may be noted that in transforming equation (3.49) into (3.54), one observation is lost because of the lagging and subtracting in (3.52). We can apply OLS to the transformed relation (3.54) to obtain â and β̂, and then recover our two parameters α and β via:

α̂ = â/(1 − ρ)

and it can be shown that

var(α̂) = [1/(1 − ρ)²] var(â)
because α̂ is perfectly and linearly related to â. Again, since vt satisfies all the standard assumptions, the variances of α̂ and β̂ are given by our standard OLS formulae applied to the transformed data:

var(α̂) = σv²ΣXt*² / [nΣ(Xt* − X̄*)²] , var(β̂) = σv² / Σ(Xt* − X̄*)²

The estimators obtained from equation (3.54) are efficient only if our sample size is large enough that the loss of one observation becomes negligible.

B. When ρ is not known

When ρ is not known, we describe below the methods through which the coefficient of autocorrelation can be estimated.

Method I: A priori information on ρ. Many times an investigator makes some reasonable guess about the value of the autoregressive coefficient by using his knowledge or intuition about the relationship under study. Researchers often assume that ρ = 1 or −1. Under this method, the process of transformation is the same as when ρ is known. When ρ = 1, the transformed model becomes:

(Yt − Yt−1) = β(Xt − Xt−1) + Vt ; where Vt = Ut − Ut−1

Note that the constant term is suppressed in the above: β̂ is obtained by merely taking the first differences of the variables and fitting a line that passes through the origin. Suppose instead that one assumes ρ = −1, i.e. the case of perfect negative autocorrelation. In such a case the transformed model becomes:

Yt + Yt−1 = 2α + β(Xt + Xt−1) + vt

or

(Yt + Yt−1)/2 = α + β(Xt + Xt−1)/2 + vt/2

This model is called a two-period moving average regression model, because we are actually regressing one moving average, (Yt + Yt−1)/2, on another, (Xt + Xt−1)/2.

This method of first differences is quite popular in applied research for its simplicity. But the method rests on the assumption that there is either perfect positive or perfect negative autocorrelation in the data.

Method II: Estimation of ρ from the d-statistic. From the derivation following equation (3.48), we obtained d ≈ 2(1 − ρ̂).
Suppose we calculate a certain value of the d-statistic for a given data set. Given the d-value, we can estimate ρ from d ≈ 2(1 − ρ̂):

ρ̂ ≈ 1 − d/2

As already pointed out, ρ̂ will not be accurate if the sample size is small; the above relationship is true only for large samples. For small samples, Theil and Nagar have suggested the following relation:

ρ̂ = [n²(1 − d/2) + k²] / (n² − k²) ------------------------3.55

where n = total number of observations, d = the Durbin-Watson statistic, and k = number of coefficients (including the intercept term). Using this value of ρ̂ we can perform the above transformation to remove the autocorrelation from the model.

Method III: The Cochrane-Orcutt iterative procedure. In this method we remove the autocorrelation gradually, starting from the simplest form of a first-order scheme. First we obtain the residuals and apply OLS to them:

et = ρet−1 + vt ------------------------3.56

We estimate ρ̂ from the above relation. With the estimated ρ̂, we transform the original data and then apply OLS to the model:

(Yt − ρ̂Yt−1) = α(1 − ρ̂) + β(Xt − ρ̂Xt−1) + (Ut − ρ̂Ut−1) ------------------------3.57

We once again apply OLS to the newly obtained residuals:

et* = ρe*t−1 + wt ------------------------3.58

We use this second estimate of ρ to transform the original observations, and so on; we keep proceeding until the value of the estimate of ρ converges. It can be shown that the procedure is convergent. When the data are transformed using only this second-stage estimate of ρ, the method is called the two-stage Cochrane-Orcutt method. Alternatively, one can, at each step of the iteration, apply the Durbin-Watson d-statistic to the residuals to test for autocorrelation, stopping when the successive estimates of ρ do not differ substantially from one another.

Method IV: Durbin's two-stage method. Assuming the first-order autoregressive scheme, Durbin suggests a two-stage procedure for resolving the serial correlation problem.
The steps under this method are as follows. Given:

Yt = α + βXt + ut ------------------------3.59
Ut = ρUt−1 + vt

1. Take the lagged form of the above and multiply by ρ:

ρYt−1 = ρα + ρβXt−1 + ρut−1 ------------------------3.60

2. Subtract (3.60) from (3.59):

Yt − ρYt−1 = α(1 − ρ) + β(Xt − ρXt−1) + ut − ρut−1 ------------------------3.61

3. Rewrite (3.61) in the following form:

Yt = α(1 − ρ) + ρYt−1 + βXt − βρXt−1 + vt
Yt = α* + ρYt−1 + βXt − γXt−1 + vt

This equation is now treated as a regression equation with the three explanatory variables Xt, Xt−1 and Yt−1. It provides an estimate of ρ (the coefficient on Yt−1), which is used to construct the new variables (Yt − ρ̂Yt−1) and (Xt − ρ̂Xt−1). In the second stage, estimators of α and β are obtained from the regression equation:

(Yt − ρ̂Yt−1) = α* + β(Xt − ρ̂Xt−1) + ut* ; where α* = α(1 − ρ)

4.3 Multicollinearity

4.3.1 The nature of Multicollinearity

Originally, multicollinearity meant the existence of a "perfect", or exact, linear relationship among some or all of the explanatory variables of a regression model. For a k-variable regression involving the explanatory variables x1, x2, ..., xk, an exact linear relationship is said to exist if the following condition is satisfied:

λ1x1 + λ2x2 + ... + λkxk = 0 ------------------------(1)

where λ1, λ2, ..., λk are constants such that not all of them are simultaneously zero. Today, however, the term multicollinearity is used in a broader sense to include the case of perfect multicollinearity, as shown by (1), as well as the case where the x-variables are inter-correlated but not perfectly so:

λ1x1 + λ2x2 + ... + λkxk + vi = 0 ------------------------(2)

where vi is a stochastic error term. The nature of multicollinearity can be illustrated using the figures below, in which the circles y, x1 and x2 represent respectively the variation in y (the dependent variable) and in x1 and x2 (the explanatory variables).
The degree of collinearity can be measured by the extent of overlap (shaded area) of the x1 and x2 circles. In fig. (a) there is no overlap between x1 and x2, and hence no collinearity. In figs. (b) through (e) there is a "low" to "high" degree of collinearity. In the extreme, if x1 and x2 were to overlap completely (or if x1 were completely inside x2, or vice versa), collinearity would be perfect.

Note that multicollinearity refers only to linear relationships among the x-variables. It does not rule out non-linear relationships among them. For example:

Y = α + β1x_i + β2x_i² + β3x_i³ + v_i …………(3.31)

where Y is total cost and x is output. The variables x_i² and x_i³ are obviously functionally related to x_i, but the relationship is non-linear. Strictly, therefore, models such as (3.31) do not violate the assumption of no multicollinearity. However, in concrete applications, the conventionally measured correlation coefficient will show x_i, x_i² and x_i³ to be highly correlated, which, as we shall show, will make it difficult to estimate the parameters with precision (i.e. with small standard errors).

4.3.2 Reasons for multicollinearity

1. The data collection method employed. For example, if we sample only a limited range of the values taken in the population, the sample may show multicollinearity even though a sample covering all possible values might not.

2. Constraints on the model or in the population being sampled. For example, in a regression of electricity consumption on income (x1) and house size (x2), there is a physical constraint in the population in that families with higher incomes generally have larger homes than families with lower incomes.

3. An overdetermined model. This happens when the model has more explanatory variables than the number of observations.
This could happen in medical research, where there may be a small number of patients about whom information is collected on a large number of variables.

4.3.3 Consequences of multicollinearity

Why does the classical linear regression model impose the assumption of no multicollinearity among the X's? Because of the following consequences of multicollinearity for the OLS estimators.

1. If multicollinearity is perfect, the regression coefficients of the X variables are indeterminate and their standard errors are infinite.

Proof: Consider a multiple regression model with two explanatory variables, where the dependent and independent variables are given in deviation form:

y_i = β̂1x_{1i} + β̂2x_{2i} + e_i

Dear distance student, do you recall the formulas of β̂1 and β̂2 from our discussion of multiple regression?

β̂1 = (Σx1y·Σx2² − Σx2y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²)

β̂2 = (Σx2y·Σx1² − Σx1y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²)

Assume x2 = λx1 …………(3.32)

where λ is a non-zero constant. Substituting (3.32) into the formula for β̂1:

β̂1 = (Σx1y·Σ(λx1)² − Σ(λx1)y·Σx1(λx1)) / (Σx1²·Σ(λx1)² − (Σx1(λx1))²)
   = (λ²Σx1y·Σx1² − λ²Σx1y·Σx1²) / (λ²(Σx1²)² − λ²(Σx1²)²)
   = 0/0 ⇒ indeterminate.

Applying the same procedure, we obtain a similar result (an indeterminate value) for β̂2. Likewise, from our discussion of the multiple regression model, the variance of β̂1 is given by:

var(β̂1) = σ²Σx2² / (Σx1²·Σx2² − (Σx1x2)²)

Substituting x2 = λx1 in this variance formula, we get:

var(β̂1) = σ²λ²Σx1² / (λ²(Σx1²)² − λ²(Σx1²)²) = σ²λ²Σx1² / 0 = ∞ ⇒ infinite.

These are the consequences of perfect multicollinearity. One may then ask about the consequences of less than perfect correlation. In cases of near or high multicollinearity, one is likely to encounter the following consequences.

2.
If multicollinearity is less than perfect (i.e. near or high multicollinearity), the regression coefficients are determinate.

Proof: Consider the two-explanatory-variable model above in deviation form. The assumption x2 = λx1 implied perfect correlation between x1 and x2, because the change in x2 was completely due to the change in x1. Instead of exact multicollinearity, we may have:

x_{2i} = λx_{1i} + v_i

where λ ≠ 0 and v_i is a stochastic error term such that Σx_{1i}v_i = 0. In this case x2 is not only determined by x1 but is also affected by other factors captured by v_i. Substituting x_{2i} = λx_{1i} + v_i into the formula for β̂1 above:

β̂1 = (Σx1y·Σx2² − Σx2y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²)
   = (Σx1y(λ²Σx1² + Σv²) − (λΣx1y + Σyv)·λΣx1²) / (Σx1²(λ²Σx1² + Σv²) − (λΣx1²)²)
   ≠ 0/0 ⇒ determinate.

This proves that if we have less than perfect multicollinearity, the OLS coefficients are determinate. The implication of the indeterminacy of the regression coefficients in the case of perfect multicollinearity is that it is not possible to observe the separate influences of x1 and x2. But such an extreme case is not very frequent in practical applications; most data exhibit less than perfect multicollinearity.

3. If multicollinearity is less than perfect (i.e. near or high multicollinearity), the OLS estimators retain the BLUE property.

Explanation: While proving the BLUE property of OLS estimators in the simple and multiple regression models (Module-I), we did not make use of the assumption of no multicollinearity. Hence, as long as the basic assumptions needed for that proof are not violated, the OLS estimators are BLUE whether multicollinearity exists or not.

4. Although BLUE, the OLS estimators have large variances and covariances.

var(β̂2) = σ²Σx1² / (Σx1²·Σx2² − (Σx1x2)²)

Divide the numerator and the denominator by Σx1²:
var(β̂2) = σ² / (Σx2² − (Σx1x2)²/Σx1²) = σ² / [Σx2²(1 − r12²)]

where r12² is the square of the correlation coefficient between x1 and x2. (If x_{2i} = λx_{1i} + v_i, what happens to this variance as r12² rises?) As r12 tends to 1, i.e. as collinearity increases, the variance of the estimator increases, and in the limit, when r12 = 1, the variance becomes infinite. The same derivation gives var(β̂1) = σ² / [Σx1²(1 − r12²)]. Similarly,

cov(β̂1, β̂2) = −r12σ² / [(1 − r12²)·√(Σx1²·Σx2²)] (why?)

As r12 increases toward one, the covariance of the two estimators increases in absolute value. The speed with which variances and covariances increase can be seen with the variance-inflating factor (VIF), which is defined as:

VIF = 1 / (1 − r12²)

VIF shows how the variance of an estimator is inflated by the presence of multicollinearity. As r12² approaches 1, the VIF approaches infinity; that is, as the extent of collinearity increases, the variance of an estimator increases, and in the limit the variance becomes infinite. As can be seen, if there is no collinearity between x1 and x2, VIF will be 1. Using this definition we can express var(β̂1) and var(β̂2) in terms of the VIF:

var(β̂1) = (σ²/Σx1²)·VIF  and  var(β̂2) = (σ²/Σx2²)·VIF

which shows that the variances of β̂1 and β̂2 are directly proportional to the VIF.

5. Because of the large variances of the estimators, which mean large standard errors, the confidence intervals tend to be much wider, leading to the acceptance of the "zero null hypothesis" (i.e. that the true population coefficient is zero) more readily.

6. Because of the large standard errors of the estimators, the computed t-ratios will be very small, leading one or more of the coefficients to appear statistically insignificant when tested individually.

7.
Although the t-ratios of one or more of the coefficients may be very small (making those coefficients statistically insignificant individually), R², the overall measure of goodness of fit, can be very high.

Example: in the model y = α + β1x1 + β2x2 + …… + βkxk + v_i, under high collinearity it is possible to find that one or more of the partial slope coefficients are individually statistically insignificant on the basis of the t-test, yet the R² may be very high, say in excess of 0.9, so that on the basis of the F-test one can convincingly reject the hypothesis that β1 = β2 = …… = βk = 0. Indeed, this is one of the signals of multicollinearity: insignificant t-values but a high overall R² (i.e. a significant F-value).

8. The OLS estimators and their standard errors can be sensitive to small changes in the data.

Dear Readers! These are the major consequences of near or high multicollinearity. If you have any comments or suggestions, you are welcome!

4.3.4 Detection of multicollinearity

A recognizable set of symptoms for the existence of multicollinearity on which one can rely are:
a. a high coefficient of determination (R²);
b. high correlation coefficients among the explanatory variables (the r_{xixj}'s);
c. large standard errors and small t-ratios of the regression parameters.

Note that none of these symptoms by itself is a satisfactory indicator of multicollinearity, because:
i. large standard errors may arise for various reasons, not only because of the presence of linear relationships among the explanatory variables;
ii. a high r_{xixj} is a sufficient but not a necessary condition for the existence of multicollinearity, because multicollinearity can exist even when the pairwise correlation coefficients are low.

However, the combination of all these criteria should help the detection of multicollinearity.
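These symptoms can be illustrated with a small simulation (a minimal numpy sketch; the sample size, coefficients, and random seed are invented for the illustration): two nearly duplicate regressors produce an excellent overall fit together with greatly inflated slope standard errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two regressors that are almost perfect duplicates of one another
n = 60
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 2.0 * x2 + 0.5 * rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

k = X.shape[1]
s2 = (e @ e) / (n - k)                              # estimate of sigma^2
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))  # OLS standard errors
t_ratios = b / se
R2 = 1 - (e @ e) / np.sum((y - y.mean()) ** 2)

# Symptom of collinearity: overall fit is excellent (high R^2),
# yet the slope standard errors are far larger than they would be
# with uncorrelated regressors.
print(R2, se, t_ratios)
```

With uncorrelated regressors of the same scale, each slope standard error would be roughly 0.5/√n ≈ 0.065 here; the collinearity inflates them by an order of magnitude while leaving R² close to one.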
4.3.4.1 Test based on auxiliary regressions

Since multicollinearity arises because one or more of the regressors are exact or approximately linear combinations of the other regressors, one way of finding out which X variable is related to the other X variables is to regress each X_i on the remaining X variables and compute the corresponding R², which we designate R_i². Each of these regressions is called an auxiliary regression, auxiliary to the main regression of Y on the X's. Then, following the relationship between F and R² established in Chapter Three under overall significance, the statistic

F_i = [R_i² / (k − 2)] / [(1 − R_i²) / (n − k + 1)] ~ F(k − 2, n − k + 1)

where:
- R_i² is the R² of the auxiliary regression of X_i on the remaining X's;
- n is the number of observations;
- k is the number of parameters including the intercept.

If the computed F exceeds the critical F at the chosen level of significance, this is taken to mean that the particular X_i is collinear with the other X's; if it does not exceed the critical F, we say that X_i is not collinear with the other X's, in which case we may retain the variable in the model. If F_i is statistically significant, we will still have to decide whether the particular X_i should be dropped from the model. Note also Klein's rule of thumb, which suggests that multicollinearity may be a troublesome problem only if the R² obtained from an auxiliary regression is greater than the overall R², that is, the R² obtained from the regression of Y on all the regressors.

4.3.4.2 The Farrar-Glauber test

Farrar and Glauber use three statistics for testing multicollinearity: a chi-square, an F-ratio and a t-ratio. The test may be outlined in three steps.

A. Computation of χ² to test orthogonality: two variables are called orthogonal if r_{xixj} = 0, i.e. if there is no collinearity between them.
In our discussion of multiple regression models, we saw the matrix representation of a three-explanatory-variable model, in which

x'x =
| Σx1²   Σx1x2   Σx1x3 |
| Σx2x1  Σx2²    Σx2x3 |
| Σx3x1  Σx3x2   Σx3²  |

Dividing each element Σx_ix_j by √(Σx_i²)·√(Σx_j²) turns x'x into the correlation matrix, whose determinant we then compute:

| 1    r12   r13 |
| r12  1     r23 |
| r13  r23   1   |

The value of this determinant is zero in the case of perfect multicollinearity (for instance, when r12 = r13 = r23 = 1). On the other hand, in the case of orthogonality of the x's, all r_ij = 0 and the value of the determinant is unity. It follows, therefore, that if the value of this determinant lies between zero and unity, some degree of multicollinearity exists. For detecting the degree of multicollinearity over the whole set of explanatory variables, Farrar and Glauber suggest a χ² test of the following hypotheses:

H0: the x's are orthogonal (i.e. r_{xixj} = 0)
H1: the x's are not orthogonal (i.e. r_{xixj} ≠ 0)

Farrar and Glauber have found that the quantity

χ² = −[n − 1 − (1/6)(2k + 5)]·log_e{value of the standardized determinant}

has a χ² distribution with ½k(k − 1) degrees of freedom. If the computed χ² is greater than the critical value of χ², reject H0 in favour of multicollinearity; if it is less, accept H0.

B. Computation of t-ratios to test the pattern of multicollinearity: The t-test helps to detect those variables which are the cause of multicollinearity. The test is performed on the partial correlation coefficients through the following hypotheses:

H0: r_{xixj·x1,x2,……,xk} = 0
H1: r_{xixj·x1,x2,……,xk} ≠ 0

In the three-variable model the squared partial correlations are:

r²_{x1x2·x3} = (r12 − r13r23)² / [(1 − r13²)(1 − r23²)] (How?)
r²_{x1x3·x2} = (r13 − r12r23)² / [(1 − r12²)(1 − r23²)]
r²_{x2x3·x1} = (r23 − r12r13)² / [(1 − r12²)(1 − r13²)]

The test statistic is

t* = r_{xixj·x1,x2,……,xk}·√(n − k) / √(1 − r²_{xixj·x1,x2,……,xk}) (How?)

If t* > t (tabulated), H0 is rejected; if t* < t (tabulated), H0 is accepted, and we conclude that X_i and X_j are not a cause of multicollinearity (since r_{xixj·…} is not significant).

4.3.4.3 Test of multicollinearity using eigenvalues and the condition index

Using the eigenvalues of the x'x matrix we can derive a number called the condition number K:

K = maximum eigenvalue / minimum eigenvalue

In addition, using these values we can derive the condition index (CI), defined as

CI = √(maximum eigenvalue / minimum eigenvalue) = √K

Decision rule: if K is between 100 and 1000, there is moderate to strong multicollinearity, and if it exceeds 1000 there is severe multicollinearity. Alternatively, if CI (= √K) is between 10 and 30, there is moderate to strong multicollinearity, and if it exceeds 30 there is severe multicollinearity.

Example: if K = 123,864 and CI = 352, this suggests the existence of severe multicollinearity.

4.3.4.4 Test of multicollinearity using tolerance and the variance inflation factor

var(β̂_j) = (σ²/Σx_j²)·[1/(1 − R_j²)] = (σ²/Σx_j²)·VIF_j

where R_j² is the R² in the auxiliary regression of X_j on the remaining (k − 2) regressors and VIF is the variance inflation factor. Some authors therefore use the VIF as an indicator of multicollinearity: the larger the value of VIF_j, the more "troublesome" or collinear the variable X_j. However, how high should VIF be before a regressor becomes troublesome? As a rule of thumb, if the VIF of a variable exceeds 10 (which happens if R_j² exceeds 0.9), the variable is said to be highly collinear. Other authors use the measure of tolerance to detect multicollinearity. It is defined as

TOL_j = 1 − R_j² = 1/VIF_j

Clearly, TOL_j = 1 if X_j is not correlated with the other regressors, whereas it is zero if X_j is perfectly related to the other regressors.
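The eigenvalue and tolerance/VIF diagnostics above can be sketched together with numpy (the simulated regressors and seed are invented for illustration; the thresholds are those given in the text):

```python
import numpy as np

rng = np.random.default_rng(1)

# A strongly collinear pair of regressors, in deviation form
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # x2 nearly duplicates x1
X = np.column_stack([x1 - x1.mean(), x2 - x2.mean()])

# Condition number K and condition index CI from the eigenvalues of X'X
eig = np.linalg.eigvalsh(X.T @ X)     # eigvalsh: eigenvalues of a symmetric matrix
K = eig.max() / eig.min()
CI = np.sqrt(K)

# Tolerance and VIF: with only two regressors, R_j^2 is simply r12^2
r12 = np.corrcoef(x1, x2)[0, 1]
TOL = 1.0 - r12**2
VIF = 1.0 / TOL
print(K, CI, TOL, VIF)
```

For this design all four diagnostics agree: K far exceeds 1000 and CI exceeds 30 (severe multicollinearity by the eigenvalue rules), while VIF exceeds 10 and the tolerance is close to zero.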
VIF (or tolerance) as a measure of collinearity is not free of criticism. As we have seen, var(β̂_j) = (σ²/Σx_j²)·VIF_j depends on three factors: σ², Σx_j² and VIF. A high VIF can be counterbalanced by a low σ² or a high Σx_j². Put differently, a high VIF is neither necessary nor sufficient for high variances and high standard errors. Therefore, high multicollinearity, as measured by a high VIF, may not necessarily cause high standard errors.

4.3.5 Remedial measures

It is more difficult to deal with models exhibiting multicollinearity than to detect the problem in the first place. Different remedial measures have been suggested by econometricians, depending on the severity of the problem, the availability of other sources of data, and the importance of the variables found to be multicollinear in the model. Some suggest that a minor degree of multicollinearity can be tolerated, although one should be a bit careful while interpreting the model under such conditions. Others suggest removing the variables that show multicollinearity, if they are not important in the model; but by doing so, the desired characteristics of the model may be affected. The following corrective procedures have been suggested for cases where the problem of multicollinearity is serious.

1. Increase the size of the sample: multicollinearity may be avoided or reduced if the size of the sample is increased, since the covariances of the estimators are inversely related to the sample size. But we should remember that this will be true only when the intercorrelation happens to exist in the sample but not in the population of the variables. If the variables are collinear in the population, increasing the size of the sample will not help to reduce multicollinearity.

2.
Introduce an additional equation in the model: the problem of multicollinearity may be overcome by expressing explicitly the relationship between the multicollinear variables. Such a relation, in the form of an equation, may then be added to the original model. The addition of the new equation transforms our single-equation (original) model into a simultaneous-equation model. The reduced-form method (which is usually applied for estimating simultaneous-equation models) can then be applied to avoid multicollinearity.

3. Use extraneous information: extraneous information is information obtained from any source outside the sample being used for the estimation. It may be available from economic theory or from empirical studies already conducted in the field in which we are interested. There are three methods through which extraneous information can be utilized to deal with the problem of multicollinearity.

a. Method of using prior information: Suppose that the correct specification of the model is Y = α + β1X1 + β2X2 + U, and that X1 and X2 are found to be collinear. If it is possible to obtain the exact value of β2 from an extraneous source, we can make use of this information in estimating the influence of the remaining variable in the following way. Suppose β2* is known a priori; then:

Y − β2*X2 = α + β1X1 + U

Applying the OLS method (in deviation form):

β̂1 = Σx1(y − β2*x2) / Σx1² = (Σx1y − β2*Σx1x2) / Σx1²

i.e. β̂1 is the OLS estimator of the slope of the regression of (Y − β2*X2) on X1. Thus, the estimating procedure described is equivalent to correcting the dependent variable for the influence of those explanatory variables with known coefficients (from an extraneous source of information) and regressing this corrected variable on the remaining explanatory variables.

b.
Method of transforming variables: This method is used when the relationship between certain parameters is known a priori. For instance, suppose that we want to estimate the production function Q = A·L^α·K^β·e^u, where Q is the quantity produced, L the labor input and K the capital input, and we are required to estimate α and β. On logarithmic transformation, the function becomes:

ln Q = ln A + α ln L + β ln K + u
Q* = A* + αL* + βK* + u

where the asterisks indicate logs of the variables. Suppose it is observed that K and L move together so closely that it is difficult to separate the effect of changing quantities of labor input on output from the effect of variation in the use of capital. Assume also that, on the basis of information from some other source, we have solid evidence that the industry is characterized by constant returns to scale. This implies α + β = 1, so we can substitute β = (1 − α) into the transformed function, which gives

ln Q − ln K = ln A + α(ln L − ln K) + u

so that a regression of ln(Q/K) on ln(L/K) yields an estimate of α that is free of the collinearity between L and K.

c. Method of pooling cross-sectional and time-series data: Extraneous estimates of some coefficients may also be obtained from a different body of data. For example, in estimating a demand function from time-series data, where price and income are collinear, the income coefficient may first be estimated from cross-section data. On combining the results, the relationship becomes:

D̂_t = α + β̂1P_t + β̂2*Y_t + û_t

where β̂1 is derived from the time-series data and β̂2* is obtained using the cross-section data. By following this pooling technique, we have skirted the multicollinearity between income and price.

The methods described above are no sure methods of getting rid of the problem of multicollinearity. Which of these rules works in practice will depend on the nature of the data under investigation and the severity of the multicollinearity problem.
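The prior-information device in (a) can be sketched numerically (a minimal numpy illustration with simulated data; the "known" coefficient, parameter values, and seed are all invented for the example):

```python
import numpy as np

rng = np.random.default_rng(5)

# True model: y = alpha + b1*x1 + b2*x2 + u, with x1 and x2 highly collinear.
# Suppose b2 = 0.5 is known a priori from an extraneous source.
n, alpha, b1_true, b2_known = 200, 1.0, 2.0, 0.5
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)          # x2 nearly duplicates x1
y = alpha + b1_true * x1 + b2_known * x2 + 0.5 * rng.normal(size=n)

# Correct y for the known influence of x2, then regress the result on x1 alone:
# beta1_hat = S(x1, y - b2*x2) / S(x1, x1)  in deviation form
ystar = y - b2_known * x2
x1d = x1 - x1.mean()
b1_hat = (x1d @ (ystar - ystar.mean())) / (x1d @ x1d)
print(b1_hat)
```

Because the collinear regressor's influence is removed before estimation, β̂1 is recovered precisely even though a joint regression of y on x1 and x2 would have very imprecise slope estimates.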
Chapter Five
Regression on Dummy Variables

5.1 The nature of dummy variables

In regression analysis the dependent variable is frequently influenced not only by variables that can be readily quantified on some well-defined scale (e.g., income, output, prices, costs, height, and temperature), but also by variables that are essentially qualitative in nature (e.g., sex, race, color, religion, nationality, wars, earthquakes, strikes, political upheavals, and changes in government economic policy). For example, holding all other factors constant, female college professors are found to earn less than their male counterparts, and nonwhites are found to earn less than whites. This pattern may result from sex or racial discrimination, but whatever the reason, qualitative variables such as sex and race do influence the dependent variable and clearly should be included among the explanatory variables.

Since such qualitative variables usually indicate the presence or absence of a "quality" or attribute, such as male or female, black or white, or Christian or Muslim, one method of "quantifying" such attributes is to construct artificial variables that take on values of 1 or 0, 0 indicating the absence of the attribute and 1 indicating its presence (or possession). For example, 1 may indicate that a person is male, and 0 that the person is female; or 1 may indicate that a person is a college graduate, and 0 that he or she is not, and so on. Variables that assume such 0 and 1 values are called dummy variables. Alternative names are indicator variables, binary variables, categorical variables, and dichotomous variables. Dummy variables can be used in regression models just as easily as quantitative variables. As a matter of fact, a regression model may contain explanatory variables that are exclusively dummy, or qualitative, in nature.
Example:

Y_i = α + βD_i + u_i …………(5.01)

where Y = annual salary of a college professor
D_i = 1 if male college professor
    = 0 otherwise (i.e., female professor)

Note that (5.01) is like the two-variable regression models encountered previously, except that instead of a quantitative X variable we have a dummy variable D (hereafter, we shall designate all dummy variables by the letter D). Model (5.01) may enable us to find out whether sex makes any difference in a college professor's salary, assuming, of course, that all other variables such as age, degree attained, and years of experience are held constant. Assuming that the disturbances satisfy the usual assumptions of the classical linear regression model, we obtain from (5.01):

Mean salary of a female college professor: E(Y_i | D_i = 0) = α …………(5.02)
Mean salary of a male college professor: E(Y_i | D_i = 1) = α + β

That is, the intercept term α gives the mean salary of female college professors, and the slope coefficient β tells by how much the mean salary of a male college professor differs from the mean salary of his female counterpart, with α + β reflecting the mean salary of the male college professor. A test of the null hypothesis that there is no sex discrimination (H0: β = 0) can easily be made by running regression (5.01) in the usual manner and finding out whether, on the basis of the t-test, the estimated β is statistically significant.

5.2 Regression on one quantitative variable and one qualitative variable with two classes, or categories

Consider the model:

Y_i = α1 + α2D_i + βX_i + u_i …………(5.03)
where: Y_i = annual salary of a college professor
X_i = years of teaching experience
D_i = 1 if male
    = 0 otherwise

Model (5.03) contains one quantitative variable (years of teaching experience) and one qualitative variable (sex) that has two classes (or levels, classifications, or categories), namely, male and female. What is the meaning of this equation? Assuming, as usual, that E(u_i) = 0, we see that:

Mean salary of a female college professor: E(Y_i | X_i, D_i = 0) = α1 + βX_i …………(5.04)
Mean salary of a male college professor: E(Y_i | X_i, D_i = 1) = (α1 + α2) + βX_i …………(5.05)

Geometrically, we have the situation shown in fig. 5.1 (for illustration, it is assumed that α2 > 0). In words, model (5.03) postulates that the male and female college professors' salary functions in relation to years of teaching experience have the same slope (β) but different intercepts. In other words, it is assumed that the level of the male professor's mean salary is different from that of the female professor's mean salary (by α2), but the rate of change in the mean annual salary with years of experience is the same for both sexes.

If the assumption of common slopes is valid, a test of the hypothesis that the two regressions (5.04) and (5.05) have the same intercept (i.e., that there is no sex discrimination) can be made easily by running regression (5.03) and noting the statistical significance of the estimated α2 on the basis of the traditional t-test. If the t-test shows that α̂2 is statistically significant, we reject the null hypothesis that the male and female college professors' levels of mean annual salary are the same.

Before proceeding further, note the following features of the dummy variable regression model considered previously.

1. To distinguish the two categories, male and female, we have introduced only one dummy variable D_i.
For if D_i = 1 always denotes a male, then when D_i = 0 we know that the person is female, since there are only two possible outcomes. Hence, one dummy variable suffices to distinguish two categories. The general rule is this: if a qualitative variable has m categories, introduce only m − 1 dummy variables. In our example, sex has two categories, and hence we introduced only a single dummy variable. If this rule is not followed, we shall fall into what might be called the dummy variable trap, that is, a situation of perfect multicollinearity.

2. The assignment of the 1 and 0 values to the two categories, such as male and female, is arbitrary, in the sense that in our example we could have assigned D = 1 for female and D = 0 for male.

3. The group, category, or classification that is assigned the value of 0 is often referred to as the base, benchmark, control, comparison, reference, or omitted category. It is the base in the sense that comparisons are made with that category.

4. The coefficient α2 attached to the dummy variable D can be called the differential intercept coefficient, because it tells by how much the value of the intercept of the category that receives the value of 1 differs from the intercept coefficient of the base category.

5.3 Regression on one quantitative variable and one qualitative variable with more than two classes

Suppose that, on the basis of cross-sectional data, we want to regress the annual expenditure on health care by an individual on the income and education of the individual. Since the variable education is qualitative in nature, suppose we consider three mutually exclusive levels of education: less than high school, high school, and college. Now, unlike the previous case, we have more than two categories of the qualitative variable education.
Therefore, following the rule that the number of dummies be one less than the number of categories of the variable, we should introduce two dummies to take care of the three levels of education. Assuming that the three educational groups have a common slope but different intercepts in the regression of annual expenditure on health care on annual income, we can use the following model:

Y_i = α1 + α2D_{2i} + α3D_{3i} + βX_i + u_i …………(5.06)

where Y_i = annual expenditure on health care
X_i = annual income
D_2 = 1 if high school education, = 0 otherwise
D_3 = 1 if college education, = 0 otherwise

Note that in the preceding assignment of the dummy variables we are arbitrarily treating the "less than high school education" category as the base category. Therefore, the intercept α1 will reflect the intercept for this category. The differential intercepts α2 and α3 tell by how much the intercepts of the other two categories differ from the intercept of the base category, which can be readily checked as follows. Assuming E(u_i) = 0, we obtain from (5.06):

E(Y_i | D_2 = 0, D_3 = 0, X_i) = α1 + βX_i
E(Y_i | D_2 = 1, D_3 = 0, X_i) = (α1 + α2) + βX_i
E(Y_i | D_2 = 0, D_3 = 1, X_i) = (α1 + α3) + βX_i

which are, respectively, the mean health care expenditure functions for the three levels of education, namely, less than high school, high school, and college. Geometrically, the situation is shown in fig. 5.2 (for illustrative purposes it is assumed that α3 > α2).

5.4 Regression on one quantitative variable and two qualitative variables

The technique of dummy variables can be easily extended to handle more than one qualitative variable. Let us revert to the college professors' salary regression (5.03), but now assume that, in addition to years of teaching experience and sex, the skin color of the teacher is also an important determinant of salary.
For simplicity, assume that color has two categories: black and white. We can now write (5.03) as:

Y_i = α1 + α2D_{2i} + α3D_{3i} + βX_i + u_i …………(5.07)

where Y_i = annual salary
X_i = years of teaching experience
D_2 = 1 if female, = 0 otherwise
D_3 = 1 if white, = 0 otherwise

Notice that each of the two qualitative variables, sex and color, has two categories and hence needs one dummy variable each. Note also that the omitted, or base, category now is the "black male professor". Assuming E(u_i) = 0, we obtain the following regressions from (5.07):

Mean salary for a black male professor: E(Y_i | D_2 = 0, D_3 = 0, X_i) = α1 + βX_i
Mean salary for a black female professor: E(Y_i | D_2 = 1, D_3 = 0, X_i) = (α1 + α2) + βX_i
Mean salary for a white male professor: E(Y_i | D_2 = 0, D_3 = 1, X_i) = (α1 + α3) + βX_i
Mean salary for a white female professor: E(Y_i | D_2 = 1, D_3 = 1, X_i) = (α1 + α2 + α3) + βX_i

Once again, it is assumed that the preceding regressions differ only in the intercept coefficient but not in the slope coefficient β. An OLS estimation of (5.07) will enable us to test a variety of hypotheses. Thus, if α3 is statistically significant, it will mean that color does affect a professor's salary. Similarly, if α2 is statistically significant, it will mean that sex also affects a professor's salary. If both these differential intercepts are statistically significant, it will mean that sex as well as color is an important determinant of professors' salaries.

From the preceding discussion it follows that we can extend our model to include more than one quantitative variable and more than two qualitative variables. The only precaution to be taken is that the number of dummies for each qualitative variable should be one less than the number of categories of that variable.
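A model of the form (5.07) can be sketched with simulated data (a minimal numpy illustration; the sample size, true coefficients, and seed are invented for the example). OLS recovers the differential intercepts for the two dummies and the common slope on experience:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated salaries for model (5.07): Y = a1 + a2*D2 + a3*D3 + b*X + u
# D2 = 1 if female, D3 = 1 if white; base category is the black male professor
n = 500
D2 = rng.integers(0, 2, size=n).astype(float)
D3 = rng.integers(0, 2, size=n).astype(float)
exper = rng.uniform(0.0, 30.0, size=n)          # years of teaching experience
salary = 20.0 - 3.0 * D2 + 2.0 * D3 + 0.7 * exper + rng.normal(scale=1.0, size=n)

Xmat = np.column_stack([np.ones(n), D2, D3, exper])
a1, a2, a3, b = np.linalg.lstsq(Xmat, salary, rcond=None)[0]

# a2 and a3 are the differential intercepts relative to the base category;
# b is the common slope shared by all four groups
print(a1, a2, a3, b)
```

Adding both dummies plus a column for D2·D3 would give the interaction model (5.09) of Section 5.6; including a separate dummy for every category alongside the intercept would instead trigger the dummy variable trap.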
5.5 Testing for structural stability of regression models

Until now, in the models considered in this chapter, we have assumed that the qualitative variables affect the intercept but not the slope coefficients of the various subgroup regressions. But what if the slopes are also different? If the slopes are in fact different, testing for differences in the intercepts may be of little practical significance. Therefore, we need a general methodology to find out whether two (or more) regressions are different, where the difference may be in the intercepts, in the slopes, or in both.

5.6 Interaction effects

Consider the following model:

Y_i = α1 + α2D_{2i} + α3D_{3i} + βX_i + u_i …………(5.08)

where Y_i = annual expenditure on clothing
X_i = income
D_2 = 1 if female, = 0 if male
D_3 = 1 if college graduate, = 0 otherwise

Implicit in this model is the assumption that the differential effect of the sex dummy D_2 is constant across the two levels of education, and that the differential effect of the education dummy D_3 is also constant across the two sexes. That is, if, say, the mean expenditure on clothing is higher for females than for males, this is so whether they are college graduates or not. Likewise, if, say, college graduates on average spend more on clothing than non-college graduates, this is so whether they are female or male. In many applications such an assumption may be untenable: a female college graduate may spend more on clothing than a male graduate.
In other words, there may be interaction between the two qualitative variables D2 and D3, and therefore their effect on mean Y may not be simply additive as in (5.08) but multiplicative as well, as in the following model:

Yi = α1 + α2D2i + α3D3i + α4(D2iD3i) + βXi + ui -----------------(5.09)

From (5.09) we obtain

E(Yi | D2 = 1, D3 = 1, Xi) = (α1 + α2 + α3 + α4) + βXi ------------(5.10)

which is the mean clothing expenditure of graduate females. Notice that
α2 = differential effect of being a female
α3 = differential effect of being a college graduate
α4 = differential effect of being a female graduate

which shows that the mean clothing expenditure of graduate females differs (by α4) from the mean clothing expenditure of females or of college graduates. If α2, α3, and α4 are all positive, the average clothing expenditure of females is higher (than the base category, which here is male non-graduate), but it is much more so if the females also happen to be graduates. Similarly, the average expenditure on clothing by a college graduate tends to be higher than that of the base category, but much more so if the graduate happens to be a female. This shows how the interaction dummy modifies the effect of the two attributes considered individually.

Whether the coefficient of the interaction dummy is statistically significant can be tested by the usual t test. If it turns out to be significant, the simultaneous presence of the two attributes will attenuate or reinforce the individual effects of these attributes. Needless to say, incorrectly omitting a significant interaction term will lead to specification bias.

5.7 The use of dummy variables in seasonal analysis

Many economic time series based on monthly or quarterly data exhibit seasonal patterns (regular oscillatory movements).
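The interaction model with the product dummy D2·D3 can be sketched as follows. The parameter values are hypothetical; the code only illustrates that including the product column lets OLS estimate the differential effect of being a female graduate:

```python
import numpy as np

# Sketch of the interaction model: clothing expenditure with the sex
# dummy (D2), the education dummy (D3) and their product D2*D3.
# All parameter values are hypothetical.
rng = np.random.default_rng(1)
n = 500
X = rng.uniform(10, 100, n)      # income
D2 = rng.integers(0, 2, n)       # 1 if female
D3 = rng.integers(0, 2, n)       # 1 if college graduate
u = rng.normal(0, 1.0, n)
Y = 5.0 + 2.0 * D2 + 1.5 * D3 + 1.0 * (D2 * D3) + 0.1 * X + u

# regress Y on constant, D2, D3, the interaction D2*D3, and income
Z = np.column_stack([np.ones(n), D2, D3, D2 * D3, X])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
a4_hat = coef[3]                 # differential effect of female graduates
```

The significance of a4_hat would then be judged by the usual t test, as described in the text.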
Examples are sales of department stores at Christmastime, demand for money (cash balances) by households at holiday times, demand for ice cream and soft drinks during the summer, and prices of crops right after the harvesting season. Often it is desirable to remove the seasonal factor, or component, from a time series so that one may concentrate on the other components, such as the trend. The process of removing the seasonal component from a time series is known as deseasonalization, or seasonal adjustment, and the series thus obtained is called the deseasonalized, or seasonally adjusted, time series. Important economic time series, such as the consumer price index, the wholesale price index, and the index of industrial production, are usually published in seasonally adjusted form.

5.8 Piecewise linear regression

To illustrate yet another use of dummy variables, consider fig. 5.3, which shows how a hypothetical company remunerates its sales representatives. It pays commissions based on sales in such a manner that up to a certain level, the target, or threshold, level X*, there is one (stochastic) commission structure and beyond that level another. (Note: Besides sales, other factors affect sales commission. Assume that these other factors are represented by the stochastic disturbance term.) More specifically, it is assumed that sales commission increases linearly with sales until the threshold level X*, after which it also increases linearly with sales but at a much steeper rate. Thus, we have a piecewise linear regression consisting of two linear pieces, or segments, which are labeled I and II in fig. 5.3, and the commission function changes its slope at the threshold value. Given the data on commission, sales, and the value of the threshold level X*, the technique of dummy variables can be used to estimate the (differing) slopes of the two segments of the piecewise linear regression shown in fig. 5.3.
We proceed as follows:

Yi = α1 + β1Xi + β2(Xi − X*)Di + ui ------------------------------------(5.11)

where
Yi = sales commission
Xi = volume of sales generated by the salesperson
X* = threshold value of sales, also known as a knot (known in advance)
D = 1 if Xi > X*; 0 if Xi ≤ X*

Assuming E(ui) = 0, we see at once that

E(Yi | Di = 0, Xi, X*) = α1 + β1Xi ---------------------------------------(5.12)

which gives the mean sales commission up to the target level X*, and

E(Yi | Di = 1, Xi, X*) = (α1 − β2X*) + (β1 + β2)Xi ----------------------(5.13)

which gives the mean sales commission beyond the target level X*. Thus, β1 gives the slope of the regression line in segment I, and β1 + β2 gives the slope of the regression line in segment II of the piecewise linear regression shown in fig. 5.3. A test of the hypothesis that there is no break in the regression at the threshold value X* can be conducted easily by noting the statistical significance of the estimated differential slope coefficient β̂2.

Summary:

1. Dummy variables, taking values of 1 and 0 (or their linear transforms), are a means of introducing qualitative regressors in regression analysis.

2. Dummy variables are a data-classifying device in that they divide a sample into various subgroups based on qualities or attributes (sex, marital status, race, religion, etc.) and implicitly allow one to run individual regressions for each subgroup. If there are differences in the response of the regressand to the variation in the quantitative variables in the various subgroups, they will be reflected in differences in the intercepts or slope coefficients, or both, of the various subgroup regressions.

3. Although a versatile tool, the dummy variable technique needs to be handled carefully.
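Model (5.11) can be sketched in code by constructing the regressor (Xi − X*)Di directly. The knot X* and all parameter values below are hypothetical:

```python
import numpy as np

# Piecewise linear regression sketch: one slope up to the known knot X*,
# a steeper slope beyond it. All numeric values are illustrative only.
rng = np.random.default_rng(2)
n = 300
X = rng.uniform(0, 200, n)       # sales volume
X_star = 100.0                   # threshold (knot), known in advance
D = (X > X_star).astype(float)
u = rng.normal(0, 1.0, n)
beta1, beta2 = 0.05, 0.10        # segment-I slope and the slope change
Y = 2.0 + beta1 * X + beta2 * (X - X_star) * D + u

# regress Y on a constant, X, and the constructed variable (X - X*)D
Z = np.column_stack([np.ones(n), X, (X - X_star) * D])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
slope_I = coef[1]                # estimate of beta1 (segment I)
slope_II = coef[1] + coef[2]     # estimate of beta1 + beta2 (segment II)
```

A significant estimate of the coefficient on (X − X*)D is evidence of a break in slope at the knot, mirroring the t test described in the text.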
First, if the regression contains a constant term, the number of dummy variables must be one less than the number of classifications of each qualitative variable. Second, the coefficient attached to a dummy variable must always be interpreted in relation to the base, or reference, group, that is, the group that gets the value of zero. Finally, if a model has several qualitative variables with several classes, the introduction of dummy variables can consume a large number of degrees of freedom. Therefore, one should always weigh the number of dummy variables to be introduced against the total number of observations available for analysis.

4. Among the various applications of dummy variables, this chapter considered but a few. These included (1) comparing two (or more) regressions, (2) deseasonalizing time series data, (3) combining time series and cross-sectional data, and (4) piecewise linear regression models.

5. Since the dummy variables are nonstochastic, they pose no special problems in the application of OLS. However, care must be exercised in transforming data involving dummy variables. In particular, the problems of autocorrelation and heteroscedasticity need to be handled very carefully.

Test yourself question

In studying the effect of a number of qualitative attributes on the prices charged for movie admissions in a large metropolitan area for the period 1961-1964, R. D. Lampson obtained the following regression for the year 1961:

Ŷ = 4.13 + 5.77D1 + 8.12D2 − 7.68D3 − 1.13D4 + 27.09D5 + 31.46 log X1 + 0.81X2 + 3 other dummy variables
          (2.04)   (2.67)    (2.51)    (1.78)    (3.58)    (13.78)       (0.17)

R² = 0.961

where:
D1 = theater location: 1 if suburban, 0 if city center
D2 = theater age: 1 if less than 10 years since construction or major renovation, 0 otherwise
D3 = type of theater: 1 if outdoor, 0 if indoor
D4 = parking: 1 if provided, 0 otherwise
D5 = screening policy: 1 if first run, 0 otherwise
X1 = average percentage unused seating capacity per showing
X2 = average film rental, cents per ticket, charged by the distributor
Y = adult evening admission price, cents

and where the figures in parentheses are standard errors.
a. Comment on the results.
b. How would you rationalize the introduction of the variable X1?
c. How would you explain the negative value of the coefficient of D4?

Chapter Six
Dynamic econometric models

6.1 Introduction

While considering the standard regression model, we did not pay attention to the timing of the effect of the explanatory variable(s) on the dependent variable. The standard linear regression implies that a change in one of the explanatory variables causes a change in the dependent variable during the same time period, and during that period alone. But in economics such a specification is scarcely found. In economic phenomena, generally, a cause often produces its effect only after a lapse of time; this lapse of time (between a cause and its effect) is called a lag. Therefore, realistic formulations of economic relations often require the insertion of lagged values of the explanatory variables or of lagged values of the dependent variable.

6.2 Autoregressive and distributed-lag models

In regression analysis involving time series data, if the regression model includes not only the current but also the lagged (past) values of the explanatory variables (the X's), it is called a distributed-lag model. For example:

Ct = α + β0Yt + β1Yt−1 + β2Yt−2 + Ut

is a distributed-lag model of the consumption function. It means that the value of consumption expenditure (Ct) at any given time depends on the current and past values of disposable income (Yt).
The general form of a distributed-lag model (with only lagged exogenous variables) is written as:

Yt = α + β0Xt + β1Xt−1 + β2Xt−2 + ... + βsXt−s + Ut

The number of lags, s, may be either finite or infinite, but generally it is assumed to be finite. The coefficient β0 is known as the short-run, or impact, multiplier because it gives the change in the mean value of Y following a unit change in X in the same time period t. If the change in X is maintained at the same level thereafter, then (β0 + β1) gives the change in the (mean value of) Y in the next period, (β0 + β1 + β2) in the following period, and so on. These partial sums are called interim, or intermediate, multipliers. Finally, after s periods we obtain

β = Σβi (i = 0 to s) = β0 + β1 + β2 + ... + βs

which is known as the long-run, or total, distributed-lag multiplier, provided the sum β exists.

It should be noted that a distributed-lag model is not to be confused with an autoregressive model. If the regression model includes only lagged values of the explanatory variables, it is called a distributed-lag model, whereas if the model includes one or more lagged values of the dependent variable among its explanatory variables, it is called an autoregressive model. Thus, Yt = α + β0Xt + β1Xt−1 + Ut represents a distributed-lag model, whereas Yt = α + βXt + γYt−1 + Wt is an example of an autoregressive model.

The Reasons for Lags: There are three major reasons why lags may occur.

1. Psychological reasons: As a result of the force of habit (inertia), people, for example, do not change their consumption habits immediately following a price decrease or an income increase, perhaps because the process of change may involve some immediate disutility. Thus, those who become instant millionaires by winning lotteries may not change the lifestyle to which they were accustomed for a long time, because they may not know how to react to such a windfall gain immediately.
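The impact, interim, and long-run multipliers described above are just partial sums of the lag coefficients. A minimal sketch, using hypothetical coefficients β0 .. β3:

```python
# Impact, interim and long-run multipliers of a finite distributed-lag
# model. The coefficients beta_0 .. beta_3 below are hypothetical.
betas = [0.4, 0.3, 0.2, 0.1]     # beta_0, beta_1, beta_2, beta_3

impact = betas[0]                            # short-run (impact) multiplier
interim = [sum(betas[: i + 1]) for i in range(len(betas))]
long_run = sum(betas)                        # total distributed-lag multiplier
```

Here interim[1] = β0 + β1 is the change in mean Y one period after a sustained unit change in X, and long_run is the total effect once all lags have worked through.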
Of course, given reasonable time, they may learn to live with their newly acquired fortune. Also, people may not know whether a change is "permanent" or "transitory". Thus, my reaction to an increase in my income will depend on whether or not the increase is permanent. If it is only a nonrecurring increase and in succeeding periods my income returns to its previous level, I may save the entire increase, whereas someone else in my position might decide to "live it up".

2. Technological reasons: Suppose, for instance, the price of capital relative to labor declines, making substitution of capital for labor economically feasible. Of course, addition of capital takes time (the gestation period). Moreover, if the drop in price is expected to be temporary, firms may not rush to substitute capital for labor, especially if they expect that after the temporary drop the price of capital may increase beyond its previous level.

3. Institutional reasons: These reasons also contribute to lags. For example, contractual obligations may prevent firms from switching from one source of labor or raw material to another. As another example, those who have placed funds in long-term saving accounts for fixed durations such as one year, three years, or seven years are essentially "locked in", even though money market conditions may be such that higher yields are available elsewhere.

6.3 Estimation of distributed-lag models

Suppose we have the following distributed-lag model in one explanatory variable:

Yt = α + β0Xt + β1Xt−1 + β2Xt−2 + ... + βsXt−s + Ut ------------(6.01)

If the length of the lag, that is, how far back into the past we want to go, has not been defined, the model is called an infinite (lag) model, whereas a model with a specified lag length is called a finite (lag) distributed-lag model. How do we estimate α and the β's in (6.01)? We may adopt two approaches:
I. Ad hoc estimation of distributed-lag models
II.
A priori restriction on the β's, by assuming that the β's follow some systematic pattern.

I. Ad hoc estimation of distributed-lag models

Since the explanatory variable Xt is assumed to be nonstochastic (or at least uncorrelated with the disturbance term Ut), Xt−1, Xt−2, and so on, are nonstochastic too. Therefore, in principle, OLS can be applied to model (6.01). The ad hoc approach proceeds as follows: first regress Yt on Xt, then Yt on Xt and Xt−1, then Yt on Xt, Xt−1, and Xt−2, and so on. This procedure continues until the regression coefficients of the lagged variables start becoming statistically insignificant and/or the coefficient of at least one of the variables changes sign from positive to negative or vice versa. Consider the following hypothetical example:

Ŷt = 8.37 + 0.17Xt
Ŷt = 8.27 + 0.111Xt + 0.06Xt−1
Ŷt = 8.27 + 0.109Xt + 0.071Xt−1 − 0.055Xt−2
Ŷt = 8.32 + 0.108Xt + 0.063Xt−1 + 0.022Xt−2 − 0.020Xt−3

Proponents of this approach would choose the second regression as the "best" one because in the last two equations the sign of Xt−2 was not stable, and in the last equation the sign of Xt−3 was negative, which may be difficult to interpret economically.

Although seemingly straightforward, ad hoc estimation suffers from many drawbacks, such as the following:
a. There is no guide as to what the maximum lag length is.
b. As one estimates successive lags, fewer degrees of freedom are left, making statistical inference somewhat shaky.
c. More importantly, in economic time series data successive values (lags) tend to be highly correlated; hence multicollinearity rears its ugly head.
d. The sequential search for the lag length opens the researcher to the charge of data mining.

In view of the preceding problems, the ad hoc estimation procedure has very little to recommend it.
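The sequential procedure can be sketched as follows. The data are simulated from a hypothetical two-lag model (all values invented), so when a third lag is added its coefficient should look unimportant, which is the stopping signal the ad hoc approach relies on:

```python
import numpy as np

# Sketch of the ad hoc procedure: regress Y on X_t, then add X_{t-1},
# then X_{t-2}, watching the added coefficients. The true (hypothetical)
# DGP has lags 0 and 1 only.
rng = np.random.default_rng(3)
T = 300
Xfull = rng.normal(0, 1, T + 2)
u = rng.normal(0, 0.5, T)
Y = 1.0 + 0.6 * Xfull[2:] + 0.3 * Xfull[1:-1] + u
X0, X1, X2 = Xfull[2:], Xfull[1:-1], Xfull[:-2]   # X_t, X_{t-1}, X_{t-2}

def ols(*cols):
    """OLS of Y on a constant plus the given regressor columns."""
    Z = np.column_stack([np.ones(T)] + list(cols))
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return coef

c1 = ols(X0)                # Y on X_t only
c2 = ols(X0, X1)            # add X_{t-1}
c3 = ols(X0, X1, X2)        # add X_{t-2}: its coefficient should be near zero
```

In practice one would also inspect standard errors, and note that with real economic data the lagged X's would be highly correlated, unlike the white-noise X used here.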
Some prior or theoretical considerations must be brought to bear upon the various β's if we are to make headway with the estimation problem.

II. Methods based on a priori restrictions on the β's

II.1 The Koyck approach to distributed-lag models

In order to reduce the number of parameters to be estimated in the distributed-lag model, the Koyck approach assumes that the impact of the explanatory variable (on the dependent variable) in the most distant past is smaller than that in more recent periods. More specifically, the Koyck lag formulation assumes that the weights (impacts) decline continuously. Assume the original model is:

Yt = α0 + β0Xt + β1Xt−1 + β2Xt−2 + ... + βsXt−s + ... + Ut

with U ~ N(0, σ²), E(uiuj) = 0 for i ≠ j, and E(uixi) = 0.

According to Koyck:

βi = λ^i β0, so that β1 = λβ0, β2 = λ²β0, and so on,

where λ is known as the rate of decline, or decay, of the distributed lag and 1 − λ is known as the speed of adjustment. By assuming nonnegative values for λ, Koyck rules out the β's from changing sign, and by assuming λ < 1, he assigns lesser weight to the distant β's than to the current one. Also, the long-run multiplier is a finite amount in the Koyck scheme:

Σβi (i = 0 to ∞) = β0 (1/(1 − λ))

Substituting the values of the β's into the original model we obtain:

Yt = α0 + β0Xt + (λβ0)Xt−1 + (λ²β0)Xt−2 + ... + Ut

Lagging this by one period and multiplying by λ we get:

λYt−1 = λα0 + (λβ0)Xt−1 + (λ²β0)Xt−2 + (λ³β0)Xt−3 + ... + λUt−1

Subtracting λYt−1 from Yt we obtain:

Yt − λYt−1 = α0(1 − λ) + β0Xt + (Ut − λUt−1)

Letting α* = α0(1 − λ) and Vt = Ut − λUt−1,

Yt = α* + β0Xt + λYt−1 + Vt

The above procedure of transformation is known as the Koyck transformation. If the Koyck hypothesis concerning the lag scheme and the assumptions concerning Vt are accepted, ordinary least squares can be applied to obtain estimators of α*, β0, and λ.
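The geometric lag scheme and its long-run multiplier can be sketched with hypothetical values of β0 and λ:

```python
# Koyck scheme: beta_i = lambda**i * beta_0. The values beta_0 = 0.5 and
# lambda = 0.6 below are hypothetical illustrations.
beta0, lam = 0.5, 0.6

# the first few geometrically declining lag weights beta_0 .. beta_5
betas = [beta0 * lam**i for i in range(6)]

# long-run multiplier: beta_0 * (1 / (1 - lambda))
long_run = beta0 / (1 - lam)
```

The weights decline monotonically toward zero, and the long-run multiplier is finite because 0 ≤ λ < 1 makes the geometric series converge.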
From these estimates, the estimates of the original parameters α0, β1, β2, ..., βk can be easily obtained through:

α̂0 = α̂*/(1 − λ̂),  β̂i = λ̂^i β̂0

However, the following features of the Koyck transformation should be noted:

a. Our original model was a distributed-lag model, but the transformed model is autoregressive, because Yt−1 appears as one of the explanatory variables. The Koyck transformation, therefore, also serves to convert a distributed-lag model into an autoregressive model.

b. In the new formulation the error term Vt = Ut − λUt−1 is autocorrelated, despite the fact that the disturbance term of the original model is non-autocorrelated. This can be seen as follows:

E(VtVt−1) = E[(Ut − λUt−1)(Ut−1 − λUt−2)]
= E[UtUt−1 − λUtUt−2 − λU²t−1 + λ²Ut−1Ut−2]
= −λE(U²t−1) = −λσ²u ≠ 0

c. The lagged variable Yt−1 is also not independent of the error term Vt, i.e. E(Yt−1Vt) ≠ 0. This is because Yt is directly dependent on Vt, and similarly Yt−1 on Vt−1; since Vt and Vt−1 are not independent, Yt−1 will obviously be related to Vt.

Due to these two problems, applying OLS to the Koyck transformation of the distributed-lag model will give rise to biased and inconsistent estimates. In addition to these estimation problems, the Koyck hypothesis is quite restrictive in that it assumes the impacts of past periods decline successively in a specific geometric way. But the following patterns are all possible:

1. β0 > β1 > β2 > β3 > ...
2. β0 = β1 = β2 = β3 = ...
3. β0 < β1 < β2 < β3 < ...

II.2 Rationalization of the Koyck model: the Adaptive Expectations (AE) model

The Koyck model is ad hoc since it was obtained by a purely algebraic process; it is devoid of any theoretical underpinning. But this gap can be filled if we start from a different perspective.
Suppose we postulate the following model:

Yt = β0 + β1X*t + ut -----------------------------------------------------(i)

where
Y = demand for money (real cash balances)
X* = equilibrium, optimum, expected long-run, or normal rate of interest
u = error term

Equation (i) postulates that the demand for money is a function of the expected (in the sense of anticipated) rate of interest. Since the expectational variable X* is not directly observable, let us propose the following hypothesis about how expectations are formed:

X*t − X*t−1 = γ(Xt − X*t−1) ---------------------------------------------(ii)

where γ, such that 0 < γ ≤ 1, is known as the coefficient of expectation. The hypothesis is known as the adaptive expectation, progressive expectation, or error learning hypothesis, popularized by Cagan and Friedman. What equation (ii) implies is that "economic agents will adapt their expectations in the light of past experience and that in particular they will learn from their mistakes." More specifically, (ii) states that expectations are revised each period by a fraction γ of the gap between the current value of the variable and its previous expected value. Thus, for our model this would mean that expectations about interest rates are revised each period by a fraction γ of the discrepancy between the rate of interest observed in the current period and what its anticipated value had been in the previous period. Another way of stating this would be to write (ii) as:

X*t = γXt + (1 − γ)X*t−1 -------------------------------------------------(iii)

which shows that the expected value of the rate of interest at time t is a weighted average of the actual value of the interest rate at time t and its value expected in the previous period, with weights γ and 1 − γ, respectively. If γ = 1, X*t = Xt, meaning that expectations are realized immediately and fully, that is, in the same time period.
If, on the other hand, γ = 0, X*t = X*t−1, meaning that expectations are static, that is, "conditions prevailing today will be maintained in all subsequent periods. Expected future values then become identified with current values."

Substituting (iii) into (i), we obtain:

Yt = β0 + β1[γXt + (1 − γ)X*t−1] + ut
= β0 + β1γXt + β1(1 − γ)X*t−1 + ut ---------------------------(iv)

Now, lag equation (i) by one period, multiply it by 1 − γ, and subtract the product from (iv). After simple algebraic manipulations, we obtain:

Yt = γβ0 + γβ1Xt + (1 − γ)Yt−1 + ut − (1 − γ)ut−1
= γβ0 + γβ1Xt + (1 − γ)Yt−1 + vt -----------------------------(v)

where vt = ut − (1 − γ)ut−1.

Let us note the difference between (i) and (v). In the former, β1 measures the average response of Y to a unit change in X*, the equilibrium or long-run value of X. In (v), on the other hand, γβ1 measures the average response of Y to a unit change in the actual or observed value of X. These responses will not be the same unless, of course, γ = 1, that is, the current and long-run values of X are the same. In practice, we first estimate (v). Once an estimate of γ is obtained from the coefficient of lagged Y (that coefficient is 1 − γ), we can easily compute β1 by simply dividing the coefficient of Xt (= γβ1) by γ. Note that, like the Koyck model, the adaptive expectations model is autoregressive and its error term is similar to the Koyck error term.

II.3 Another rationalization of the Koyck model: the stock adjustment, or partial adjustment, model

The adaptive expectations model is one way of rationalizing the Koyck model. Another rationalization is provided by Marc Nerlove in the so-called stock adjustment, or partial adjustment, model.
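The back-calculation of the structural parameters from the estimated short-run regression (v) is simple arithmetic. The estimated coefficients below are hypothetical, chosen only to show the recovery steps:

```python
# Recovering the adaptive-expectations parameters from the estimated
# short-run regression (v): Y_t = gb0 + gb1*X_t + c*Y_{t-1} + v_t,
# where gb0 = gamma*beta0, gb1 = gamma*beta1 and c = 1 - gamma.
# The coefficient values below are hypothetical.
gb0, gb1, c = 1.2, 0.45, 0.4

gamma = 1 - c          # coefficient of expectation, from the lagged-Y coefficient
beta1 = gb1 / gamma    # long-run response of Y to X*
beta0 = gb0 / gamma    # long-run intercept
```

With these hypothetical numbers, the short-run response γβ1 = 0.45 is smaller than the long-run response β1 = 0.75, as the text explains for any γ < 1.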
To illustrate this model, consider the flexible accelerator model of economic theory, which assumes that there is an equilibrium, optimal, desired, or long-run amount of capital stock needed to produce a given output under the given state of technology, rate of interest, etc. For simplicity, assume that this desired level of capital Y*t is a linear function of output X as follows:

Y*t = β0 + β1Xt + ut ------------------------------------------------(1)

Since the desired level of capital is not directly observable, Nerlove postulates the following hypothesis, known as the partial adjustment, or stock adjustment, hypothesis:

Yt − Yt−1 = δ(Y*t − Yt−1) --------------------------------------------(2)

where δ, such that 0 < δ ≤ 1, is known as the coefficient of adjustment, and where Yt − Yt−1 = actual change and Y*t − Yt−1 = desired change. Since Yt − Yt−1, the change in capital stock between two periods, is nothing but investment, (2) can alternatively be written as:

It = δ(Y*t − Yt−1) ----------------------------------------------------------------(3)

where It = investment in time period t.

Equation (2) postulates that the actual change in capital stock (investment) in any given time period t is some fraction δ of the desired change for that period. If δ = 1, the actual stock of capital is equal to the desired stock; that is, the actual stock adjusts to the desired stock instantaneously (in the same period). If, however, δ = 0, nothing changes, since the actual stock at time t is the same as that observed in the previous time period. Typically, δ is expected to lie between these extremes, since adjustment to the desired stock of capital is likely to be incomplete because of rigidity, inertia, contractual obligations, etc.; hence the name partial adjustment model.
Note that the adjustment mechanism (2) can alternatively be written as:

Yt = δY*t + (1 − δ)Yt−1 -------------------------------------------------(4)

showing that the observed capital stock at time t is a weighted average of the desired capital stock at that time and the capital stock existing in the previous time period, δ and 1 − δ being the weights. Now substitution of (1) into (4) gives:

Yt = δ(β0 + β1Xt + ut) + (1 − δ)Yt−1
= δβ0 + δβ1Xt + (1 − δ)Yt−1 + δut ----------------------------------(5)

This model is called the partial adjustment model. Since (1) represents the long-run, or equilibrium, demand for capital stock, (5) can be called the short-run demand function for capital stock, since in the short run the existing capital stock may not necessarily equal its long-run level. Once we estimate the short-run function (5) and obtain the estimate of the adjustment coefficient δ (from the coefficient of Yt−1), we can easily derive the long-run function by simply dividing δβ0 and δβ1 by δ and omitting the lagged Y term, which will then give (1).

The partial adjustment model resembles both the Koyck and adaptive expectations models in that it is autoregressive. But it has a much simpler disturbance term: the original disturbance term ut multiplied by a constant δ. Bear in mind, however, that although similar in appearance, the adaptive expectations and partial adjustment models are conceptually very different. The former is based on uncertainty (about the future course of prices, interest rates, etc.), whereas the latter is due to technical or institutional rigidities, inertia, costs of change, etc. Both of these models, however, are theoretically much sounder than the Koyck model.
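The estimation route for the partial adjustment model can be sketched by simulation. All parameter values are hypothetical; because the disturbance δut in (5) is serially uncorrelated, OLS on the short-run equation is consistent here, and the long-run coefficients follow by dividing through by the estimated δ:

```python
import numpy as np

# Simulation sketch of the partial adjustment model (5):
# Y_t = d*b0 + d*b1*X_t + (1-d)*Y_{t-1} + d*u_t, with hypothetical values.
rng = np.random.default_rng(4)
T = 5000
delta, b0, b1 = 0.5, 2.0, 0.8
X = rng.normal(0, 1, T)
u = rng.normal(0, 0.2, T)
Y = np.empty(T)
Y[0] = b0
for t in range(1, T):
    Y[t] = delta * b0 + delta * b1 * X[t] + (1 - delta) * Y[t - 1] + delta * u[t]

# OLS of Y_t on a constant, X_t and Y_{t-1}
Z = np.column_stack([np.ones(T - 1), X[1:], Y[:-1]])
coef, *_ = np.linalg.lstsq(Z, Y[1:], rcond=None)

delta_hat = 1 - coef[2]        # adjustment coefficient, from the Y_{t-1} term
b1_hat = coef[1] / delta_hat   # long-run slope beta_1
```

This contrasts with the Koyck and AE cases, where the moving-average error makes the same OLS step inconsistent.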
The important point to keep in mind is that since the Koyck, adaptive expectations, and stock adjustment models yield, apart from differences in the appearance of the error term, the same final estimating equation, one must be extremely careful in telling the reader which model the researcher is using and why. Thus, researchers must specify the theoretical underpinning of their model.

II.4 Combination of adaptive expectations and partial adjustment models

Consider the following model:

Y*t = β0 + β1X*t + ut ------------------------------------------------------ (a)

where Y*t = desired stock of capital and X*t = expected level of output. Since neither Y*t nor X*t is directly observable, one can use the partial adjustment mechanism for Y*t and the adaptive expectations model for X*t to arrive at the following estimating equation:

Yt = β0δγ + β1δγXt + [(1 − γ) + (1 − δ)]Yt−1 − (1 − δ)(1 − γ)Yt−2 + [δut − δ(1 − γ)ut−1]
= α0 + α1Xt + α2Yt−1 + α3Yt−2 + vt ---------------------------------------------(b)

where vt = δ[ut − (1 − γ)ut−1]. This model too is autoregressive, the only difference from the purely adaptive expectations model being that Yt−2 appears along with Yt−1 as an explanatory variable. Like the Koyck and AE models, the error term in (b) follows a moving average process. Another feature of this model is that although it is linear in the α's, it is nonlinear in the original parameters. A celebrated application of (a) has been Friedman's permanent income hypothesis, which states that "permanent" or long-run consumption is a function of "permanent" or long-run income. The estimation of (b) presents the same estimation problems as the Koyck or AE models, in that all these models are autoregressive with similar error structures.

II.5 The Almon approach to distributed-lag models

The Almon lag model possesses two advantages over the Koyck procedure.
First, it does not violate any of the basic ordinary least squares assumptions concerning the disturbance term. Second, it is far more flexible than the Koyck method in terms of the form of the lag scheme, because it does not hypothesize any particular form of lag beforehand. Instead, the model assumes that any pattern of lag scheme among the β's can be described by a polynomial. This idea is based on a theorem in mathematics known as Weierstrass's theorem, which states that under general conditions a curve may be approximated by a polynomial whose degree is one more than the number of turning points in the curve. Suppose, for instance, that the β's in a given distributed-lag model are expected to decrease first, then increase, and then decrease again; with two turning points, a third-degree polynomial is suggested. Suppose our original model to be estimated is:

Yt = α0 + β0Xt + β1Xt−1 + β2Xt−2 + ... + βsXt−s + ... + Ut

Assuming a third-degree polynomial:

βi = a0 + a1i + a2i² + a3i³

where a0, a1, a2, and a3 are parameters to be estimated. We are now in a position to obtain all the β's by setting i equal to the value of the subscript of the particular coefficient:

β0 = a0
β1 = a0 + a1 + a2 + a3
β2 = a0 + 2a1 + 4a2 + 8a3
β3 = a0 + 3a1 + 9a2 + 27a3
...
βk = a0 + ka1 + k²a2 + k³a3

Naturally, therefore, all that needs to be estimated is the four parameters of the polynomial function βi = a0 + a1i + a2i² + a3i³. Obtaining the values of a0, a1, a2, and a3, we are able to estimate all the parameters of the original distributed-lag model.
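The polynomial restriction can be sketched numerically. All values below are hypothetical; the code builds the constructed variables implied by the restriction and shows that regressing Y on them recovers the a's, and hence all k + 1 lag coefficients from just four parameters:

```python
import numpy as np

# Almon restriction: beta_i = a0 + a1*i + a2*i**2 + a3*i**3.
# Lag length k and the a's below are hypothetical.
k = 6
a = np.array([0.5, 0.1, -0.05, 0.002])     # a0, a1, a2, a3
i = np.arange(k + 1)
betas = a[0] + a[1] * i + a[2] * i**2 + a[3] * i**3

rng = np.random.default_rng(5)
T = 200
Xfull = rng.normal(0, 1, T + k)
# matrix of lagged X's: column j holds X_{t-j}
Xlags = np.column_stack([Xfull[k - j: T + k - j] for j in range(k + 1)])
# constructed variables w_p = sum_j j**p * X_{t-j}, p = 0..3
w = np.column_stack([Xlags @ (i**p) for p in range(4)])

# Y from the original distributed-lag model; regress it on the w's
Y = 1.0 + Xlags @ betas + rng.normal(0, 0.1, T)
Z = np.column_stack([np.ones(T), w])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
a_hat = coef[1:]
betas_hat = a_hat[0] + a_hat[1] * i + a_hat[2] * i**2 + a_hat[3] * i**3
```

Note the data reduction: seven lag coefficients are estimated through only four polynomial parameters.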
Substituting the values of the β's into the original model:

Yt = α0 + a0Xt + (a0 + a1 + a2 + a3)Xt−1 + (a0 + 2a1 + 4a2 + 8a3)Xt−2 + ... + Ut

Collecting terms in a0, a1, a2, and a3:

Yt = α0 + a0 Σ Xt−i + a1 Σ iXt−i + a2 Σ i²Xt−i + a3 Σ i³Xt−i + Ut

where the sums run over i = 0 to k. Hence

Yt = α0 + a0w0t + a1w1t + a2w2t + a3w3t + Ut ------------------------------(c)

where w0t = Σ Xt−i, w1t = Σ iXt−i, w2t = Σ i²Xt−i, and w3t = Σ i³Xt−i. This is the final (or transformed) form of the Almon lag model. We can now apply OLS to estimate α̂0, â0, â1, â2, and â3, and thereby obtain the β̂'s of the original form. Note that Ut remains in its original form.

Chapter Seven
An Introduction to Simultaneous Equation Models

7.1 Introduction

In all the previous chapters discussed so far, we have been focusing exclusively on the problems and estimation of single-equation regression models. In such models a dependent variable is expressed as a linear function of one or more explanatory variables. The cause-and-effect relationship in such models between the dependent and independent variables is unidirectional: the explanatory variables are the cause and the dependent variable is the effect. But there are situations where such one-way or unidirectional causation is not meaningful. This occurs if, for instance, Y (the dependent variable) is not only a function of the X's (the explanatory variables) but all or some of the X's are, in turn, determined by Y. There is, then, a two-way flow of influence between Y and (some of) the X's, which in turn makes the distinction between dependent and independent variables somewhat doubtful. Under such circumstances we need to consider more than one regression equation, one for each interdependent variable, to understand the multi-directional flow of influence among the variables. This is precisely what is done in simultaneous equation models.
A system describing the joint dependence of variables is called a system of simultaneous equations, or a simultaneous equations model. The number of equations in such models equals the number of jointly dependent, or endogenous, variables involved in the phenomenon under analysis. Unlike single-equation models, in simultaneous equations models it is not usually possible (possible only under specific assumptions) to estimate a single equation of the model without taking into account the information provided by the other equations of the system. If one applies OLS to estimate the parameters of each equation disregarding the other equations of the model, the estimates thus obtained are not only biased but also inconsistent; i.e., even if the sample size increases indefinitely, the estimators do not converge to their true values. The bias arising from such a procedure of estimation, which treats each equation of the simultaneous equations model as though it were a single-equation model, is known as simultaneity bias or simultaneous equation bias. To avoid this bias we will use other methods of estimation, such as Indirect Least Squares (ILS), Two-Stage Least Squares (2SLS), Three-Stage Least Squares (3SLS), Maximum Likelihood methods, and the Method of Instrumental Variables (IV).

What happens to the parameters of the relationship if we estimate by applying OLS to each equation without taking into account the information provided by the other equations in the system? The application of OLS to estimate the parameters of economic relationships presupposes the classical assumptions discussed in chapter one of this course. One of the crucial assumptions of OLS is that the explanatory variables and the disturbance term are independent, i.e., the explanatory variables are truly exogenous. Symbolically: E[XiUi] = 0.
As a result, the linear model can be interpreted as describing the conditional expectation of the dependent variable (Y) given a set of explanatory variables. In simultaneous equation models, such independence of the explanatory variables and the disturbance term is violated, i.e. E[XiUi] ≠ 0. When this assumption is violated, the OLS estimator is biased and inconsistent.

Simultaneity bias of OLS estimators: The two-way causation in a relationship leads to violation of this important assumption of the linear regression model: a variable that is the dependent variable in one equation may also appear as an explanatory variable in the other equations of the simultaneous-equation model. In this case E[XiUi] may be different from zero. To show simultaneity bias, let's consider the following simple simultaneous equation model:

Y = α0 + α1X + U --------------------------------------------------(10)
X = β0 + β1Y + β2Z + V

where X and Y are endogenous variables and Z is an exogenous variable. Suppose that the following assumptions hold:

E(U) = 0, E(V) = 0
E(U²) = σu², E(V²) = σv²
E(UiUj) = 0 (i ≠ j), E(ViVj) = 0 (i ≠ j), and E(UiVi) = 0

The reduced form for X is obtained by substituting Y into the equation for X:

X = β0 + β1(α0 + α1X + U) + β2Z + V

X = (β0 + α0β1)/(1 − α1β1) + [β2/(1 − α1β1)]Z + (β1U + V)/(1 − α1β1) --------------------(11)

Applying OLS to the first equation of the above structural model will result in a biased estimator because cov(X, U) = E(XiUi) ≠ 0. Let us now prove this.
cov(X, U) = E[{X − E(X)}{U − E(U)}]
          = E[{X − E(X)}U]   (since E(U) = 0) ------------------------------------------(12)

From equation (11), since E(U) = E(V) = 0,

E(X) = (β0 + α0β1)/(1 − α1β1) + [β2/(1 − α1β1)]Z

Substituting the value of X from equation (11) and this expression for E(X) into equation (12):

cov(X, U) = E[(β1U + V)U/(1 − α1β1)]
          = [1/(1 − α1β1)]E(β1U² + UV)
          = β1E(U²)/(1 − α1β1)   (since E(UV) = 0)
          = β1σu²/(1 − α1β1) ≠ 0

That is, the covariance between X and U is not zero. As a consequence, if OLS is applied to each equation of the model separately, the coefficients will turn out to be biased. Now, let's examine how the non-zero covariance of the error term and the explanatory variable leads to bias in the OLS estimates of the parameters. If we apply OLS to the first equation of the structural model (10), Y = α0 + α1X + U, we obtain (lower-case letters denoting deviations from the mean):

α̂1 = Σxy/Σx² = Σx(Y − Ȳ)/Σx² = ΣxY/Σx²   (since ȲΣx = 0)

Substituting Y = α0 + α1X + U:

α̂1 = α0Σx/Σx² + α1ΣxX/Σx² + ΣxU/Σx²

But Σx = 0 and ΣxX/Σx² = 1, hence

α̂1 = α1 + ΣxU/Σx² --------------------------------------------------(13)

Taking expected values on both sides:

E(α̂1) = α1 + E(ΣxU/Σx²)

We have already proved that E(XU) ≠ 0, which implies E(ΣxU) ≠ 0. Consequently E(α̂1) ≠ α1; that is, α̂1 is biased by an amount equal to the expectation of Σxu/Σx².

7.2 Definitions of Some Concepts

• Endogenous and exogenous variables
In simultaneous equation models variables are classified as endogenous and exogenous. The traditional definition of these terms is that endogenous variables are variables that are determined by the economic model (within the system), while exogenous variables are those determined from outside. Exogenous variables are also called predetermined.
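The simultaneity bias derived above can be made concrete with a small Monte Carlo sketch. All parameter values below are hypothetical, chosen only for illustration: data are generated from the reduced form (11), and OLS applied to equation (10) alone overshoots the true α1, just as equation (13) predicts.

```python
import random

random.seed(42)
# hypothetical parameter values, chosen only for illustration
a0, a1 = 0.0, 0.5           # structural eq. (10): Y = a0 + a1*X + U
b0, b1, b2 = 0.0, 0.5, 1.0  # structural eq.:      X = b0 + b1*Y + b2*Z + V

n = 20000
xs, ys = [], []
for _ in range(n):
    z, u, v = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    # generate X from its reduced form (equation 11), then Y structurally
    x = (b0 + a0*b1 + b2*z + b1*u + v) / (1 - a1*b1)
    y = a0 + a1*x + u
    xs.append(x)
    ys.append(y)

xbar, ybar = sum(xs) / n, sum(ys) / n
# OLS slope in deviation form, as in equation (13)
a1_hat = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))

# OLS overshoots the true a1 = 0.5 because cov(X,U) = b1*var(U)/(1-a1*b1) > 0
print(round(a1_hat, 2))
```

With these numbers the probability limit of α̂1 is roughly 0.67 rather than the true 0.5, and no increase in the sample size removes the gap: the estimator is inconsistent, not merely biased.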
Predetermined variables fall into two categories: current and lagged exogenous variables, and lagged endogenous variables. For instance, Xt and Xt−1 are the current and lagged exogenous variables and Yt−1 is a lagged endogenous variable (on the convention that the X's denote exogenous variables and the Y's endogenous variables). Thus Xt, Xt−1 and Yt−1 are all regarded as predetermined variables. Since the predetermined variables are determined before the current period, they are assumed to be independent of the error terms of the model.

Consider the demand and supply functions:

Qd = β0 + β1P + β2Y + U1 ------------------------------------------(14)
Qs = α0 + α1P + α2R + U2 ------------------------------------------(15)

where: Q = quantity, Y = income, P = price, R = rainfall, and U1 and U2 are error terms. Here P and Q are endogenous variables while Y and R are exogenous variables.

• Structural models
A structural model describes the complete structure of the relationships among the economic variables. Structural equations of the model may be expressed in terms of endogenous variables, exogenous variables and disturbances (random variables). The parameters of the structural model express the direct effect of each explanatory variable on the dependent variable. Variables not appearing explicitly in a given function may still have an indirect effect, which is taken into account by the simultaneous solution of the system. For instance, a change in consumption affects investment indirectly, even though consumption does not appear in the investment function; the effect of consumption on investment cannot be measured directly by any structural parameter, but is measured indirectly by considering the system as a whole.

Example: The following simple Keynesian model of income determination can be considered as a structural model.
C = α + βY + U -----------------------------------------------(16)
Y = C + Z ----------------------------------------------------(17)

with α > 0 and 0 < β < 1, where:
C = consumption expenditure
Z = non-consumption expenditure
Y = national income

C and Y are endogenous variables while Z is an exogenous variable.

• Reduced form of the model
The reduced form of a structural model is the model in which the endogenous variables are expressed as functions of the predetermined variables and the error terms only.

Illustration: Find the reduced form of the above structural model. Since C and Y are endogenous and only Z is exogenous, we have to express C and Y in terms of Z. To do this, substitute Y = C + Z into equation (16):

C = α + β(C + Z) + U
C = α + βC + βZ + U
C − βC = α + βZ + U
C(1 − β) = α + βZ + U

C = α/(1 − β) + [β/(1 − β)]Z + U/(1 − β) ----------------------------------(18)

Substituting (18) into (17) we get:

Y = α/(1 − β) + [1/(1 − β)]Z + U/(1 − β) --------------------------------(19)

Equations (18) and (19) are the reduced form of the above structural model. We can set this out more formally as:

Structural form equations:
C = α + βY + U
Y = C + Z

Reduced form equations:
C = α/(1 − β) + [β/(1 − β)]Z + U/(1 − β)
Y = α/(1 − β) + [1/(1 − β)]Z + U/(1 − β)

The parameters of the reduced form measure the total effect (direct and indirect) of a change in an exogenous variable on an endogenous variable. For instance, in the reduced form equation (18), β/(1 − β) measures the total effect of a unit change in non-consumption expenditure on consumption. This total effect is the product of the direct effect, β, and the multiplier 1/(1 − β), which captures the indirect effect.

The reduced form equations can be obtained in two ways:
1) Express the endogenous variables directly as functions of the predetermined variables.
2) Solve the structural system for the endogenous variables in terms of the predetermined variables, the structural parameters, and the disturbance terms.
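A quick numeric check of equations (18) and (19), with hypothetical values α = 10 and β = 0.8 and the disturbance set to zero: solving the structural system directly and evaluating the reduced form formulas must agree, and the difference in Y for a unit change in Z is the multiplier 1/(1 − β) = 5.

```python
import math

# hypothetical illustration values: alpha = 10, beta = 0.8, U = 0
alpha, beta = 10.0, 0.8

def solve_structural(Z):
    # solve C = alpha + beta*Y and Y = C + Z simultaneously
    C = (alpha + beta * Z) / (1 - beta)
    Y = C + Z
    return C, Y

def reduced_form(Z):
    C = alpha / (1 - beta) + beta / (1 - beta) * Z   # equation (18), U = 0
    Y = alpha / (1 - beta) + 1 / (1 - beta) * Z      # equation (19), U = 0
    return C, Y

# the two routes to C and Y must coincide for any value of Z
for Z in (0.0, 5.0, 20.0):
    for s, r in zip(solve_structural(Z), reduced_form(Z)):
        assert math.isclose(s, r)

# total effect of a unit change in Z on Y: the multiplier 1/(1 - beta)
multiplier = reduced_form(6.0)[1] - reduced_form(5.0)[1]
print(round(multiplier, 6))
```

This illustrates the point in the text: the reduced form coefficient on Z is a total effect, not the structural (direct) coefficient.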
Consider the following simple model for a closed economy:

Ct = a1Yt + U1 ---------------------------------------------------------(i)
It = b1Yt + b2Yt−1 + U2 ------------------------------------------------(ii)
Yt = Ct + It + Gt ------------------------------------------------------(iii)

This model has three equations in three endogenous variables (Ct, It and Yt) and two predetermined variables (Gt and Yt−1). To obtain the reduced form of this model, we may use either of two methods: the direct method, or solving the structural model.

Direct method: Express the three endogenous variables (Ct, It and Yt) as functions of the two predetermined variables (Gt and Yt−1) directly, using π's as the parameters of the reduced form model:

Ct = π11Yt−1 + π12Gt + V1 ------------------------------------(iv)
It = π21Yt−1 + π22Gt + V2 ------------------------------------(v)
Yt = π31Yt−1 + π32Gt + V3 ------------------------------------(vi)

Note: π11, π12, π21, π22, π31 and π32 are the reduced form parameters.

By solving the structural system for the endogenous variables in terms of the predetermined variables, structural parameters and disturbances, the expressions for the reduced form parameters can be obtained easily. For instance, substituting the structural equations (i) and (ii) into (iii) gives the reduced form of the third equation:

Yt = [b2/(1 − a1 − b1)]Yt−1 + [1/(1 − a1 − b1)]Gt + (U1 + U2)/(1 − a1 − b1)

From this expression:
π31 = b2/(1 − a1 − b1)
π32 = 1/(1 − a1 − b1)

Test yourself questions:
a) Determine the reduced form equations for the structural equations (i) and (ii).
b) Indicate the expressions for π11, π12, π21 and π22 from (a) above.

How are the reduced form parameters estimated? The estimates of the reduced form coefficients (the π's) may be obtained in two ways:
1) Direct estimation of the reduced form coefficients by applying OLS.
2) Indirect estimation of the reduced form coefficients. Steps:
i) Solve the system for the endogenous variables so that each equation contains only predetermined explanatory variables; this yields the system of parameter relations (the relations between the π's and the structural parameters).
ii) Obtain the estimates of the structural parameters by any appropriate econometric method.
iii) Substitute the estimates of the structural coefficients into the system of parameter relations to find the estimates of the reduced form coefficients.

• Recursive models
A model is called recursive if its structural equations can be ordered in such a way that the first equation includes only predetermined variables on the right hand side; the second equation contains predetermined variables and the first endogenous variable (of the first equation) on the right hand side; and so on. The special feature of a recursive model is that its equations may be estimated, one at a time, by OLS without simultaneous equations bias. OLS is not applicable when there is interdependence between the explanatory variables and the error term. In simultaneous equation models the endogenous variables may depend on the error terms of the model, so the OLS technique is not appropriate for estimating an equation of a simultaneous equations model. However, in a special type of simultaneous equations model, called a recursive, triangular or causal model, the OLS procedure of estimation is appropriate. Consider the following three-equation system to understand the nature of such models:

Y1 = α10 + β11X1 + β12X2 + U1
Y2 = α20 + α21Y1 + β21X1 + β22X2 + U2
Y3 = α30 + α31Y1 + α32Y2 + β31X1 + β32X2 + U3

As usual, the X's and Y's are the exogenous and endogenous variables respectively. The disturbance terms are assumed to satisfy:
E(U1U2) = E(U1U3) = E(U2U3) = 0

This is the most crucial assumption defining the recursive model. If it does not hold, the system is no longer recursive and OLS is no longer valid.

The first equation of the above system contains only exogenous variables on the right hand side. Since by assumption the exogenous variables are independent of U1, the first equation satisfies the critical assumption of the OLS procedure; hence OLS can be applied straightforwardly to this equation. Consider the second equation. It contains the endogenous variable Y1 as one of the explanatory variables, along with the non-stochastic X's. OLS can be applied to this equation only if it can be shown that Y1 and U2 are independent of each other. This is true because U1, which affects Y1, is by assumption uncorrelated with U2, i.e. E(U1U2) = 0. Y1 therefore acts as a predetermined variable so far as Y2 is concerned, and OLS can be applied to this equation. A similar argument extends to the third equation, because Y1 and Y2 are independent of U3. In this way, in a recursive system OLS can be applied to each equation separately.

Let us build a hypothetical recursive model for an agricultural commodity, say wheat. The production of wheat, Y1, may be assumed to depend on the exogenous factors X2 = climatic conditions and X3 = last season's price. The retail price, Y2, may be assumed to be a function of the production level Y1 and the exogenous factor X4 = disposable income. Finally, the price obtained by the producer, Y3, can be expressed in terms of the retail price Y2 and the exogenous factor X5 = the cost of marketing. The relevant equations of the model are:

Y1 = α1 + β2X2 + β3X3 + U1
Y2 = α4 + β1Y1 + α5X4 + U2
Y3 = α6 + β2Y2 + α7X5 + U3
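The defining feature of a recursive system like this wheat model is that the coefficient matrix of the endogenous variables is triangular, so each equation can be estimated by OLS in turn. A minimal sketch of that mechanical check, with hypothetical numeric values standing in for β1 and β2:

```python
def is_lower_triangular(B):
    """True if every entry above the main diagonal is zero."""
    n = len(B)
    return all(B[i][j] == 0 for i in range(n) for j in range(i + 1, n))

# endogenous-variable coefficient matrix of the wheat model, with each
# equation rearranged to the left hand side; b1 = 0.5 and b2 = 0.7 are
# hypothetical values used only for illustration
B_wheat = [[ 1.0,  0.0, 0.0],
           [-0.5,  1.0, 0.0],
           [ 0.0, -0.7, 1.0]]
print(is_lower_triangular(B_wheat))   # True: the system is recursive

# a genuinely simultaneous system (Y1 and Y2 feed back into each other)
# fails the check, so equation-by-equation OLS would be biased
B_simult = [[ 1.0, -0.3, 0.0],
            [-0.5,  1.0, 0.0],
            [ 0.0, -0.7, 1.0]]
print(is_lower_triangular(B_simult))  # False
```

Triangularity is only half the story: the zero covariances E(UiUj) = 0 assumed above must also hold for OLS to be valid equation by equation.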
In the first equation only exogenous variables appear, and they are assumed to be independent of U1. In the second equation the causal relation between Y1 and Y2 runs in one direction only; moreover Y1 is independent of U2 and can be treated just like an exogenous variable. Similarly, since Y2 is independent of U3, OLS can be applied to the third equation. Thus we can rewrite the above equations as follows:

Y1 − α1 − α2X2 − α3X3 = U1
−β1Y1 + Y2 − α4 − α5X4 = U2
−β2Y2 + Y3 − α6 − α7X5 = U3

We can write this again in matrix form (with X1 ≡ 1 carrying the intercepts):

[  1    0    0 ] [Y1]   [ −α1  −α2  −α3    0     0  ] [X1]   [U1]
[ −β1   1    0 ] [Y2] + [ −α4   0    0    −α5    0  ] [X2] = [U2]
[  0   −β2   1 ] [Y3]   [ −α6   0    0     0    −α7 ] [X3]   [U3]
                                                      [X4]
                                                      [X5]

(coefficient matrix of endogenous variables on the left; coefficient matrix of exogenous variables on the right)

The coefficient matrix of the endogenous variables is thus triangular; hence recursive models are also called triangular models.

7.3 Problems of simultaneous equation models

Simultaneous equation models pose three distinct problems:

1. Mathematical completeness of the model: a model is said to be (mathematically) complete only when it possesses as many independent equations as endogenous variables. In other words, if we happen to know the values of the disturbance terms, the exogenous variables and the structural parameters, then all the endogenous variables are uniquely determined.

2. Identification of each equation of the model: it often happens that a given set of values of the disturbance terms and exogenous variables yields the same values of the different endogenous variables included in the model, because the equations are observationally indistinguishable. What is needed is that the parameters of each equation in the system be uniquely determined. Hence certain tests are required to examine the identification of each equation before its estimation.

3.
Statistical estimation of each equation of the model: since the application of OLS yields biased and inconsistent estimates, different statistical techniques have to be developed to estimate the structural parameters. Some of the most common simultaneous-equation methods* of estimation are:
i) The indirect least squares method (ILS)
ii) The two-stage least squares method (2SLS)
iii) The three-stage least squares method (3SLS)
iv) The limited information maximum likelihood method (LIML)
v) The instrumental variable method (IV)
vi) The mixed estimation method
vii) The full information maximum likelihood method (FIML)

* These methods of estimation are not discussed in this module as they are beyond the scope of this introductory course.

Of these three problems, we discuss the second (the identification problem) in the following section.

7.4 The identification problem

In simultaneous equation models, the problem of identification is a problem of model formulation; it does not concern the estimation of the model. The estimation of the model depends upon the empirical data and the form of the model. If the model is not in the proper statistical form, it may turn out that the parameters cannot be uniquely estimated even though adequate and relevant data are available. In the language of econometrics, a model is identified only when it is in a unique statistical form that enables us to obtain unique estimates of its parameters from the sample data.

To illustrate the problem of identification, let's consider a simplified wage-price model:

W = α + βP + γE + U --------------------------------------(i)
P = λ + µW + V --------------------------------------------(ii)

where W and P are the percentage rates of wage and price inflation respectively, E is a measure of excess demand in the labor market, and U and V are disturbances. E is assumed to be exogenously determined.
If E is exogenously determined, then (i) and (ii) represent two equations determining the two endogenous variables W and P. Let's explain the problem of identification with the help of these two equations. First use equation (ii) to express W in terms of P:

W = −λ/µ + (1/µ)P − V/µ -------------------------------------------------(iii)

Now suppose A and B are any two constants. Multiply equation (i) by A, multiply equation (iii) by B, and add the two equations. This gives

(A + B)W = Aα − Bλ/µ + (Aβ + B/µ)P + AγE + AU − BV/µ

or

W = (Aα − Bλ/µ)/(A + B) + [(Aβ + B/µ)/(A + B)]P + [Aγ/(A + B)]E + (AU − BV/µ)/(A + B) -------------------(iv)

Equation (iv) is what is known as a linear combination of (i) and (ii). The point about equation (iv) is that it is of the same statistical form as the wage equation (i). That is, it has the form:

W = constant + (constant)P + (constant)E + disturbance

Moreover, since A and B can take any values we like, our wage-price model generates an infinite number of equations such as (iv), all statistically indistinguishable from the wage equation (i). Hence, if we apply OLS or any other technique to data on W, P and E in an attempt to estimate the wage equation, we cannot know whether we are actually estimating (i) rather than one of the infinite number of possibilities given by (iv). Equation (i) is said to be unidentified, and consequently there is no way in which unbiased or even consistent estimators of its parameters may be obtained. Notice that, in contrast, the price equation (ii) cannot be confused with the linear combination (iv), because it is a relationship involving W and P only and does not, like (iv), contain the variable E. The price equation (ii) is therefore said to be identified, and in principle it is possible to obtain consistent estimates of its parameters.
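The "infinite number of observationally equivalent equations" argument can be sketched numerically. With hypothetical structural parameters, the function below computes the coefficients of the linear combination (iv) for any (A, B): the choice A = 1, B = 0 reproduces the wage equation itself, while every other choice produces a different equation of exactly the same statistical form.

```python
# hypothetical structural parameters for the wage-price model
alpha, beta, gamma = 2.0, 0.6, 0.3   # wage equation (i)
lam, mu = 1.0, 0.8                   # price equation (ii)

def linear_combination(A, B):
    """Coefficients of A*(i) + B*(iii), rearranged into the form of
    equation (iv): W = const + c_P*P + c_E*E + disturbance."""
    denom = A + B
    const = (A * alpha - B * lam / mu) / denom
    c_P = (A * beta + B / mu) / denom
    c_E = (A * gamma) / denom
    return const, c_P, c_E

# A = 1, B = 0 recovers the wage equation (i) itself ...
print(linear_combination(1, 0))   # (2.0, 0.6, 0.3)
# ... while other (A, B) pairs give different equations of the SAME form,
# so data on W, P and E cannot tell them apart
print(linear_combination(1, 2))
print(linear_combination(3, -1))
```

The price equation escapes this problem because every combination with B ≠ ±A·(something cancelling γ) retains an E term, whereas (ii) contains no E at all; that exclusion is what identifies it.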
A function (equation) belonging to a system of simultaneous equations is identified if it has a unique statistical form, i.e. if no other equation in the system, nor any equation formed by algebraic manipulation of the other equations of the system, contains the same variables as the equation in question.

Identification problems do not arise only in two-equation models. Using the above procedure we can check identification easily when a simultaneous equation model has two or three equations; for a model of n equations, however, such a procedure is very cumbersome. In general, for any number of equations, there are two conditions that must be checked to determine whether each equation of the model is identified. The following section presents these formal conditions for identification.

7.5 Formal Rules (Conditions) for Identification

Identification may be established either by examining the specification of the structural model or by examining the reduced form of the model. Traditionally, identification has been approached via the reduced form; indeed, the term 'identification' was originally used to denote the possibility (or impossibility) of deducing the values of the parameters of the structural relations from knowledge of the reduced form parameters. In this section we examine both approaches. However, the reduced form approach is conceptually confusing and computationally more difficult than the structural form approach, because it requires deriving the reduced form first and then examining the values of a determinant formed from some of the reduced form coefficients. The structural form approach is simpler and more useful.
In applying the identification rules we should either ignore the constant term or, if we want to retain it, include in the set of variables a dummy variable (say X0) which always takes the value 1. Either convention leads to the same results as far as identification is concerned. In this chapter we ignore the constant intercept.

7.5.1 Establishing identification from the structural form of the model

Two conditions must be fulfilled for an equation to be identified.

1. The order condition for identification
This condition is based on a counting rule for the variables included in and excluded from the particular equation. It is a necessary but not sufficient condition for the identification of an equation. The order condition may be stated as follows: for an equation to be identified, the total number of variables (endogenous and exogenous) excluded from it must be equal to or greater than the number of endogenous variables in the model less one. Given that in a complete model the number of endogenous variables equals the number of equations of the model, the order condition is sometimes stated in the following equivalent form: for an equation to be identified, the total number of variables excluded from it but included in other equations must be at least as great as the number of equations of the system less one.

Let:
G = total number of equations (= total number of endogenous variables)
K = total number of variables in the model (endogenous and predetermined)
M = number of variables, endogenous and exogenous, included in a particular equation.
Then the order condition for identification may be expressed symbolically as:

(K − M) ≥ (G − 1)
[excluded variables] ≥ [total number of equations − 1]

For example, if a system contains 10 equations with 15 variables (ten endogenous and five exogenous), an equation containing 11 variables is not identified, while one containing 5 variables is identified.

a. For the first equation we have G = 10, K = 15, M = 11.
Order condition: (K − M) ≥ (G − 1)? Here (15 − 11) < (10 − 1), so the order condition is not satisfied.

b. For the second equation we have G = 10, K = 15, M = 5.
Order condition: (K − M) ≥ (G − 1)? Here (15 − 5) > (10 − 1), so the order condition is satisfied.

The order condition is necessary for a relation to be identified, but it is not sufficient: it may be fulfilled for a particular equation and yet the relation may not be identified.

2. The rank condition for identification
The rank condition states that in a system of G equations, a particular equation is identified if and only if it is possible to construct at least one non-zero determinant of order (G − 1) from the coefficients of the variables excluded from that particular equation but contained in the other equations of the model. The practical steps for checking the identifiability of an equation of a structural model may be outlined as follows.

Firstly, write the parameters of all the equations of the model in a separate table, noting that the parameter of a variable excluded from an equation is equal to zero. For example, let the structural model be:

y1 = 3y2 − 2x1 + x2 + u1
y2 = y3 + x3 + u2
y3 = y1 − y2 − 2x3 + u3

where the y's are the endogenous variables and the x's are the predetermined variables.
This model may be rewritten in the form:

−y1 + 3y2 + 0y3 − 2x1 + x2 + 0x3 + u1 = 0
0y1 − y2 + y3 + 0x1 + 0x2 + x3 + u2 = 0
y1 − y2 − y3 + 0x1 + 0x2 − 2x3 + u3 = 0

Ignoring the random disturbances, the table of the parameters of the model is:

                 y1    y2    y3    x1    x2    x3
1st equation     −1     3     0    −2     1     0
2nd equation      0    −1     1     0     0     1
3rd equation      1    −1    −1     0     0    −2

Secondly, strike out the row of coefficients of the equation being examined for identification. For example, to examine the identifiability of the second equation of the model, we strike out the second row of the table of coefficients.

Thirdly, strike out the columns in which a non-zero coefficient of the equation being examined appears. By deleting the relevant row and columns we are left with the coefficients of variables not included in the particular equation but contained in the other equations of the model. For example, when examining the second equation we strike out the second, third and sixth columns of the above table, obtaining the table of parameters of excluded variables:

                 y1    x1    x2
1st equation     −1    −2     1
3rd equation      1     0     0

Fourthly, form the determinants of order (G − 1) and examine their values. If at least one of these determinants is non-zero, the equation is identified; if all the determinants of order (G − 1) are zero, the equation is underidentified. In our examination of the second structural equation we can form three determinants of order (G − 1) = 3 − 1 = 2:

∆1 = |−1 −2; 1 0| = 2 ≠ 0,   ∆2 = |−2 1; 0 0| = 0,   ∆3 = |−1 1; 1 0| = −1 ≠ 0

(the symbol ∆ stands for 'determinant'; rows are separated by semicolons). Since we can form two non-zero determinants of order G − 1 = 3 − 1 = 2, the second equation of our system is identified.
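The whole procedure, order condition plus rank condition, can be mechanized for this worked example. The sketch below hardcodes the 2×2 determinant case appropriate to a three-equation system (G − 1 = 2); the coefficient table is exactly the one tabulated above.

```python
from itertools import combinations

def order_condition(K, M, G):
    """(K - M) >= (G - 1): necessary but not sufficient for identification."""
    return (K - M) >= (G - 1)

def rank_condition_2x2(table, eq):
    """Rank check for a 3-equation system (G - 1 = 2): strike out row `eq`
    and every column in which equation `eq` has a non-zero coefficient,
    then look for at least one non-zero 2x2 determinant in what remains."""
    rows = [r for i, r in enumerate(table) if i != eq]
    cols = [j for j in range(len(table[0])) if table[eq][j] == 0]
    for c1, c2 in combinations(cols, 2):
        det = rows[0][c1] * rows[1][c2] - rows[0][c2] * rows[1][c1]
        if det != 0:
            return True
    return False

# coefficient table from the text:   y1  y2  y3  x1  x2  x3
table = [[-1,  3,  0, -2,  1,  0],
         [ 0, -1,  1,  0,  0,  1],
         [ 1, -1, -1,  0,  0, -2]]

# second equation (index 1): K = 6 variables, M = 3 of them included, G = 3
print(order_condition(K=6, M=3, G=3))   # True: order condition holds
print(rank_condition_2x2(table, 1))     # True: a non-zero 2x2 determinant exists
```

For a general G the same idea applies with (G − 1)×(G − 1) determinants; only the determinant routine would need generalizing.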
Finally, to see whether the equation is exactly identified or overidentified we use the order condition (K − M) ≥ (G − 1): if the equality holds, i.e. (K − M) = (G − 1), the equation is exactly identified; if the inequality holds, i.e. (K − M) > (G − 1), the equation is overidentified. In the case of the second equation we have G = 3, K = 6, M = 3, and the counting rule (K − M) ≥ (G − 1) gives (6 − 3) > (3 − 1). Therefore the second equation of the model is overidentified.

The identification of a function is achieved by assuming that some variables of the model have a zero coefficient in this equation, that is, by assuming that some variables do not directly affect the dependent variable in this equation. This, however, is an assumption which can be tested with the sample data. We will examine some tests of identifying restrictions in a subsequent section. Some examples will illustrate the application of the two formal conditions for identification.

Example 1. Assume that we have a model describing the market for an agricultural product. From the theory of partial equilibrium we know that the price in a market is determined by the forces of demand and supply. The main determinants of demand are the price of the commodity, the prices of other commodities, and the incomes and tastes of consumers. Similarly, the most important determinants of supply are the price of the commodity, other prices, technology, the prices of factors of production, and weather conditions. The equilibrium condition is that demand equal supply. The above theoretical information may be expressed in the form of the following mathematical model.
D = a0 + a1P1 + a2P2 + a3Y + a4t + u
S = b0 + b1P1 + b2P2 + b3C + b4t + w
D = S

where:
D = quantity demanded
S = quantity supplied
P1 = price of the given commodity
P2 = prices of other commodities
Y = income
C = costs (index of prices of factors of production)
t = time trend (in the demand function it stands for 'tastes'; in the supply function it stands for 'technology')

The above model is mathematically complete in the sense that it contains three equations in three endogenous variables, D, S and P1. The remaining variables, P2, Y, C and t, are exogenous. Suppose we want to identify the supply function. We apply the two criteria for identification:

1. Order condition: (K − M) ≥ (G − 1)
In our example K = 7, M = 5, G = 3. Therefore (K − M) = (G − 1), or (7 − 5) = (3 − 1) = 2. Consequently the second equation satisfies the order condition.

2. Rank condition
Writing the equilibrium condition as D − S = 0, the table of the coefficients of the structural model is as follows:

                 D     S    P1    P2    Y     C     t
1st equation    −1     0    a1    a2    a3    0     a4
2nd equation     0    −1    b1    b2    0     b3    b4
3rd equation     1    −1    0     0     0     0     0

Following the procedure explained earlier, we strike out the second row and the columns of the variables included in the supply function (S, P1, P2, C and t). We are left with the table of the coefficients of excluded variables:

                 D     Y
1st equation    −1     a3
3rd equation     1     0

From this table we can form only one determinant of order (G − 1) = (3 − 1) = 2:

∆ = (−1)(0) − (a3)(1) = −a3

The value of the determinant is non-zero, provided that a3 ≠ 0. Both the order and rank conditions are therefore satisfied, so the second equation of the model is identified. Furthermore, in the order condition the equality holds: (7 − 5) = (3 − 1) = 2. Consequently the second structural equation is exactly identified.

Example 2.
Assume the following simple version of the Keynesian model of income determination:

Consumption function: Ct = a0 + a1Yt − a2Tt + u1
Investment function:  It = b0 + b1Yt−1 + u2
Taxation function:    Tt = c0 + c1Yt + u3
Definition:           Yt = Ct + It + Gt

This model is mathematically complete in the sense that it contains as many equations as endogenous variables. There are four endogenous variables, C, I, T and Y, and two predetermined variables, lagged income (Yt−1) and government expenditure (G).

A. The first equation (consumption function) is not identified

1. Order condition: (K − M) ≥ (G − 1)
There are six variables in the model (K = 6) and four equations (G = 4). The consumption function contains three variables (M = 3). Thus (K − M) = 3 and (G − 1) = 3, so (K − M) = (G − 1), which shows that the order condition for identification is satisfied.

2. Rank condition
Rewriting each equation with all terms on the left hand side, the table of structural coefficients is as follows:

                 C     Y     T     I    Yt−1    G
1st equation    −1    a1   −a2    0     0      0
2nd equation     0     0     0   −1     b1     0
3rd equation     0    c1    −1    0     0      0
4th equation     1    −1     0    1     0      1

We strike out the first row and the first three columns of the table, obtaining the table of coefficients of excluded variables:

                 I    Yt−1    G
2nd equation    −1    b1      0
3rd equation     0    0       0
4th equation     1    0       1

Clearly the value of any determinant formed from this table is zero, since the second row contains only zeros. Consequently we cannot form any non-zero determinant of order 3 (= G − 1). The rank condition is violated, and we conclude that the consumption function is not identified, despite the satisfaction of the order criterion.

B. The investment function is overidentified

1. Order condition
The investment function includes two variables (M = 2). Hence K − M = 6 − 2 = 4. Clearly (K − M) > (G − 1), given that G − 1 = 3. The order condition is fulfilled.

2.
Rank condition
Deleting the second row and the fourth and fifth columns (those of I and Yt-1) of the structural-coefficients table, we obtain the table of the coefficients of the variables excluded from the investment function (C, Y, T and G):

        C    Y    T    G
       -1    a1  -a2   0
        0    c1  -1    0
        1   -1    0    1

The value of the first 3×3 determinant of the parameters of the excluded variables is

Δ1 = | -1   a1  -a2 |
     |  0   c1   -1 |
     |  1   -1    0 |

   = (-1)[(c1)(0) - (-1)(-1)] - a1[(0)(0) - (-1)(1)] + (-a2)[(0)(-1) - (c1)(1)]
   = 1 - a1 + a2c1

which is non-zero provided that a1 − a2c1 ≠ 1. The rank condition is satisfied, since we can construct at least one non-zero determinant of order 3 = (G − 1). Applying the counting rule (K − M) ≥ (G − 1), we see that the inequality sign holds: 4 > 3. Hence the investment function is overidentified.

Self-test Question: Examine the identifiability of the tax equation.

7.5.2 Establishing identification from the reduced form

As with identification from the structural equations, there are two conditions for identification based on the reduced form of the model: an order condition and a rank condition. The order condition is the same as in the structural model. The rank condition here refers to the value of a determinant formed from some of the reduced-form parameters, the π's.

1. Order condition (necessary condition), as applied to the reduced form
An equation belonging to a system of simultaneous equations is identified if

(K − M) ≥ (G − 1)

that is, if the number of excluded variables is at least as large as the number of equations less one, where K, M and G have the same meaning as before:
K = total number of variables, endogenous and exogenous, in the entire model
M = number of variables, endogenous and exogenous, in any particular equation
G = number of structural equations = number of all endogenous variables in the model

If (K − M) = (G − 1), the equation is exactly identified, provided that the rank condition set out below is also satisfied.
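As a numerical check of the investment-function result, the sketch below evaluates the 3×3 determinant of excluded-variable coefficients for illustrative parameter values and compares it with the closed-form expression 1 − a1 + a2c1 derived above:

```python
# Numerical check of the investment-function rank condition.
# Parameter values are arbitrary illustrative numbers.

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists (cofactor expansion)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

a1, a2, c1 = 0.7, 0.25, 0.2

# Rows of the consumption, taxation and identity equations,
# restricted to the columns C, Y, T (excluded from the investment function).
m = [[-1, a1, -a2],
     [0, c1, -1],
     [1, -1, 0]]

delta1 = det3(m)
print(abs(delta1 - (1 - a1 + a2 * c1)) < 1e-12)   # True: closed form matches
```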
If (K − M) > (G − 1), the equation is overidentified, while if (K − M) < (G − 1), the equation is underidentified, under the same proviso.

2. Rank condition, as applied to the reduced form
Let G* stand for the number of endogenous variables contained in a particular equation. The rank condition as applied to the reduced form may be stated as follows. An equation containing G* endogenous variables is identified if and only if it is possible to construct at least one non-zero determinant of order (G* − 1) from the reduced-form coefficients of the exogenous (predetermined) variables excluded from that particular equation.

The practical steps involved in this method of identification may be outlined as follows.

Firstly. Obtain the reduced form of the structural model. For example, assume that the original model is

y1 = b12 y2 + γ11 x1 + γ12 x2 + u1
y2 = b23 y3 + γ23 x3 + u2
y3 = b31 y1 + b32 y2 + γ33 x3 + u3

This model is complete in the sense that it contains three equations in three endogenous variables. The model contains altogether six variables: three endogenous (y1, y2, y3) and three exogenous (x1, x2, x3). The reduced form is obtained by solving the structural equations for the endogenous variables in terms of the exogenous variables alone. In the above example the reduced form is

y1 = π11 x1 + π12 x2 + π13 x3 + v1
y2 = π21 x1 + π22 x2 + π23 x3 + v2
y3 = π31 x1 + π32 x2 + π33 x3 + v3

where the π's are functions of the structural parameters.

Secondly. Form the complete table of the reduced-form coefficients.

                       Exogenous variables
Equations              x1     x2     x3
1st equation: y1       π11    π12    π13
2nd equation: y2       π21    π22    π23
3rd equation: y3       π31    π32    π33

Strike out the rows corresponding to endogenous variables excluded from the particular equation being examined for identifiability. Also strike out all the columns referring to exogenous variables included in the structural form of the particular equation.
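The first step, deriving the reduced form, can be sketched numerically. Writing the structural system as A y = C x + u, where A holds the coefficients of the endogenous variables and C those of the exogenous variables, the reduced-form matrix is Π = A⁻¹C. The parameter values below are arbitrary illustrations, not estimates:

```python
# Sketch: computing the reduced-form pi's from the structural parameters.
# Structural system written as A y = C x + u; reduced form y = Pi x + v
# with Pi = inv(A) @ C. All parameter values are made-up illustrations.
import numpy as np

b12, b23, b31, b32 = 0.5, 0.3, 0.2, 0.4
g11, g12, g23, g33 = 1.0, -0.5, 0.8, 0.6

# Coefficients of the endogenous variables y1, y2, y3 (one row per equation)
A = np.array([[1.0, -b12, 0.0],
              [0.0, 1.0, -b23],
              [-b31, -b32, 1.0]])
# Coefficients of the exogenous variables x1, x2, x3
C = np.array([[g11, g12, 0.0],
              [0.0, 0.0, g23],
              [0.0, 0.0, g33]])

Pi = np.linalg.inv(A) @ C   # 3x3 table of reduced-form coefficients

# Check: for any x, y = Pi x satisfies the structural equations (with u = 0)
x = np.array([1.0, 2.0, 3.0])
y = Pi @ x
print(np.allclose(A @ y, C @ x))   # True
```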
After these deletions we are left with the reduced-form coefficients of the exogenous variables excluded from (absent from) the structural equation. For example, assume that we are investigating the identifiability of the second structural equation. The relevant coefficients are found by striking out the first row (since y1 does not appear in the second equation) and the third column (since x3 is included in this equation). We are left with

        x1     x2
y2      π21    π22
y3      π31    π32

Thirdly. Examine the order of the determinants of the π's of the excluded exogenous variables and evaluate them. If the order of the largest non-zero determinant is (G* − 1), the equation is identified. Otherwise the equation is not identified.

Major References
Gujarati, D.N. (1995). Basic Econometrics, 3rd ed., McGraw-Hill.
Koutsoyiannis, A. (1997). Theory of Econometrics, 2nd ed., Macmillan.
Madnani, G.M.K. (1995). Basic Econometrics, 3rd ed., Macmillan.
Thomas, R.L. (1997). Modern Econometrics: An Introduction, Addison-Wesley.

We hope you enjoyed the reading! Any comments are welcome!
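The three steps above can be sketched for the second equation of the example, using an arbitrary numeric Π matrix in place of the (unknown) functions of the structural parameters:

```python
# Sketch of the reduced-form identification check for the second equation.
# The Pi values are arbitrary illustrations; in practice each pi is a
# function of the structural parameters.
import numpy as np

# Rows: y1, y2, y3; columns: x1, x2, x3
Pi = np.array([[0.9, -0.2, 0.3],
               [0.1, 0.5, 0.7],
               [0.4, 0.6, 0.2]])

# The second equation contains y2 and y3 (so G* = 2) and the exogenous x3.
# Strike row 0 (y1 is excluded from the equation) and column 2 (x3 is included).
sub = Pi[np.ix_([1, 2], [0, 1])]

# Identified iff some non-zero determinant of order G* - 1 = 1 can be formed,
# i.e. the submatrix of excluded-variable pi's has rank at least 1.
print(int(np.linalg.matrix_rank(sub)) >= 1)   # True for this illustration
```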