Econ 388 R. Butler 2014 revisions lecture 20 Panel Data Sets 2

I. Fixed Effects Estimators: Dummy Variables in Panel Data Sets (Wooldridge, chapter 14.1)

One of the most important ways we extend the use of dummy variables is to control for unobserved heterogeneity in so-called panel data sets. In terms of concrete data examples, this is when we have repeated observations (several years) on the same individuals. For example, suppose we have a data set of 560 persons (which we index by $i$), each observed for eight years (which we index by $t$). Then we have $8 \times 560 = 4480$ observations, and for each we observe a dependent variable $y_{i,t}$ (say, wage) and a vector of observable regressors $x_{i,t}$. But in addition to the $x_{i,t}$, there is also a time-invariant, person-specific factor $a_i$, which stands for work ethic and is likely to be correlated with the $x_{i,t}$ (those with better work ethics, i.e., higher values of $a_i$, are also likely to get more schooling, one of the variables in the $x_{i,t}$ vector). If we estimate the effect of education (and other observable factors) on wages without taking account of the $a_i$, we will have an omitted variable bias in our models. The regression model can be written as

1) $y_{i,t} = x_{i,t}\beta + a_i + \varepsilon_{i,t}$,

where the $a_i$ are the unobserved person-specific (but time-invariant, hence no $t$ subscript) effects and $\varepsilon_{i,t}$ is the usual well-behaved error term, uncorrelated with the vector of independent variables. One way to handle the unobserved effect is to include a dummy variable for every individual in the sample (560 dummy variables, or 559 if there is an intercept in the $x_{i,t}$ vector), and estimate these dummy variable coefficients along with the $\beta$ vector. Then our model becomes, in matrix notation, using $D$ to denote our matrix of dummy variables:

2) $Y = X\beta + Da + \varepsilon$,

where, given that there are $t$ time periods, $n$ individuals, and $k$ regressors, the relevant data matrices have the following dimensions: $Y$ is ($nt$-by-1), $X$ is ($nt$-by-$k$), and $D$ is ($nt$-by-$n$). Using the Frisch theorem (lecture 3; let the $M_1$ matrix in the exposition there be the $M_D$ matrix here, and the $X_2$ there be the $X$ here), the fixed effects (FE) estimator of $\beta$ is

3) $\hat{\beta}_{FE} = (X'M_D X)^{-1} X'M_D Y$,

where $M_D$ is the orthogonal projection onto the space orthogonal to the $D$-space (so that $M_D X$ contains the residuals for each of the variables in $X$ after regressing them on $D$).
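Before looking at what $M_D$ does, note that equation (2) is just a regression with a dummy variable for each person, so it can be run directly. A minimal Stata sketch (the variable names wage, educ, exper, and idnum are hypothetical placeholders, matching the summary table later in this lecture):

xi: regress wage educ exper i.idnum;   *i.idnum expands into one dummy per person (559 dummies plus the intercept);

With 560 people, this brute-force approach asks the computer to estimate 560 extra coefficients; the $M_D$ algebra below shows why we never actually have to do that.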
Let's review the $M_D$ matrix a little: what it does here, and how it controls for the heterogeneity effects $a_i$ by getting rid of them (it does this by differencing them out). Recall from lecture four that projecting onto the space orthogonal to the vector of ones, $M_1$, just takes deviations from means. Hence $M_1 X$ took all the columns of the $X$ matrix (each representing a different variable) and deviated them from their (column) means. Let's expand this a bit. Suppose we have a "panel" with 2 people ($n = 2$), each observed for 4 years ($t = 4$), so that we have 8 observations ($nt = 8$). We stack our data first by individuals, then by years. Let's exclude the constant from the regression, so the model has two dummy variables in it, one for each person. In this case, the $D$ matrix will take the following form:

$$D = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}, \qquad \text{so that} \qquad P_D = D(D'D)^{-1}D' = \begin{pmatrix} .25 & .25 & .25 & .25 & 0 & 0 & 0 & 0 \\ .25 & .25 & .25 & .25 & 0 & 0 & 0 & 0 \\ .25 & .25 & .25 & .25 & 0 & 0 & 0 & 0 \\ .25 & .25 & .25 & .25 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & .25 & .25 & .25 & .25 \\ 0 & 0 & 0 & 0 & .25 & .25 & .25 & .25 \\ 0 & 0 & 0 & 0 & .25 & .25 & .25 & .25 \\ 0 & 0 & 0 & 0 & .25 & .25 & .25 & .25 \end{pmatrix}$$

Hence

$$M_D = I - P_D = \begin{pmatrix} .75 & -.25 & -.25 & -.25 & 0 & 0 & 0 & 0 \\ -.25 & .75 & -.25 & -.25 & 0 & 0 & 0 & 0 \\ -.25 & -.25 & .75 & -.25 & 0 & 0 & 0 & 0 \\ -.25 & -.25 & -.25 & .75 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & .75 & -.25 & -.25 & -.25 \\ 0 & 0 & 0 & 0 & -.25 & .75 & -.25 & -.25 \\ 0 & 0 & 0 & 0 & -.25 & -.25 & .75 & -.25 \\ 0 & 0 & 0 & 0 & -.25 & -.25 & -.25 & .75 \end{pmatrix}$$

Suppose that our two individuals are young and still undergoing some post-high-school formal educational attainment (maybe taking evening MBA courses, for example), and that the only relevant right-hand-side independent variable is their educational attainment, denoted $x_{i,t}$. Then

$$M_D X = \begin{pmatrix} x_{1,1} - \bar{x}_{1,\cdot} \\ x_{1,2} - \bar{x}_{1,\cdot} \\ x_{1,3} - \bar{x}_{1,\cdot} \\ x_{1,4} - \bar{x}_{1,\cdot} \\ x_{2,1} - \bar{x}_{2,\cdot} \\ x_{2,2} - \bar{x}_{2,\cdot} \\ x_{2,3} - \bar{x}_{2,\cdot} \\ x_{2,4} - \bar{x}_{2,\cdot} \end{pmatrix}, \qquad \text{where } \bar{x}_{1,\cdot} = \frac{\sum_{t=1}^{4} x_{1,t}}{4} \text{ and } \bar{x}_{2,\cdot} = \frac{\sum_{t=1}^{4} x_{2,t}}{4}.$$

That is, the $M_D X$ operation takes deviations of the independent variables from the mean FOR THAT INDIVIDUAL. If we regress $M_D Y$ on $M_D X$, we get rid of the individual fixed effects, as you can see by summing both sides of equation (1) over $t$ for the $i$th individual and dividing by $t$ to get the means of all the variables as follows:

4) $\bar{y}_{i,\cdot} = \bar{x}_{i,\cdot}\beta + a_i + \bar{\varepsilon}_{i,\cdot}$

and then subtracting equation (4) from equation (1) to get

5) $y_{i,t} - \bar{y}_{i,\cdot} = (x_{i,t} - \bar{x}_{i,\cdot})\beta + (a_i - a_i) + (\varepsilon_{i,t} - \bar{\varepsilon}_{i,\cdot})$, or $y_{i,t} - \bar{y}_{i,\cdot} = (x_{i,t} - \bar{x}_{i,\cdot})\beta + (\varepsilon_{i,t} - \bar{\varepsilon}_{i,\cdot})$.

Hence the $M_D$ operator makes it feasible to compute the fixed effects model even when there are so many dummy variables for individuals that it may be impracticable to compute all those dummy variable coefficients individually. Go back to the example starting out this section: our hypothetical data set of 560 persons (indexed by $i$), each observed for eight years (indexed by $t$). Lots of computer programs would find it difficult to numerically calculate 560 coefficients (one for each person in the sample), but all computer programs can do the $M_D$ differencing operation and then estimate the $k$ parameters in the FE estimator $(X'M_D X)^{-1} X'M_D Y$. The degrees of freedom for this procedure is $nt - (n + k)$, because implicitly we are estimating $n$ parameters when we difference each variable from the mean for that individual. (Another way to see this is as follows: when we differenced the 4 time periods in our simple example above, we ended up with only three independent deviations, since the four deviations sum to zero, and if I know any three of them, I automatically know the fourth. Good FE programs will automatically give you the right degrees of freedom, but you have to check the degrees of freedom in the FE procedure that your brother-in-law wrote.) Because the FE estimator depends only on deviations from the individual means, it is sometimes called the within-groups estimator: it uses only the variation within each group (in our case, the group is the individual) around that group's mean.
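The within transformation is also easy to do by hand in Stata. A minimal sketch (again with the hypothetical variable names wage, educ, and idnum), which reproduces the FE slope coefficient, though not the degrees-of-freedom correction just discussed:

bysort idnum: egen wage_bar = mean(wage);   *person-specific mean of the dependent variable;
bysort idnum: egen educ_bar = mean(educ);   *person-specific mean of the regressor;
gen wage_dev = wage - wage_bar;
gen educ_dev = educ - educ_bar;
regress wage_dev educ_dev, noconstant;      *the within (FE) slope, as in equation (5);

Note that the standard errors from this last regression are based on $nt - k$ degrees of freedom rather than $nt - (n + k)$, which is one reason to let a canned FE routine do the work.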
But because the group means of the variables differ from one individual to another, we could instead take averages over time periods and just regress the means of the dependent variable (one mean for each individual) on the means of the independent variables (the means of each individual's various independent variables). Then for our sample with 560 individuals, we would have exactly 560 independent observations. This alternative estimator of the model in equation (1) is the between-groups estimator. Since the orthogonal projection onto the $D$ matrix, denoted $P_D$ above, takes means of variables, the between-groups estimator for the slope regressors is

6) $\hat{\beta}_{B} = (X'P_D X)^{-1} X'P_D Y$,

where for our 2-person, 4-period example above we get

$$P_D X = \begin{pmatrix} \bar{x}_{1,\cdot} \\ \bar{x}_{1,\cdot} \\ \bar{x}_{1,\cdot} \\ \bar{x}_{1,\cdot} \\ \bar{x}_{2,\cdot} \\ \bar{x}_{2,\cdot} \\ \bar{x}_{2,\cdot} \\ \bar{x}_{2,\cdot} \end{pmatrix}$$

Again, there are 8 ($nt$) observations, but obviously only 2 ($n$) of them are independent. Also, obviously, the between-groups estimator will not get rid of the individual fixed effect, and so it will be an inconsistent estimator for the model in equation (1) when the $a_i$ are correlated with the $x_{i,t}$. Since $Y$ can always be fully decomposed into a part explained by $D$ and a part orthogonal to $D$, that is, $Y = P_D Y + M_D Y$ (really cool result 1 in lecture 3), the OLS estimator of the panel data model can always be written as a weighted sum of the within-groups estimator and between-groups estimator as follows:

7) $\hat{\beta} = (X'X)^{-1}X'Y = (X'X)^{-1}X'(P_D Y + M_D Y)$
$\quad = (X'X)^{-1}X'P_D Y + (X'X)^{-1}X'M_D Y$
$\quad = (X'X)^{-1}(X'P_D X)(X'P_D X)^{-1}X'P_D Y + (X'X)^{-1}(X'M_D X)(X'M_D X)^{-1}X'M_D Y$
$\quad = (X'X)^{-1}(X'P_D X)\hat{\beta}_B + (X'X)^{-1}(X'M_D X)\hat{\beta}_{FE}$

That is, the OLS estimator for the panel data set is a matrix-weighted average of the between-groups estimator (first term on the right hand side) and the within-groups estimator (the far right hand side term). With correlated unobserved heterogeneity, as in equation (1), the within-groups estimator is consistent and unbiased, but the between-groups estimator is not. So with correlated unobserved heterogeneity, the OLS estimator will be inconsistent. If there is no unobserved heterogeneity, or if it is uncorrelated with all the right hand side slope regressors, then both the between-groups and within-groups estimators are consistent, so the OLS estimator is consistent.

Fixed effects are especially easy to estimate with STATA. Suppose that each individual has a unique identifier, say "idnum", and we want to do a fixed effects wage regression. Here is the STATA code:

xtreg wage educ exper occ1 occ2 occ3 occ4, fe i(idnum);

The "xt" prefix is for longitudinal data, and there are lots of different programming options available in STATA.
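For comparison, the between-groups estimator of equation (6) is available from the same command. A sketch, using the same hypothetical variable names and the i() syntax above:

xtreg wage educ exper occ1 occ2 occ3 occ4, be i(idnum);   *be = between estimator: regresses person means on person means;

Remember from the discussion above that this between-groups fit does not difference out the $a_i$, so it is only trustworthy when the unobserved heterogeneity is absent or uncorrelated with the regressors.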
To do random effects models in STATA (which I am not going to discuss in this lecture), we use the following STATA code:

xtreg wage educ exper occ1 occ2 occ3 occ4, re i(idnum);

SAS code (out of many possibilities) for the fixed effects model is

proc glm; absorb idnum;
model wage = educ exper occ1 occ2 occ3 occ4;
run;

(OR, another one for fixed effects:

proc mixed; class idnum;
model wage = educ exper occ1 occ2 occ3 occ4 idnum / solution;
run;)

And one type of SAS code for random effects models is

proc mixed; class idnum;
model wage = educ exper occ1 occ2 occ3 occ4 / solution;
random int / subject=idnum;   *random int means make the intercept a random variable;
run;                          *so only change the id in this last line to match your data;

II. General setup statements for panel data sets not already in good shape

sort panelvar datevar;   *1st var is the individual id & 2nd var is time;
tsset panelvar datevar;

III. Tests for panel data sets

A. Random Effects vs. Fixed Effects: the HAUSMAN TEST

xtreg y x1, fe;
estimates store fixed;   *stores coefficients and covariance matrix of last regression;
xtreg y x1, re;
estimates store random;
hausman fixed random;

B. To see whether time fixed effects are needed when running a FE model, use the command testparm. It is a joint test of whether the dummies for all years are equal to 0; if they are, then no time fixed effects are needed.

testparm _Iyear*;

In Stata 11 and more recent versions you can use:

testparm i.year;

Summary of basic models (FE/RE); see http://www.princeton.edu/~otorres/Panel101.pdf for more information:

Model                               Command   Example
Individual fixed effects            xtreg     xtreg y x1 x2 x3 x4 x5 x6 x7, fe;
Individual fixed effects            areg      areg y x1 x2 x3 x4 x5 x6 x7, absorb(individ);
Individual fixed effects            regress   xi: regress y x1 x2 x3 x4 x5 x6 x7 i.individ;
Individual and time fixed effects   xtreg     xi: xtreg y x1 x2 x3 x4 x5 x6 x7 i.year, fe;
Individual and time fixed effects   areg      xi: areg y x1 x2 x3 x4 x5 x6 x7 i.year, absorb(individ);
Individual and time fixed effects   regress   xi: regress y x1 x2 x3 x4 x5 x6 x7 i.year i.individ;
Random effects                      xtreg     xtreg y x1 x2 x3 x4 x5 x6 x7, re;

IV. Dynamic Completeness: an example using panel data

Essentially, a model is "dynamically complete" if it has enough lagged values (of the independent or dependent variables, or both) that the error exhibits no serial correlation. A dynamically complete model has a regression that satisfies

$E(y_t \mid x_t) = E(y_t \mid x_t, \text{other lagged dep and indep variables})$

where $x_t$ possibly already contains some lagged variables in it, and "other lagged dependent and independent variables" means just that: other lagged variables not already contained in $x_t$. The idea is that $x_t$ already contains all the necessary lagged variables, so the "other lagged dependent and independent variables" aren't needed for the model. Since the model is $E(y_t \mid x_t) = \beta_0 + \beta_1 x_t$, it follows that

$E(\varepsilon_t \mid x_t) = E(\varepsilon_t \mid x_t, \text{other lagged dep and indep variables}) = 0.$

In particular, since "other lagged dependent and independent variables" could be used to form prior values of the error term,

$E(\varepsilon_t \mid x_t) = E(\varepsilon_t \mid x_t, \varepsilon_{t-1}, x_{t-1}, \varepsilon_{t-2}, \ldots) = 0.$

We can use this last result on dynamically complete models, and the law of iterated expectations, to prove that there is no serial correlation in dynamically complete models. The law of iterated expectations is just the result (appendix B in Wooldridge) that

$E_X[E(Y \mid X)] = E(Y).$

The $E(Y \mid X)$ term is just the expected value of $Y$ given $X$, and so is a function of $X$.
The law just says that if we take the expected value of this conditional mean with respect to the distribution of $X$, we get the (unconditional) mean of $Y$. The mathematical proof for this law looks something like

$\sum_i \left[ \sum_j y_j f(y_j \mid x_i) \right] f(x_i) = \sum_j y_j \sum_i f(y_j \mid x_i) f(x_i) = \sum_j y_j \sum_i f(y_j, x_i) = \sum_j y_j f(y_j) = E(Y).$

Wooldridge (chapter 11) applies the law of iterated expectations to $E(\varepsilon_t \varepsilon_s \mid x_t, x_s)$, where $t \neq s$, and shows that $E(\varepsilon_t \varepsilon_s \mid x_t, x_s) = 0$ if $E(\varepsilon_t \mid x_t) = E(\varepsilon_t \mid x_t, \varepsilon_{t-1}, x_{t-1}, \varepsilon_{t-2}, \ldots) = 0$. That is, if the model is dynamically complete, there will be no serial correlation.

When would this be useful for your research? Suppose that you have a panel data set as discussed in the last section of lecture 13, but with no fixed effects (you have no unobserved heterogeneity). Rather, assume that the many years of observations on the same individual are correlated over time. The OLS estimated coefficients will be consistent even if there is such correlation, but the standard errors computed will be inappropriate unless the model is dynamically complete (i.e., there are enough lagged values of the independent and dependent variables that there is no remaining autocorrelation in the errors). Suppose, for example, it is a panel of wages regressed on age, experience, race dummies, educational attainment, and occupational and year dummy variables. You think that there is autocorrelation in wages for a given individual, but that the error will not be autocorrelated after you include the lagged value of wages as an additional regressor in the model (that is, this single lagged value of wages, $y_{i,t-1}$, is enough to make the model dynamically complete). So the basic model is

$y_{i,t} = \alpha y_{i,t-1} + x_{i,t}\beta + u_{i,t}$, with $u_{i,t} = \rho u_{i,t-1} + \varepsilon_{i,t}$.

No (first-order) autocorrelation would be $\rho = 0$. To test whether this is true (and thus, whether the model is dynamically complete given the lagged value of $y$ on the right hand side as a regressor), substitute $u_{i,t} = \rho u_{i,t-1} + \varepsilon_{i,t}$ into the regression specification and run the augmented regression

$y_{i,t} = \alpha y_{i,t-1} + x_{i,t}\beta + \rho \hat{u}_{i,t-1} + \varepsilon_{i,t}$,

where the residuals replace the unobserved error values, and again, where the test for dynamic completeness is a test of whether $\hat{\rho} = 0$. If you can't reject this hypothesis, then you can't reject the hypothesis that the specification is dynamically complete, and you can (sort of, in the absence of other potential problems) trust that the OLS estimates of standard errors and t-statistics are correct. The STATA code for this problem would be something like the following:

**data sorted first by individual ID and then by year (time);
bysort id: gen lagwage = wage[_n-1];   *need to avoid lagging between people;
regress wage lagwage age male educ;
predict resids, residuals;
bysort id: gen lag_resids = resids[_n-1];   *need to avoid lagging between people;
regress wage lagwage age male educ lag_resids;

This is an example based on the wagepan.raw data (wage panel data, mentioned in chapter 14 of Wooldridge):

# delimit ;
* reading in a panel data set from wooldridge, wagepan.raw, 8 years of data on ;
* each individual, data arranged by individual(nr) then by year(year) ;
infile nr year black exper hisp hours married occ1 occ2 occ3 occ4 occ5 occ6
   occ7 occ8 occ9 educ union lwage d81 d82 d83 d84 d85 d86 d87 expersq
   using "g:\classrm_data\wooldridge\wagepan.raw", clear;
********* need to avoid lagging between people **************************;
** so the bysort only generates it for those with the same id, namely *******;
** the same nr values.
So laglwage will be missing when we change individuals: ;
** Jill's last observ shouldn't be lagged into Jim's first observation ******;
bysort nr: gen laglwage = lwage[_n-1];
*list nr year lwage laglwage;
regress lwage laglwage black hisp educ exper expersq married union occ1 occ2
   occ4 occ5 occ6 occ7 occ8 occ9 d82 d83 d84 d85 d86;
predict resids, residuals;
bysort nr: gen lag_resids = resids[_n-1];   *need to avoid lagging between people;
list nr year resids lag_resids;
regress lwage laglwage black hisp educ exper expersq married union occ1 occ2
   occ4 occ5 occ6 occ7 occ8 occ9 d82 d83 d84 d85 d86 lag_resids;

with the last regression given by

. regress lwage laglwage black hisp educ exper expersq married union occ1
>     occ2 occ4 occ5 occ6 occ7 occ8 occ9 d82 d83 d84 d85 d86 lag_resids;

      Source |       SS       df       MS              Number of obs =    3270
-------------+------------------------------           F( 22,  3247) =  159.12
       Model |    443.3267    22  20.1512136           Prob > F      =  0.0000
    Residual |  411.201889  3247  .126640557           R-squared     =  0.5188
-------------+------------------------------           Adj R-squared =  0.5155
       Total |  854.528589  3269  .261403667           Root MSE      =  .35587

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    laglwage |    .883139   .0226862    38.93   0.000     .8386583    .9276197
       black |  -.0289215   .0205947    -1.40   0.160    -.0693013    .0114584
        hisp |   .0054781    .017922     0.31   0.760    -.0296614    .0406176
        educ |    .002805   .0051651     0.54   0.587    -.0073221    .0129321
       exper |  -.0220726   .0156947    -1.41   0.160    -.0528451       .0087
     expersq |    .000722   .0008607     0.84   0.402    -.0009655    .0024095
     married |   .0176767   .0132501     1.33   0.182    -.0083026    .0436561
       union |   .0454948   .0155623     2.92   0.003     .0149819    .0760078
        occ1 |   .0239954   .0320322     0.75   0.454    -.0388099    .0868006
        occ2 |   .0052215   .0323473     0.16   0.872    -.0582016    .0686446
        occ4 |  -.0264225   .0322614    -0.82   0.413    -.0896773    .0368322
        occ5 |  -.0144614   .0297461    -0.49   0.627    -.0727844    .0438616
        occ6 |  -.0371003   .0304375    -1.22   0.223    -.0967788    .0225783
        occ7 |   -.061769   .0347061    -1.78   0.075    -.1298171    .0062791
        occ8 |  -.0914091   .0641928    -1.42   0.155    -.2172717    .0344534
        occ9 |  -.0410516   .0330692    -1.24   0.215    -.1058902    .0237871
         d82 |  -.0936457   .0333213    -2.81   0.005    -.1589786   -.0283128
         d83 |  -.0788313   .0289228    -2.73   0.006      -.13554   -.0221226
         d84 |  -.0413905   .0257387    -1.61   0.108    -.0918564    .0090753
         d85 |  -.0420141   .0235835    -1.78   0.075    -.0882541     .004226
         d86 |  -.0170329     .02216    -0.77   0.442    -.0604818     .026416
  lag_resids |   -.443724   .0273152   -16.24   0.000    -.4972807   -.3901672
       _cons |   .3878219   .1081133     3.59   0.000     .1758446    .5997991
------------------------------------------------------------------------------

The statistically significant coefficient on the lagged residual variable suggests that our model is NOT dynamically complete, so the OLS standard errors and t-statistics are off, and we need to explicitly adjust for the autocorrelation (or add some more lagged variables on the right hand side until the model becomes dynamically complete).
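One sketch of the "add more lagged variables" route (not run on the data here, just illustrating the mechanics with a hypothetical second lag, lag2lwage):

bysort nr: gen lag2lwage = lwage[_n-2];   *second lag, still avoiding lags across people;
regress lwage laglwage lag2lwage black hisp educ exper expersq married union
   occ1 occ2 occ4 occ5 occ6 occ7 occ8 occ9 d82 d83 d84 d85 d86;
predict resids2, residuals;
bysort nr: gen lag_resids2 = resids2[_n-1];
regress lwage laglwage lag2lwage black hisp educ exper expersq married union
   occ1 occ2 occ4 occ5 occ6 occ7 occ8 occ9 d82 d83 d84 d85 d86 lag_resids2;

You would keep adding lags until the coefficient on the lagged residual becomes insignificant. Alternatively, adding the option vce(cluster nr) to the original regression leaves the coefficient estimates unchanged but makes the reported standard errors robust to any remaining within-person serial correlation.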