Panel Data Notes

In class we have discussed the problem of omitted variable bias in a linear regression model. If a relevant variable is omitted from the model (because collecting the data is not feasible or possible), the coefficient estimates on the included variables will be biased whenever the omitted variable is correlated with the included variables. Such correlation is more often the rule than the exception, so the likelihood of omitted variable bias is quite high. We will see that one way of controlling for the problem is to use panel data.

Suppose we have a model that pertains to panel data: the variables vary across entities and over time (these variables carry the i,t subscript). Suppose also that some variables do not vary over time; they vary only across entities (the cross-section). These variables are represented by $Z_i$ in the following model:

1)  $Y_{i,t} = \beta_0 + \beta_1 X_{i,t} + Z_i + u_{i,t}$

It may be too difficult to gather data on Z, while data on X and Y are available. If you were simply to run a regression to estimate the model

2)  $Y_{i,t} = \beta_0 + \beta_1 X_{i,t} + u_{i,t}$

the least squares estimator of $\beta_1$ will be biased if the omitted variable $Z_i$ is in any way correlated with the included variable $X_{i,t}$. This is the usual omitted variable bias.

Of course, you might think that the ideal response would be to gather data on $Z_i$ and include it in the model. This may not be feasible due to lack of data or time. Because $Z_i$ does not vary over time, only across entities, we have two methods that control for this omitted variable without any additional data gathering. Both are similar to the approach we take to correct a model for a heteroskedastic or serially correlated error: we transform the model in a way that eliminates the problem, then proceed with least squares on the transformed model.
As with the corrections for heteroskedasticity and serial correlation, the transformation leaves the underlying parameters intact.

A. Method of First Differences

This method will be illustrated for the case where the number of time periods is limited to two. This limitation is only for purposes of illustration; in most cases we have more than two time periods. The point to recognize is that the model applies to the cross-section of entities in every time period. Suppose that you have a cross-section of 50 entities and 2 time periods: t = 1982 and t = 1988. Equation 1),

$Y_{i,t} = \beta_0 + \beta_1 X_{i,t} + Z_i + u_{i,t}$

can be written out for each of the two time periods, since we assume that the model holds in each time period. This gives us two equations that have the same parameters:

3)  1982: $Y_{i,1982} = \beta_0 + \beta_1 X_{i,1982} + Z_i + u_{i,1982}$

4)  1988: $Y_{i,1988} = \beta_0 + \beta_1 X_{i,1988} + Z_i + u_{i,1988}$

Recall that the data on $Z_i$ are not available, so each of these models is subject to omitted variable bias. To fix the model, we transform it by taking first differences:

$(Y_{i,1988} - Y_{i,1982}) = \beta_0 + \beta_1 X_{i,1988} + Z_i + u_{i,1988} - (\beta_0 + \beta_1 X_{i,1982} + Z_i + u_{i,1982})$

5)  $\Delta Y_i = \beta_1 \Delta X_i + \Delta u_i$

Notice that the variable $Z_i$ drops out of the model (it is “swept away”), leaving us with an equation that does not have an omitted variable. We can estimate this model using the method of least squares. But first, we need to create a new dependent variable that is the change in $Y_i$ from 1982 to 1988 and a new independent variable that is the change in $X_i$ from 1982 to 1988. Note that the data set will collapse to one with N rows of data. The “t” dimension collapses to 1: each “i” will have one row of data that measures the change in Y and the change in X from 1982 to 1988.

The transformed model uses the changes in X ($\Delta X_i$) to explain the changes in Y ($\Delta Y_i$). Since the variable $Z_i$ doesn’t change over time, any changes in Y over time cannot be caused by it.
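As a rough illustration of the two-period first-difference transformation, here is a minimal Python sketch. It is not the SAS code from class. The data are made up (generated as Y = 1 + 2X + Z with no error term, so the algebra is easy to see), and the names `panel`, `first_difference_slope`, and `pooled_slope` are hypothetical. The sketch builds one row of changes per entity, estimates the slope by least squares without an intercept, and compares it with a pooled regression that ignores Z.

```python
# Made-up panel: (entity, year, X, Y), generated as Y = 1 + 2*X + Z_i
# with Z_A = 5, Z_B = -3, Z_C = 10 (Z is "unobserved" by the estimators).
panel = [
    ("A", 1982, 1, 8),  ("A", 1988, 3, 12),
    ("B", 1982, 2, 2),  ("B", 1988, 5, 8),
    ("C", 1982, 4, 19), ("C", 1988, 3, 17),
]

def first_difference_slope(rows, t_first, t_last):
    """Regress (Y_last - Y_first) on (X_last - X_first) with no intercept."""
    by_key = {(entity, year): (x, y) for entity, year, x, y in rows}
    entities = sorted({entity for entity, _, _, _ in rows})
    dx, dy = [], []
    for entity in entities:
        x0, y0 = by_key[(entity, t_first)]
        x1, y1 = by_key[(entity, t_last)]
        dx.append(x1 - x0)  # one row of changes per entity
        dy.append(y1 - y0)
    # Least squares through the origin: sum(dx*dy) / sum(dx^2)
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

def pooled_slope(rows):
    """Ordinary least squares slope of Y on X (with intercept), ignoring Z."""
    xs = [x for _, _, x, _ in rows]
    ys = [y for _, _, _, y in rows]
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

print("first-difference slope:", first_difference_slope(panel, 1982, 1988))
print("pooled slope ignoring Z:", pooled_slope(panel))
```

With these made-up numbers the first-difference slope equals the true value of 2, while the pooled regression that omits Z gives 1.7. The pooled estimate is off only because Z was chosen to be correlated with X in this toy data set.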
It is true that Z influences the level of Y, but not changes in Y. Notice also that the original intercept $\beta_0$ drops out. To estimate this model, we would not include an intercept. What would happen if we included an intercept? We will discuss this variation on the first-difference model in class.

Lastly, we do not have to limit ourselves to using only the difference from the first observation to the last. Instead, we can use year-to-year differences:

$(Y_{i,t} - Y_{i,t-1}) = \beta_0 + \beta_1 X_{i,t} + Z_i + u_{i,t} - (\beta_0 + \beta_1 X_{i,t-1} + Z_i + u_{i,t-1})$

$\Delta Y_{i,t} = \beta_1 \Delta X_{i,t} + \Delta u_{i,t}$

A data set with a cross-section of N entities, each observed for T years, will collapse to a data set with N entities, each with a time dimension of T-1.

B. Dummy Variable Method

For this method, let’s suppose that we have only 2 entities in our cross-section but several time periods for each. Again, this limitation is only for purposes of illustration; in most cases we have many entities in our cross-section. The point to recognize is that the model applies to each entity’s time-series data. The two models are:

6)  i = 1: $Y_{1,t} = \beta_0 + \beta_1 X_{1,t} + Z_1 + u_{1,t}$ becomes $Y_{1,t} = (\beta_0 + Z_1) + \beta_1 X_{1,t} + u_{1,t}$

7)  i = 2: $Y_{2,t} = \beta_0 + \beta_1 X_{2,t} + Z_2 + u_{2,t}$ becomes $Y_{2,t} = (\beta_0 + Z_2) + \beta_1 X_{2,t} + u_{2,t}$

In the first-difference method above, we expressed the original model in equation 1) in two different time periods, where each model had the same parameters. Here we express the original model from equation 1) for each entity in the cross-section. That is, we can consider equation 1) as a model that holds for each entity over time. Remember that the $Z_i$ variables do not change over time. Therefore, $Z_1$ in equation 6 represents the value of Z for entity 1, and this value is the same in every time period. Since $Z_1$ doesn’t vary, it becomes part of the intercept. The same is true for i = 2 in equation 7. The two entities have the same model except for different intercepts.
There is an easy way to allow each entity to have its own intercept but the same slope, all within the same equation: use a dummy variable.

8)  $Y_{i,t} = \beta_0 + \beta_1 X_{i,t} + \gamma_1 D_i + u_{i,t}$, where $D_i = 1$ for i = 1 and $D_i = 0$ for i = 2

In equation 8, for i = 1 the intercept is $\beta_0 + \gamma_1$ and for i = 2 the intercept is $\beta_0$. Thus, equation 8 achieves the goal of a different intercept for each entity. When we have N entities in our cross-section, we would use N-1 dummy variables, giving each entity its own intercept:

9)  $Y_{i,t} = \beta_0 + \beta_1 X_{i,t} + \gamma_1 D1_i + \gamma_2 D2_i + \gamma_3 D3_i + \gamma_4 D4_i + \dots + u_{i,t}$

In this model, the intercept varies across the entities in the cross-section but does not vary over time, which is exactly what we supposed the omitted variable did. Therefore, we are essentially using a dummy variable for each entity to proxy for the omitted variable $Z_i$. Remember that we assumed $Z_i$ varied only across entities, not over time. This is exactly what the intercepts do: they are different for each entity but do not vary over time.

C. Entity-Demeaned Data

Above, it was stated that there are two methods of dealing with omitted variables that vary only across entities in the cross-section: use first differences or use dummy variables. Here is a third method that is mathematically equivalent to using dummy variables. Let $Z_i$ be renamed simply $\alpha_i$, which captures the idea of a different intercept for each entity in the cross-section. The original model was

$Y_{i,t} = \beta_0 + \beta_1 X_{i,t} + Z_i + u_{i,t}$

which is now:

10)  $Y_{i,t} = \beta_0 + \alpha_i + \beta_1 X_{i,t} + u_{i,t}$

Now consider the average of this equation over time for each entity i:

11)  $\bar{Y}_i = \beta_0 + \alpha_i + \beta_1 \bar{X}_i + \bar{u}_i$

where $\bar{Y}_i = \frac{1}{T}\sum_{t=1}^{T} Y_{i,t}$ and $\bar{X}_i = \frac{1}{T}\sum_{t=1}^{T} X_{i,t}$ are “entity-specific” means. Basically, we are taking the mean of the Y values (and likewise the X values) in the time dimension for each entity in the cross-section. Similar to the approach that uses first differences, we will transform the model by subtracting equation 11 from equation 10. As before, this “sweeps” out the $\alpha_i$ term.
Estimation can proceed since the transformed model no longer has an omitted variable.

$Y_{i,t} - \bar{Y}_i = \beta_0 + \alpha_i + \beta_1 X_{i,t} + u_{i,t} - (\beta_0 + \alpha_i + \beta_1 \bar{X}_i + \bar{u}_i)$

12)  $Y_{i,t} - \bar{Y}_i = \beta_1 (X_{i,t} - \bar{X}_i) + (u_{i,t} - \bar{u}_i)$

The dependent and independent variables are said to be “entity-demeaned” since the transformation requires one to subtract the entity-specific mean from each original variable.

It can be shown that Method B (Dummy Variables) and Method C (Entity-Demeaned Data) are equivalent: your coefficient estimates of $\beta_1$ should be identical. When there are more than two time periods, Method A (First Differences) will not yield results identical to those of Methods B and C, but the estimates should be close. If they are not close, it may indicate additional problems with your model.

In Practice

Recall the severe problem of a biased estimator. When we omit important variables from a model, the estimates of the parameters on the variables we do include are not reliable. In class we have discussed the vehicle fatality rate and beer tax example. When we estimated the model using a cross-section of data on the 48 contiguous states in the U.S., we found that the beer tax has a positive effect on the vehicle fatality rate. Theory tells us that a state with a higher beer tax will have higher prices, leading to less alcohol consumption and thus fewer fatalities. Because the sign of our estimate is the opposite of what we expected, we suspect omitted variable bias. When we estimate the model using a panel of state-level data (instead of just a cross-section) with any of the methods described here, the estimated coefficient on the beer tax variable changes sign, confirming our suspicion of omitted variable bias.

SAS Tasks

In class you received SAS code showing you:
- How to create first-differenced data
- How to create a dummy variable for each entity in the cross-section
- How to automate the creation of dummy variables by constructing entity-demeaned data
All of these tasks require you to learn how to manage the data set using an ID variable in the cross-section dimension and an ID variable in the time dimension.
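The role of the two ID variables, and the equivalence of the dummy-variable and entity-demeaned estimates, can be illustrated outside SAS. The following is a minimal Python sketch, not the SAS code handed out in class. The panel is made up (roughly Y = 1 + 2X + Z plus small errors), and the names `panel`, `demeaned_slope`, `ols`, and `dummy_variable_slope` are hypothetical. The first column plays the role of the cross-section ID and the second the time ID.

```python
from collections import defaultdict

# Made-up panel: (entity ID, time ID, X, Y); 3 entities observed for 3 years.
panel = [
    ("A", 1, 1, 8.1),  ("A", 2, 3, 11.8), ("A", 3, 4, 14.1),
    ("B", 1, 2, 1.9),  ("B", 2, 5, 8.0),  ("B", 3, 3, 4.1),
    ("C", 1, 4, 19.2), ("C", 2, 3, 16.9), ("C", 3, 6, 22.9),
]

def demeaned_slope(rows):
    """Method C: regress (Y - entity mean) on (X - entity mean), no intercept."""
    groups = defaultdict(list)
    for entity, _time, x, y in rows:
        groups[entity].append((x, y))  # group rows by the cross-section ID
    sxy = sxx = 0.0
    for obs in groups.values():
        xbar = sum(x for x, _ in obs) / len(obs)  # entity-specific means
        ybar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - xbar) * (y - ybar)
            sxx += (x - xbar) ** 2
    return sxy / sxx

def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def dummy_variable_slope(rows):
    """Method B: intercept, X, and N-1 entity dummies (first entity is the base)."""
    entities = sorted({entity for entity, _, _, _ in rows})
    others = entities[1:]
    X = [[1.0, x] + [1.0 if entity == d else 0.0 for d in others]
         for entity, _, x, _ in rows]
    y = [yy for _, _, _, yy in rows]
    return ols(X, y)[1]  # coefficient on X

print("entity-demeaned slope:", demeaned_slope(panel))
print("dummy-variable slope: ", dummy_variable_slope(panel))
```

The two printed slopes should agree up to floating-point rounding, which is the equivalence of Methods B and C described above. The hand-rolled solver is used only to keep the sketch self-contained; in practice a statistics package would do this step.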