Panel Data Notes
In class we have discussed the problem of omitted variable bias in a linear regression model. We know that
relevant variables, if omitted from the model (because collecting the data is not feasible or possible), will
cause the coefficient estimates on the variables that are included to be biased if the omitted variables are
correlated with the included variables. This correlation is more often the rule than the exception; therefore
the likelihood of omitted variable bias is quite high. We will see that one way of controlling for the problem
is to use panel data.
Suppose we have a model that pertains to panel data: the variables vary across entities and time (these
variables have the i,t subscript). But also suppose that some variables do not vary across time;
they vary only across entities (the cross-section). These variables are represented by the variable Zi in the
following model:
1) $Y_{i,t} = \beta_0 + \beta_1 X_{i,t} + Z_i + u_{i,t}$
It may be that it is too difficult to gather data on variable Z, but that data on variables X and Y are available.
If you were to simply run a regression to estimate the model:
2) $Y_{i,t} = \beta_0 + \beta_1 X_{i,t} + u_{i,t}$
the Least Squares estimator of $\beta_1$ will be biased if the omitted variable $Z_i$ is in any way correlated with the
included variable $X_{i,t}$. This is the usual omitted variable bias.
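To see what drives this bias, a standard large-sample result can be written out. The sketch below assumes, beyond what is stated above, that the error u is uncorrelated with X. Because Z enters equation 1 with a coefficient of one, the bias is simply the covariance of X and Z scaled by the variance of X.

```latex
\hat{\beta}_1 \;\xrightarrow{\;p\;}\; \beta_1 + \frac{\operatorname{Cov}(X_{i,t},\, Z_i)}{\operatorname{Var}(X_{i,t})}
```

If X and Z are uncorrelated, the second term is zero and there is no bias; otherwise the sign of the bias follows the sign of the covariance.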
Of course, you might think that the ideal response would be to gather data on Zi to include in the model.
This may not be feasible due to lack of data or time. Because the variable Zi does not vary over time, only
across entities, we have 2 methods that can control for this omitted variable yet do not require any additional
data gathering. Both of these methods are similar to the approach we take to correct a model for either a
heteroskedastic error or a serially correlated error: basically we transform the model in a way that eliminates
the problem. We then proceed with Least Squares on the transformed model. As with the corrections for heteroskedasticity and
serial correlation, the transformation leaves the underlying parameters intact.
A. Method of First Differences:
This method will be illustrated for the case where the number of time periods is limited to two. This
limitation is only for purposes of illustration; in most cases we have more than two time periods. The point
to recognize is that the model applies for the cross-section of entities in every time period. Suppose that you
have a cross section of 50 entities and 2 time periods: t = 1982 and 1988. Equation 1)
$Y_{i,t} = \beta_0 + \beta_1 X_{i,t} + Z_i + u_{i,t}$
can be written out for each of the two time periods, since we assume that the model holds in each time
period. Writing out the model for each time period gives us two equations that have the same parameters.
3) 1982: $Y_{i,1982} = \beta_0 + \beta_1 X_{i,1982} + Z_i + u_{i,1982}$
4) 1988: $Y_{i,1988} = \beta_0 + \beta_1 X_{i,1988} + Z_i + u_{i,1988}$
Recall that the data on Zi are not available so that each of these models is subject to an omitted variable bias.
To fix the model, we transform it by taking first differences:
$(Y_{i,1988} - Y_{i,1982}) = \beta_0 + \beta_1 X_{i,1988} + Z_i + u_{i,1988} - (\beta_0 + \beta_1 X_{i,1982} + Z_i + u_{i,1982})$
5) $\Delta Y_i = \beta_1 \Delta X_i + \Delta u_i$
Notice that the variable Zi drops out of the model (it is “swept away”) leaving us with an equation that does
not have an omitted variable. We can estimate this model using the method of least squares. But first, we
need to create a new dependent variable that is the change in Yi from 1982 to 1988 and a new independent
variable that is the change in Xi from 1982 to 1988. Note that the data set will collapse to one with N rows of
data. The “t” dimension collapses to 1. Each “i” will have one row of data that measures the change in Y
and the change in X from 1982 to 1988.
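The collapse to one row per entity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the entity names, the numbers, and the helper `slope_with_intercept` are all made up, and Y is generated without an error term so that the true slope (2) is recovered exactly.

```python
# Hypothetical two-period panel. Y is generated as Y = 1 + 2*X + Z with no
# error term, so the true slope is 2. Z is used only to build Y; the
# regressions below never see it.
z = {"A": 5.0, "B": 1.0, "C": 3.0}
x = {("A", 1982): 4.0, ("A", 1988): 6.0,
     ("B", 1982): 1.0, ("B", 1988): 1.5,
     ("C", 1982): 2.0, ("C", 1988): 5.0}
y = {key: 1.0 + 2.0 * x[key] + z[key[0]] for key in x}


def slope_with_intercept(xs, ys):
    """Least-squares slope from a regression of ys on xs with an intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            / sum((a - mx) ** 2 for a in xs))


# Equation 2: pooled regression of Y on X, ignoring Z.
keys = sorted(x)
pooled_slope = slope_with_intercept([x[k] for k in keys], [y[k] for k in keys])

# Equation 5: one row per entity holding the 1982-to-1988 changes.
entities = sorted(z)
dx = [x[(i, 1988)] - x[(i, 1982)] for i in entities]
dy = [y[(i, 1988)] - y[(i, 1982)] for i in entities]

# Least squares without an intercept: sum(dx*dy) / sum(dx^2).
fd_slope = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

print(len(dx), pooled_slope, fd_slope)
```

In this made-up data Z is positively correlated with X, so the pooled slope overstates the true value, while the regression on the differenced rows does not depend on Z at all.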
The transformation of the original model uses the changes in X (ΔXi) to explain the changes in Y (ΔYi).
Since the variable Zi doesn’t change over time, any changes in Y over time cannot be caused by it. It is true
that the variable Z has an influence on the level of Y, but not on changes in Y. Notice also that the original
intercept $\beta_0$ drops out. To estimate this model, we would not include an intercept. What would happen if we
included an intercept? We will discuss this variation on the first difference model in class.
Lastly, we do not have to limit ourselves to using only the difference from the first observation to the last.
Instead, we can use year to year differences as below:
$(Y_{i,t} - Y_{i,t-1}) = \beta_0 + \beta_1 X_{i,t} + Z_i + u_{i,t} - (\beta_0 + \beta_1 X_{i,t-1} + Z_i + u_{i,t-1})$
$\Delta Y_{i,t} = \beta_1 \Delta X_{i,t} + \Delta u_{i,t}$
The data set with a cross-section of N entities, each observed for T years, will collapse to a data set with N
entities, each with a time dimension of T-1.
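A minimal Python sketch of the year-to-year version follows. The years, entity names, and values are hypothetical, and Y is again generated without an error term. The point to notice is that differences are taken within each entity only, so every entity loses its first year.

```python
# Hypothetical panel with N = 2 entities and T = 4 consecutive years.
# Y = 1 + 2*X + Z with no error term; Z is fixed within each entity.
years = [1982, 1983, 1984, 1985]
x = {"A": [1.0, 2.0, 4.0, 7.0], "B": [3.0, 3.5, 5.0, 4.0]}
z = {"A": 10.0, "B": -2.0}
y = {i: [1.0 + 2.0 * v + z[i] for v in x[i]] for i in x}

# One row per entity and year, starting from each entity's second year:
# (entity, year, X_t - X_{t-1}, Y_t - Y_{t-1}).
# The inner loop never crosses from one entity into the next.
diff_rows = []
for i in sorted(x):
    for t in range(1, len(years)):
        diff_rows.append((i, years[t],
                          x[i][t] - x[i][t - 1],
                          y[i][t] - y[i][t - 1]))

# Least squares without an intercept on the differenced rows.
fd_slope = (sum(r[2] * r[3] for r in diff_rows)
            / sum(r[2] ** 2 for r in diff_rows))

print(len(diff_rows), fd_slope)
```

The differenced data set has N(T-1) rows, here 2 x 3 = 6, in place of the original NT = 8.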
B. Dummy Variable Method
For this method, let’s suppose that we have only 2 entities in our cross-section, but have several time periods
for each. Again, this limitation is only for purposes of illustration; in most cases we have many entities in
our cross-section. The point to recognize is that the model applies for each cross-section’s time-series data.
These two models are:
6) i = 1: $Y_{1,t} = \beta_0 + \beta_1 X_{1,t} + Z_1 + u_{1,t}$ becomes $Y_{1,t} = (\beta_0 + Z_1) + \beta_1 X_{1,t} + u_{1,t}$
7) i = 2: $Y_{2,t} = \beta_0 + \beta_1 X_{2,t} + Z_2 + u_{2,t}$ becomes $Y_{2,t} = (\beta_0 + Z_2) + \beta_1 X_{2,t} + u_{2,t}$
In the first-difference method above, we expressed the original model in equation 1) in two different time
periods, where each model had the same parameters. Here we express the original model from equation 1)
for each entity in the cross-section. That is, we can consider the model in equation 1 as a model that holds
for each entity over time.
Remember that the Zi variables do not change over time. Therefore, Z1 in equation 6 represents the value of
Z for entity 1 and this value is the same in every time period…it doesn’t vary over time. Since the variable
Z1 doesn’t vary, it becomes part of the intercept. The same is true for i = 2 in equation 7. The two entities
have the same model except for different intercepts. There is an easy way to allow for each entity to have its
own intercept but the same slope, all within the same equation: use a dummy variable.
8) $Y_{i,t} = \beta_0 + \beta_1 X_{i,t} + \gamma_1 D_i + u_{i,t}$
where $D_i = 1$ for i = 1 and $D_i = 0$ for i = 2
In equation 8, for i=1, the intercept is $\beta_0 + \gamma_1$ and, for i=2, the intercept is $\beta_0$. Thus, equation 8 achieves the
goal of a different intercept for each entity. When we have N entities in our cross-section, we would use N-1
dummy variables, giving each cross-section its own intercept:
9) $Y_{i,t} = \beta_0 + \beta_1 X_{i,t} + \gamma_1 D1_i + \gamma_2 D2_i + \gamma_3 D3_i + \gamma_4 D4_i + \dots + u_{i,t}$
In this model, the intercept varies across the entities in the cross-section but does not vary over time, which is
exactly what we supposed the omitted variable did. Therefore, we are essentially using a dummy variable
for each entity to proxy for the omitted variable Zi. Remember that we assumed Zi varied only across entities,
not over time. This is exactly what the intercepts do: they are different for each entity but do not vary over
time.
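Building the N-1 dummy variables from an entity ID can be sketched as follows. The rows, the state codes, and the choice of the last entity as the reference category are assumptions made for illustration. Only N-1 dummies are created because a full set of N dummies together with the intercept would be perfectly collinear.

```python
# Hypothetical long-format panel: one row per (entity ID, year, X).
panel = [("AL", 1982, 1.5), ("AL", 1988, 2.0),
         ("AZ", 1982, 0.2), ("AZ", 1988, 0.3),
         ("CA", 1982, 0.1), ("CA", 1988, 0.1)]

entities = sorted({row[0] for row in panel})

# The reference entity gets no dummy; its intercept is the constant term alone.
reference = entities[-1]
dummy_ids = [e for e in entities if e != reference]  # N-1 dummies

# Each design row is [constant, X, D for the first entity, D for the second, ...].
design = []
for entity, year, x_val in panel:
    dummies = [1.0 if entity == d else 0.0 for d in dummy_ids]
    design.append([1.0, x_val] + dummies)

for row in design:
    print(row)
```

Every row of the reference entity has zeros in all dummy columns, and every other entity has a one in exactly one dummy column in every year, which is what lets the intercept differ across entities but not over time.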
C. Entity-Demeaned Data
Above, it was stated that there are two methods of dealing with these omitted variables that vary only across
entities in the cross-section: use first differences or use dummy variables. Here is a third method that is
mathematically equivalent to using dummy variables. Let $Z_i$ be renamed simply $\alpha_i$, which captures the idea
of a different intercept for each entity in the cross-section. The original model was
$Y_{i,t} = \beta_0 + \beta_1 X_{i,t} + Z_i + u_{i,t}$
which is now:
10) $Y_{i,t} = \beta_0 + \alpha_i + \beta_1 X_{i,t} + u_{i,t}$
Now consider the average of this equation for each entity i:
11) $\bar{Y}_i = \beta_0 + \alpha_i + \beta_1 \bar{X}_i + \bar{u}_i$
where $\bar{Y}_i = \frac{1}{T}\sum_{t=1}^{T} Y_{i,t}$ and $\bar{X}_i = \frac{1}{T}\sum_{t=1}^{T} X_{i,t}$ are “entity-specific” means. Basically we are taking the mean of the Y
values in the time dimension, for each entity in the cross-section. Similar to the approach that uses first
differences, we will transform the model by subtracting equation 11 from equation 10. As before, this
“sweeps” out the $\alpha_i$ term. Estimation can proceed since the transformed model no longer has an omitted
variable.
$Y_{i,t} - \bar{Y}_i = \beta_0 + \alpha_i + \beta_1 X_{i,t} + u_{i,t} - (\beta_0 + \alpha_i + \beta_1 \bar{X}_i + \bar{u}_i)$
12) $Y_{i,t} - \bar{Y}_i = \beta_1 (X_{i,t} - \bar{X}_i) + u_{i,t} - \bar{u}_i$
The dependent and independent variables are said to be “entity-demeaned” since the transformation requires
one to subtract the entity-specific means from the original variable.
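The entity-demeaning transformation can be sketched in Python as below. The data and the helper `demean` are hypothetical, and Y is generated without an error term so that the true slope of 2 comes back exactly. The entity-specific intercepts are used only to generate Y.

```python
# Hypothetical panel: 2 entities observed for T = 4 periods.
# Y = 1 + 2*X + alpha_i with no error term.
x = {1: [1.0, 2.0, 3.0, 4.0], 2: [2.0, 4.0, 6.0, 8.0]}
alpha = {1: 5.0, 2: 1.0}
y = {i: [1.0 + 2.0 * v + alpha[i] for v in x[i]] for i in x}


def demean(values):
    """Subtract the entity-specific mean (taken over time) from each value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Demean X and Y separately within each entity, then stack the entities.
x_dm = [v for i in sorted(x) for v in demean(x[i])]
y_dm = [v for i in sorted(y) for v in demean(y[i])]

# Least squares on the demeaned data; no intercept is needed because both
# demeaned variables have mean zero within every entity.
beta1_hat = (sum(a * b for a, b in zip(x_dm, y_dm))
             / sum(a * a for a in x_dm))

print(beta1_hat)
```

Unlike first differencing, demeaning keeps all NT rows; only the levels of the variables change.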
It can be shown that Method B (Dummy Variables) and Method C (Entity-Demeaned Data) are equivalent:
your coefficient estimates of $\beta_1$ should be identical. Method A (first differences) will not, in general, yield results identical
to those of Methods B and C, but the estimates should be close. If the estimates are not close,
it may indicate additional problems with your model.
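The comparison of the three methods can be checked numerically. The following Python sketch uses a made-up panel with hand-picked disturbances, and the helper `ols` is a hypothetical stand-in for whatever regression routine you normally use (it simply solves the normal equations). It illustrates the claim on one small data set and is not a proof.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(rows, y))] for p in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[p][k] / a[p][p] for p in range(k)]


# Hypothetical panel: 3 entities, 3 periods. Z and u are made-up numbers used
# only to generate Y; the estimators below never see them.
x = {"A": [1.0, 2.0, 4.0], "B": [2.0, 3.0, 3.0], "C": [0.0, 1.0, 5.0]}
z = {"A": 4.0, "B": 0.0, "C": -3.0}
u = {"A": [0.1, -0.2, 0.1], "B": [-0.1, 0.1, 0.0], "C": [0.2, 0.0, -0.2]}
y = {i: [1.0 + 2.0 * xv + z[i] + uv for xv, uv in zip(x[i], u[i])] for i in x}
ids = sorted(x)

# Method B: constant, X, and N-1 entity dummies (entity "C" is the reference).
rows, y_all = [], []
for i in ids:
    for xv, yv in zip(x[i], y[i]):
        rows.append([1.0, xv] + [1.0 if i == d else 0.0 for d in ids[:-1]])
        y_all.append(yv)
b_dummy = ols(rows, y_all)[1]


def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def diff(v):
    return [b - a for a, b in zip(v, v[1:])]


def slope_no_intercept(xs, ys):
    return sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)


# Method C: entity-demeaned data.
b_within = slope_no_intercept([a for i in ids for a in demean(x[i])],
                              [a for i in ids for a in demean(y[i])])

# Method A: year-to-year first differences within each entity.
b_fd = slope_no_intercept([a for i in ids for a in diff(x[i])],
                          [a for i in ids for a in diff(y[i])])

print(b_dummy, b_within, b_fd)
```

On this data the dummy-variable and entity-demeaned slopes agree up to rounding error, while the first-difference slope is close to them but not the same.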
In Practice
Recall the severe problem of a biased estimator. When we omit important variables from a model, the
estimates of the parameters on the variables we do include are not reliable. In class we have discussed the
vehicle fatality rate and beer tax example. When we estimate the model using a cross-section of data on the
48 contiguous states in the U.S., we find that the beer tax has a positive effect on the vehicle fatality rate.
Theory tells us that a state with a higher beer tax will have higher prices, leading to less alcohol consumption
and thus fewer fatalities. Because we get a sign on our estimate that is the opposite of what we expected, we
suspect an omitted variable bias. When we estimate the model using a panel of state-level data (instead of
just a cross-section) using any of the methods described here, the estimated coefficient on the beer tax
variable changes sign, confirming our suspicion of an omitted variable bias.
SAS Tasks
In class you received SAS code showing you:
- How to create first-differenced data
- How to create a dummy variable for each entity in the cross-section
- How to automate the creation of a dummy variable by constructing entity-demeaned data.
All of these tasks require you to learn how to manage the data set using an ID variable in the cross-section
dimension and an ID variable in the time dimension.