Econ 388, R. Butler, 2014 revisions. Lecture 20: Panel Data Sets 2
I. Fixed Effects Estimators: Dummy variables in Panel Data Sets (Wooldridge,
chapter 14.1)
One of the most important ways we extend the use of dummy variables is to control for
unobserved heterogeneity in so-called panel data sets. In terms of concrete data examples,
this would be when we have repeated observations (several years) on the same
individuals. For example, we have a data set of 560 persons (which we index by i), each
observed for eight years (which we index by t). Then we would have 8x560 = 4480
observations, and for each we observe a dependent variable $y_{i,t}$ (say, wage) and a vector
of observable regressors $x_{i,t}$. But in addition to the $x_{i,t}$, there is also a time-invariant,
person-specific factor $a_i$, which stands for work ethic and is likely to be correlated with
the $x_{i,t}$ (those with better work ethics, i.e., higher values of $a_i$, are also likely to get more
schooling, one of the variables in the $x_{i,t}$ vector). If we estimate the effect of education
(and other observable factors) on wages, not taking account of the $a_i$, we will have an
omitted variable bias in our models. The regression model can be written as
1) $y_{i,t} = x_{i,t}\beta + a_i + \epsilon_{i,t}$,
where the $a_i$ are the unobserved person-specific (but time-invariant, hence no t
subscript) effects and $\epsilon_{i,t}$ is the usual well-behaved error term, uncorrelated with the
vector of independent variables. One way to handle the unobserved effect is to include a
dummy variable for every individual in the sample (560 dummy variables, or 559 if there
is an intercept in the $x_{i,t}$ vector), and estimate these dummy variables along with the $\beta$
vector. Then our model becomes, in matrix notation, using D to denote our matrix of
dummy variables:
2) $Y = X\beta + Da + \epsilon$
where, given that there are t time periods, n individuals, and k regressors, the relevant
data matrices have dimensions Y (nt-by-1), X (nt-by-k), and D (nt-by-n). Using the
Frisch theorem (lecture 3; let the $M_1$ matrix in the exposition there be the $M_D$ matrix
here, and the $X_2$ there be the X here), the fixed effects (FE) estimator of $\beta$ is as follows:
3) $\hat{\beta}_{FE} = (X'M_D X)^{-1} X'M_D Y$
where $M_D$ is the projection onto the space orthogonal to the D-space, so that $M_D X$
gives the residuals for each of the variables in X after regressing them on D.
Let’s review the $M_D$ matrix a little: what it does here, and how it controls for the
heterogeneity effects $a_i$ by getting rid of them (it does this by differencing them out).
Recall from lecture four that projecting onto the space orthogonal to the vector of ones,
via $M_1$, just takes deviations. Hence $M_1 X$ took all the columns of the X matrix (each
representing a different variable) and deviated them from their (column) means. Let’s
expand this a bit.
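First, in symbols, the one-group result we are generalizing (with $\iota$ the conformable column of ones):
$$M_1 = I - \iota(\iota'\iota)^{-1}\iota' = I - \tfrac{1}{n}\,\iota\iota', \qquad M_1 x = x - \bar{x}\,\iota .$$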
Suppose we have a “panel” with 2 people (n=2), each observed for 4 years (t=4). Hence
we have 8 observations (nt = 8). Now we will stack our data, first by individuals, then
by years. Let’s exclude the constant from the regression, so we have a model with two
dummy variables in it, one for each person. In this case, the D matrix will take the
following form:
1
1

1

1
D
0
0

0
0

.25
0
.25
0

.25
0


.25
0
1
then PD  D( D' D) D'  
 0
1

1
 0

 0
1

1
 0
.25 .25 .25
0 0 0 0
.25 .25 .25
.25 .25 .25
0
0
0 0 0
0 0 0
.25 .25 .25 0 0 0 0
0 0 0
.25 .25 .25
0 0 0
0 0 0
0 0 0
.25 .25 .25
.25 .25 .25
.25 .25 .25






.25

.25
.25

.25
Hence
$$M_D = I - P_D =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix} -
\begin{bmatrix}
.25 & .25 & .25 & .25 & 0 & 0 & 0 & 0 \\
.25 & .25 & .25 & .25 & 0 & 0 & 0 & 0 \\
.25 & .25 & .25 & .25 & 0 & 0 & 0 & 0 \\
.25 & .25 & .25 & .25 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & .25 & .25 & .25 & .25 \\
0 & 0 & 0 & 0 & .25 & .25 & .25 & .25 \\
0 & 0 & 0 & 0 & .25 & .25 & .25 & .25 \\
0 & 0 & 0 & 0 & .25 & .25 & .25 & .25
\end{bmatrix}$$
Suppose that our two individuals are young and still undergoing some post-high-school
formal education (maybe taking evening MBA courses, for example), and that the only
relevant right-hand-side independent variable is their educational attainment, denoted
$x_{i,t}$. Then
$$M_D X = (I - P_D)\begin{bmatrix} x_{1,1} \\ x_{1,2} \\ x_{1,3} \\ x_{1,4} \\ x_{2,1} \\ x_{2,2} \\ x_{2,3} \\ x_{2,4} \end{bmatrix}
= \begin{bmatrix} x_{1,1} - \bar{x}_{1,.} \\ x_{1,2} - \bar{x}_{1,.} \\ x_{1,3} - \bar{x}_{1,.} \\ x_{1,4} - \bar{x}_{1,.} \\ x_{2,1} - \bar{x}_{2,.} \\ x_{2,2} - \bar{x}_{2,.} \\ x_{2,3} - \bar{x}_{2,.} \\ x_{2,4} - \bar{x}_{2,.} \end{bmatrix},
\quad \text{where } \bar{x}_{1,.} = \frac{\sum_{t=1}^{4} x_{1,t}}{4} \text{ and } \bar{x}_{2,.} = \frac{\sum_{t=1}^{4} x_{2,t}}{4}.$$
That is, the $M_D X$ operation takes deviations of the independent variables from the mean
FOR THAT INDIVIDUAL. If we regress $M_D Y$ on $M_D X$, we get rid of the individual fixed
effects, as you can see by summing both sides of equation 1 for the ith individual and
dividing through to get the means of all the variables as follows:
4) $\bar{y}_{i,.} = \bar{x}_{i,.}\beta + a_i + \bar{\epsilon}_{i,.}$
and then subtracting equation (4) from equation (1) to get
5) $y_{i,t} - \bar{y}_{i,.} = (x_{i,t} - \bar{x}_{i,.})\beta + (a_i - a_i) + (\epsilon_{i,t} - \bar{\epsilon}_{i,.})$, or $y_{i,t} - \bar{y}_{i,.} = (x_{i,t} - \bar{x}_{i,.})\beta + (\epsilon_{i,t} - \bar{\epsilon}_{i,.})$
Hence, the M D operator makes it feasible to compute the fixed effects model even when
there are so many dummy variables for individuals that it may be impracticable to
compute all those dummy variable coefficients individually. Go back to the example
starting out this section: our hypothetical data set of 560 persons (which we index by i),
each observed for eight years (which we index by t). Lots of computer programs would
find it difficult to numerically calculate 560 coefficients (one for each person in the
sample), but all computer programs can do the M D differencing operation and then

estimate the k parameters in the FE estimator: $\hat{\beta}_{FE} = (X'M_D X)^{-1} X'M_D Y$. The degrees of
freedom for this procedure are $nt - (n+k)$, because implicitly we are estimating n
parameters when we difference each variable from the mean for that individual. (Another
way to see this is as follows: when we differenced the 4 time periods in our simple
example above, we ended up with only three independent deviations, since the four
deviations sum to zero; if I know any three of them, I automatically know the fourth.
Good FE programs will automatically give you the right degrees of freedom, but you
have to check the degrees of freedom in the FE procedure that your brother-in-law
wrote.)
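To see the demeaning in action, here is a minimal Stata sketch in the lecture's semicolon-delimited style (the variables wage and educ and the id variable idnum are hypothetical placeholders); the demeaned regression reproduces the FE slope, while xtreg gets the degrees of freedom right automatically:
bysort idnum: egen wbar = mean(wage);
bysort idnum: egen ebar = mean(educ);
gen wdev = wage - wbar; *deviation from that person's mean;
gen edev = educ - ebar;
regress wdev edev, noconstant; *within slope, but its df are too generous by n;
xtreg wage educ, fe i(idnum); *same slope, with the correct degrees of freedom;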
Because the FE estimator depends only on deviations from the individual means,
it is sometimes called the within-groups estimator: it uses only the variation within each
group (in our case, the group is the individual). But because the group means of the
variables also differ from one individual to another, we could instead take averages over
time periods and just regress the means of the dependent variable (one mean per
individual) on the means of the independent variables (the means of each individual's
various independent variables). Then for our sample with 560 individuals, we would
have exactly 560 independent observations. This alternative estimator of the
model in equation one is the between-groups estimator. Since the orthogonal projection
onto the D matrix, denoted PD above, takes means of variables, the between-groups
estimator for the slope regressors is
6) $\hat{\beta}_B = (X'P_D X)^{-1} X'P_D Y$, where for our 2-person, 4-period simple example above we get
$$P_D X =
\begin{bmatrix}
.25 & .25 & .25 & .25 & 0 & 0 & 0 & 0 \\
.25 & .25 & .25 & .25 & 0 & 0 & 0 & 0 \\
.25 & .25 & .25 & .25 & 0 & 0 & 0 & 0 \\
.25 & .25 & .25 & .25 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & .25 & .25 & .25 & .25 \\
0 & 0 & 0 & 0 & .25 & .25 & .25 & .25 \\
0 & 0 & 0 & 0 & .25 & .25 & .25 & .25 \\
0 & 0 & 0 & 0 & .25 & .25 & .25 & .25
\end{bmatrix}
\begin{bmatrix} x_{1,1} \\ x_{1,2} \\ x_{1,3} \\ x_{1,4} \\ x_{2,1} \\ x_{2,2} \\ x_{2,3} \\ x_{2,4} \end{bmatrix}
= \begin{bmatrix} \bar{x}_{1,.} \\ \bar{x}_{1,.} \\ \bar{x}_{1,.} \\ \bar{x}_{1,.} \\ \bar{x}_{2,.} \\ \bar{x}_{2,.} \\ \bar{x}_{2,.} \\ \bar{x}_{2,.} \end{bmatrix}$$
Again, there are 8 observations (nt), but obviously only 2 (n) of them are independent.
Also, obviously, the between-groups estimator will not get rid of the individual fixed
effect, and so will be an inconsistent estimator for the model in equation 1 where the $a_i$
are correlated with the $x_{i,t}$. Since Y can always be fully decomposed into a part explained
by D and a part orthogonal to D, that is, $Y = P_D Y + M_D Y$ (really cool result 1 in lecture
3), the OLS estimator of the panel data model can always be written as a weighted sum of
the within-groups estimator and between-groups estimator as follows:
7) $\hat{\beta}_{OLS} = (X'X)^{-1}X'Y = (X'X)^{-1}X'(P_D Y + M_D Y)$
$= (X'X)^{-1}X'P_D Y + (X'X)^{-1}X'M_D Y$
$= (X'X)^{-1}(X'P_D X)(X'P_D X)^{-1}X'P_D Y + (X'X)^{-1}(X'M_D X)(X'M_D X)^{-1}X'M_D Y$
$= (X'X)^{-1}(X'P_D X)\hat{\beta}_B + (X'X)^{-1}(X'M_D X)\hat{\beta}_{FE}$
That is, the OLS estimator for the panel data set is a matrix weighted average of the
between groups estimator (first term on the right hand side) and within groups estimator
(the far right hand side term). With correlated unobserved heterogeneity, as in equation
1, the within groups estimator is consistent and unbiased, but the between groups
estimator is not. So with correlated unobserved heterogeneity, the OLS estimator will be
inconsistent. If there is no unobserved heterogeneity, or if it is uncorrelated with all the
right hand side slope regressors, then both the between groups estimator and the within
groups estimators are consistent, so the OLS estimator is consistent.
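Stata will compute all three pieces of this decomposition directly; a minimal sketch, again with hypothetical wage, educ, and idnum variables:
xtreg wage educ, be i(idnum); *between-groups: regression on the individual means;
xtreg wage educ, fe i(idnum); *within-groups (fixed effects);
regress wage educ; *pooled OLS: the matrix-weighted average of the two;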
Fixed effects are especially easy to estimate with STATA. Suppose that each individual
had a unique identifier, say “idnum”, and we wanted to do a fixed effects wage
regression. Here is the STATA code:
xtreg wage educ exper occ1 occ2 occ3 occ4, fe i(idnum);
The “xt” prefix is for longitudinal data, and there are lots of different programming
options available in STATA. To do random effects models in STATA (which I am not
going to discuss in this lecture), we use the following STATA code:
xtreg wage educ exper occ1 occ2 occ3 occ4, re i(idnum);
SAS code (out of many possibilities) for the fixed effects model is
proc glm; absorb idnum;
model wage= educ exper occ1 occ2 occ3 occ4; run;
(OR, another one for fixed effects:
proc mixed; class idnum;
model wage= educ exper occ1 occ2 occ3 occ4 idnum/ solution;
run;)
And one type of SAS code for random effects models is
proc mixed; class idnum;
model wage= educ exper occ1 occ2 occ3 occ4 / solution;
random int/ subject=idnum; *random int—means make the intercept a random variable;
run;
*so only change the id in this last line to match your data ;
II. General setup statements for panel data sets not already in good shape
sort panelvar datevar; * 1st var is the individual id & 2nd var is time;
tsset panelvar datevar;
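In Stata 10 and later, xtset does the same job and is the panel-specific form of tsset (a one-line sketch with the same placeholder variable names):
xtset panelvar datevar; *declares the panel id and the time variable;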
III. Tests for panel data sets
A. Random Effects vs. Fixed Effects
HAUSMAN TEST
xtreg y x1, fe;
estimates store fixed; *stores coefficients and covariance matrix of last regression;
xtreg y x1, re;
estimates store random;
hausman fixed random;
B. To see whether time fixed effects are needed when running a FE model, use the
command testparm. It is a joint test of whether the dummies for all years are equal to 0;
if they are, then no time fixed effects are needed.
testparm _Iyear*;
In Stata 11 and more recent versions you can use:
testparm i.year;
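Putting the pieces together, a sketch of the whole time-effects test (assuming the panel has been declared as above, and that y and x1 are placeholders; the _Iyear* dummies are the ones the xi prefix creates):
xi: xtreg y x1 i.year, fe;
testparm _Iyear*; *joint test that all the year dummies are zero;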
Summary of basic models (FE/RE): see http://www.princeton.edu/~otorres/Panel101.pdf
for more information.
Individual fixed effects:
xtreg:    xtreg y x1 x2 x3 x4 x5 x6 x7, fe ;
areg:     areg y x1 x2 x3 x4 x5 x6 x7, absorb(individ) ;
regress:  xi: regress y x1 x2 x3 x4 x5 x6 x7 i.individ;
Individual and time fixed effects:
xtreg:    xi: xtreg y x1 x2 x3 x4 x5 x6 x7 i.year, fe ;
areg:     xi: areg y x1 x2 x3 x4 x5 x6 x7 i.year, absorb(individ) ;
regress:  xi: regress y x1 x2 x3 x4 x5 x6 x7 i.year i.individ;
Random effects:
xtreg:    xtreg y x1 x2 x3 x4 x5 x6 x7, re ;
IV. Dynamic Completeness: an example using panel data
Essentially, a model is “dynamically complete” if it has enough lagged values (of the
independent or dependent variables, or both) so that the error exhibits no serial
correlation. Dynamically complete models, then, have a regression that satisfies
$E(y_t \mid x_t) = E(y_t \mid x_t,\ \text{other lagged dep. and indep. variables})$
where $x_t$ possibly already contains some lagged variables in it, and “other lagged
dependent and independent variables” means just that: other lagged variables not already
contained in $x_t$. The idea is that $x_t$ already contains all the necessary lagged variables,
and the “other lagged dependent and independent variables” aren’t needed for the
model. Since the model is
$E(y_t \mid x_t) = \beta_0 + \beta_1 x_t$
then it follows that
E(t | xt )  E(t | xt , other lagged dep and indep variables ) = 0.
In particular, since “other lagged dependent and independent variables” could be used to
form prior values of the error term, then
E(t | xt )  E(t | xt , t 1, xt 1, t  2 ,) = 0.
We can use this last result on dynamically complete models, and the law of iterated
expectations, to prove that there is no serial correlation in dynamically complete models.
The law of iterated expectations is just the result (appendix B in Wooldridge) that
$E_X[\,E_Y(Y \mid X)\,] = E(Y)$
The $E_Y(Y \mid X)$ term is just the expected value of Y given X, and so is a function of X.
The law just says that if we take the expected value of this conditional mean with respect
to the distribution of X, we get the (unconditional) mean of Y. The mathematical proof
for this law looks something like
$$\sum_i \Big[\sum_j y_j f(y_j \mid x_i)\Big] f(x_i) \;=\; \sum_j y_j \sum_i f(y_j \mid x_i) f(x_i) \;=\; \sum_j y_j \sum_i f(y_j, x_i) \;=\; \sum_j y_j f(y_j)$$
Wooldridge (chapter 11) applies the law of iterated expectations to $E(\epsilon_t \epsilon_s \mid x_t, x_s)$, where $t
\neq s$, and shows that $E(\epsilon_t \epsilon_s \mid x_t, x_s) = 0$ if $E(\epsilon_t \mid x_t) = E(\epsilon_t \mid x_t, \epsilon_{t-1}, x_{t-1}, \epsilon_{t-2}, \ldots) = 0$. That is, if
the model is dynamically complete, there will be no serial correlation.
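Written out, the argument is one line of iterated expectations: for $s < t$, $\epsilon_s$ lies in the conditioning set, so
$$E(\epsilon_t \epsilon_s) = E\big[E(\epsilon_t \epsilon_s \mid x_t, \epsilon_{t-1}, x_{t-1}, \ldots)\big] = E\big[\epsilon_s\, E(\epsilon_t \mid x_t, \epsilon_{t-1}, x_{t-1}, \ldots)\big] = E[\epsilon_s \cdot 0] = 0.$$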
When would this be useful for your research? Suppose that you have a panel data set as
discussed in the last section of lecture 13, but with no fixed effects (no unobserved
heterogeneity). Rather, assume that the many years of observations on the same
individual are correlated over time. The OLS estimated coefficients will be consistent,
even if there is correlation, but the standard errors computed will be inappropriate unless
the model is dynamically complete (i.e., unless there are enough lagged values of the
independent and dependent variables that there is no remaining autocorrelation in the
errors). Suppose, for example, it is a panel of wages regressed on
age, experience, race dummies, educational attainment, and occupational and year
dummy variables. You think that there is autocorrelation in wages for a given individual,
but the error will not be autocorrelated after you include lagged values of wages as an
additional regressor in the model (that is, this single lagged value of wages, $y_{i,t-1}$, is
enough to make the model dynamically complete). So the basic model is
$y_{i,t} = \gamma y_{i,t-1} + x_{i,t}\beta + u_{i,t}$
with $u_{i,t} = \rho u_{i,t-1} + e_{i,t}$. No (first-order) autocorrelation would mean $\rho = 0$. To test whether
this is true (and thus, whether the model is dynamically complete given the lagged value
of y on the right hand side as a regressor), substitute $u_{i,t} = \rho u_{i,t-1} + e_{i,t}$ into the regression
specification and run the augmented regression
$y_{i,t} = \gamma y_{i,t-1} + x_{i,t}\beta + \rho\,\hat{u}_{i,t-1} + e_{i,t}$
where the residuals replace the unobserved error values and, again, the test for
dynamic completeness is a test of whether $\hat{\rho} = 0$. If you can’t reject this hypothesis, then
you can’t reject the hypothesis that the specification is dynamically complete, and you
can (sort-of, in the absence of other potential problems) trust that the OLS estimates of
standard errors and t-statistics are correct. The STATA code for this problem would be
something like the following:
**data sorted first by individual ID and then by year (time);
bysort id: gen lagwage = wage[_n-1]; *need to avoid lagging between people;
regress wage lagwage age male educ;
predict resids, residuals;
bysort id: gen lag_resids = resids[_n-1]; *need to avoid lagging between people;
regress wage lagwage age male educ lag_resids;
This is an example based on the wagepan.raw data (wage panel data, mentioned in
chapter 14 of Wooldridge):
# delimit ;
* reading in a panel data set from wooldridge, wagepan.raw, 8 years of data on ;
* each individual, data arranged by individual(nr) then by year(year);
infile nr year black exper hisp hours married occ1 occ2 occ3 occ4 occ5
      occ6 occ7 occ8 occ9 educ union lwage d81 d82 d83 d84 d85 d86 d87
      expersq using "g:\classrm_data\wooldridge\wagepan.raw", clear;
********* need to avoid lagging between people **************************;
** so the bysort only generates it for those with the same id, namely *******;
** the same nr values. So laglwage will be missing when change individuals: ;
** Jill's last observ shouldn't be lagged into Jim's first observation ******;
bysort nr: gen laglwage = lwage[_n-1];
*list nr year lwage laglwage;
regress lwage laglwage black hisp educ exper expersq married union occ1
      occ2 occ4 occ5 occ6 occ7 occ8 occ9 d82 d83 d84 d85 d86;
predict resids, residuals;
bysort nr: gen lag_resids = resids[_n-1]; *need to avoid lagging between people;
list nr year resids lag_resids;
regress lwage laglwage black hisp educ exper expersq married union occ1
      occ2 occ4 occ5 occ6 occ7 occ8 occ9 d82 d83 d84 d85 d86
      lag_resids;
with the last regression given by
. regress lwage laglwage black hisp educ exper expersq married union occ1
>       occ2 occ4 occ5 occ6 occ7 occ8 occ9 d82 d83 d84 d85 d86 lag_resids;
      Source |       SS       df       MS              Number of obs =    3270
-------------+------------------------------           F( 22,  3247) =  159.12
       Model |   443.3267    22  20.1512136            Prob > F      =  0.0000
    Residual | 411.201889  3247  .126640557            R-squared     =  0.5188
-------------+------------------------------           Adj R-squared =  0.5155
       Total | 854.528589  3269  .261403667            Root MSE      =  .35587

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    laglwage |    .883139   .0226862    38.93   0.000     .8386583    .9276197
       black |  -.0289215   .0205947    -1.40   0.160    -.0693013    .0114584
        hisp |   .0054781    .017922     0.31   0.760    -.0296614    .0406176
        educ |    .002805   .0051651     0.54   0.587    -.0073221    .0129321
       exper |  -.0220726   .0156947    -1.41   0.160    -.0528451       .0087
     expersq |    .000722   .0008607     0.84   0.402    -.0009655    .0024095
     married |   .0176767   .0132501     1.33   0.182    -.0083026    .0436561
       union |   .0454948   .0155623     2.92   0.003     .0149819    .0760078
        occ1 |   .0239954   .0320322     0.75   0.454    -.0388099    .0868006
        occ2 |   .0052215   .0323473     0.16   0.872    -.0582016    .0686446
        occ4 |  -.0264225   .0322614    -0.82   0.413    -.0896773    .0368322
        occ5 |  -.0144614   .0297461    -0.49   0.627    -.0727844    .0438616
        occ6 |  -.0371003   .0304375    -1.22   0.223    -.0967788    .0225783
        occ7 |   -.061769   .0347061    -1.78   0.075    -.1298171    .0062791
        occ8 |  -.0914091   .0641928    -1.42   0.155    -.2172717    .0344534
        occ9 |  -.0410516   .0330692    -1.24   0.215    -.1058902    .0237871
         d82 |  -.0936457   .0333213    -2.81   0.005    -.1589786   -.0283128
         d83 |  -.0788313   .0289228    -2.73   0.006      -.13554   -.0221226
         d84 |  -.0413905   .0257387    -1.61   0.108    -.0918564    .0090753
         d85 |  -.0420141   .0235835    -1.78   0.075    -.0882541     .004226
         d86 |  -.0170329     .02216    -0.77   0.442    -.0604818     .026416
  lag_resids |   -.443724   .0273152   -16.24   0.000    -.4972807   -.3901672
       _cons |   .3878219   .1081133     3.59   0.000     .1758446    .5997991
------------------------------------------------------------------------------
The statistically significant coefficient on the lagged residual variable suggests that our
model is NOT dynamically complete, so the OLS standard errors and t-statistics are
off, and we need to explicitly adjust for the autocorrelation (or add some more lagged
variables on the right hand side until the model becomes dynamically complete).
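Two hedged remedies consistent with that conclusion (sketches, not the prescription from Wooldridge): compute standard errors robust to arbitrary within-person error correlation, or add a second lag of lwage and re-run the completeness test. (Caveat: with a lagged dependent variable and autocorrelated errors, the laglwage coefficient is itself suspect, so adding lags is usually the safer route.)
* option 1: cluster-robust standard errors by person;
regress lwage laglwage black hisp educ exper expersq married union occ1
      occ2 occ4 occ5 occ6 occ7 occ8 occ9 d82 d83 d84 d85 d86,
      vce(cluster nr);
* option 2: add a second lag and repeat the lag_resids test;
bysort nr: gen lag2lwage = lwage[_n-2];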