Panel Data - University of Vaasa

Econometrics II
Seppo Pynnönen
Department of Mathematics and Statistics, University of Vaasa, Finland
January 12 – February 25, 2016
Part II
Panel Data
As of Feb 2, 2016
Panel Data
Pooling independent cross section across time
Fixed effects model
Two-period panel data analysis
More than two time periods
Fixed effects method
Dummy variable regression
Fixed effects or first differencing?
Balanced and unbalanced panels
Random effects models
Random effects or fixed effects
Hausman specification test
Policy analysis with panel data
Dynamic Panel Models
Data sets that combine time series and cross sections data are
common in economics.
Independently pooled cross section:
Data are obtained by sampling randomly a large population at
different points in time (e.g., yearly).
This allows investigating the effect of time, e.g., whether
relationships have changed.
It typically raises only minor statistical complications.
Important feature:
The data set consists of independently sampled observations.
A panel data set (longitudinal data):
is a sample in which the same individuals, families, firms, cities, etc.
are followed across time.
E.g., OECD statistics contain numerous series observed yearly from
several countries.
Similarly, time series data on several firms, industries, etc., are
data of this type.
Pooling independent cross section across time
Example 1
Women's fertility over time: Data from the General Social Survey contain
samples collected in even years from 1972 to 1984.
The model explains the total number of children born to a woman.
The data are available on the course website (password protected).
* read data
. insheet using "fertil1.csv", comma clear
* describe data
. des

Contains data
  obs:         1,129
 vars:            14
 size:        24,838 (99.9% of memory free)
-----------------------------------------------------------
              storage  display    value
variable name   type   format     label     variable label
-----------------------------------------------------------
year            byte   %8.0g
educ            byte   %8.0g
meduc           byte   %8.0g
feduc           byte   %8.0g
age             byte   %8.0g
kids            byte   %8.0g
black           byte   %8.0g
east            byte   %8.0g
northcen        byte   %8.0g
west            byte   %8.0g
farm            byte   %8.0g
othrural        byte   %8.0g
town            byte   %8.0g
smcity          byte   %8.0g
. tabstat kids, statistics( mean count ) by(year) columns(statistics)

Summary for variables: kids
     by categories of: year

    year |      mean         N
---------+--------------------
      72 |       3.0       156
      74 |       3.2       173
      76 |       2.8       152
      78 |       2.8       143
      80 |       2.8       142
      82 |       2.4       186
      84 |       2.2       177
---------+--------------------
   Total |       2.7      1129
------------------------------
[Figure: Number of children per woman. Vertical axis: N of children; horizontal axis: Year.]
It is obvious that the fertility rate has declined over the years.
The analysis can be substantially elaborated by regression analysis.
After controlling for other factors (education, age, etc.), what has happened
to the fertility rate?
Build a regression with year dummies: y74 for 1974, · · · , y84 for year
1984.
Year 1972 is the base year.
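The construction of year dummies with an omitted base year can be sketched as follows. This is a minimal illustration in plain Python, not the course's Stata workflow; the helper name `year_dummies` and the short sample list are hypothetical.

```python
def year_dummies(years, base_year):
    """Return {dummy name: 0/1 column} for every year except the base year."""
    levels = sorted(set(years))
    if base_year not in levels:
        raise ValueError("base_year must occur in the data")
    return {
        f"y{level}": [1 if y == level else 0 for y in years]
        for level in levels
        if level != base_year  # the base year is absorbed by the intercept
    }


# Hypothetical sample of survey years (two-digit coding as in the data set).
sample_years = [72, 74, 74, 84, 72, 82]
dummies = year_dummies(sample_years, base_year=72)
```

Omitting the base year avoids perfect collinearity with the constant, so each dummy coefficient measures the difference from 1972.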
. reg kids educ age age2 black east northcen west farm ///
      y74 y76 y78 y80 y82 y84

      Source |       SS       df       MS        Number of obs =    1129
-------------+------------------------------     F( 14,  1114) =   11.51
       Model |  389.777313    14  27.8412367     Prob > F      =  0.0000
    Residual |  2695.73199  1114  2.41986713     R-squared     =  0.1263
-------------+------------------------------     Adj R-squared =  0.1153
       Total |   3085.5093  1128  2.73538059     Root MSE      =  1.5556

-----------------------------------------------------
        kids |      Coef.   Std. Err.      t    P>|t|
-------------+---------------------------------------
        educ |  -.1242409   .0181486    -6.85   0.000
         age |   .5381453   .1384005     3.89   0.000
        age2 |  -.0058679   .0015645    -3.75   0.000
       black |   1.083783   .1734035     6.25   0.000
        east |   .2276015   .1312518     1.73   0.083
    northcen |   .3713906   .1199679     3.10   0.002
        west |   .2188689   .1663522     1.32   0.189
        farm |  -.0918808    .122027    -0.75   0.452
         y74 |   .2586277   .1727165     1.50   0.135
         y76 |  -.1012358   .1787317    -0.57   0.571
         y78 |  -.0671507   .1814491    -0.37   0.711
         y80 |  -.0751199   .1827069    -0.41   0.681
         y82 |  -.5323518   .1723385    -3.09   0.002
         y84 |  -.5383952    .174472    -3.09   0.002
       _cons |  -7.894707    3.05159    -2.59   0.010
-----------------------------------------------------
There was a sharp drop in fertility in the early 1980s (the other year
dummies are not statistically significant).
E.g., the coefficient on y82 indicates that, holding other factors fixed
(educ, age, and others), there were about 53 fewer children per 100 women
in 1982 than in 1972.
In particular, since education is controlled for, this decline is separate from
the decline due to the increase in education.
Women with more education have fewer children (the coefficient −0.12 is
highly statistically significant with t = −6.85 and p-value < 0.0005).
Other things equal, women with a college education (four more years of
schooling) are predicted to have 4 × 0.124 = 0.496 fewer children each, i.e.,
about 50 fewer children per 100 women, than women with only a high
school education.
In summary, problems involving pooled cross section data (independent
samples) can be analyzed using dummy variables.
Fixed effects model
Two-period panel data analysis
From each individual (people, firms, schools, cities, countries, etc.)
data are collected at two time points, t = 1 and t = 2.
In usual regression one major source of bias stems from omitted
(important) variables.
For example, if the true model is
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + u_i,$$ (1)
but we estimate
$$y_i = \beta_0 + \beta_1 x_i + v_i,$$ (2)
where
$$v_i = \beta_2 z_i + u_i,$$ (3)
the bias in the OLS estimator $\hat\beta_1$ from model (2) is
$$E[\hat\beta_1] - \beta_1 = \beta_2 \frac{\sum_{i=1}^{n} (x_i - \bar x) z_i}{\sum_{i=1}^{n} (x_i - \bar x)^2},$$ (4)
which can be substantial if x and z are correlated and β2 is large.
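The bias formula (4) can be checked numerically. The sketch below uses made-up numbers and sets the error term to zero, so the identity holds exactly in the sample; the function name `ols_slope` and all data values are illustrative assumptions.

```python
def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * yi for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical true model (1) with u_i = 0 for all i.
beta0, beta1, beta2 = 1.0, 2.0, 3.0
x = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [2.0, 1.0, 4.0, 3.0, 6.0]  # omitted variable, correlated with x
y = [beta0 + beta1 * xi + beta2 * zi for xi, zi in zip(x, z)]

# Slope from the short regression (2), which leaves z out.
short_slope = ols_slope(x, y)

# Right-hand side of (4): beta2 times the slope of z on x.
bias_formula = beta2 * ols_slope(x, z)
```

Here the short regression slope exceeds the true β1 = 2 by exactly the amount given by (4).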
The problem is that we usually do not know if important variables
are missing from our model!
Use of panel data makes it possible to eliminate the omitted
variable bias in certain cases.
Suppose that we have the following situation in terms of model (1)
yit = β0 + β1 xit + β2 zi + uit ,
(5)
where i refers to individual i and t to time point t.
Thus, we have panel data where data is collected from each
individual i at different time points t (in the two period case,
t = 1, 2).
Note that in (5) zi does not have the time index, which implies
that variable z is time invariant (or at least changing very slowly
with time).
Suppose, we have from each of the n individuals observations on
yit and xit at time points t = 1 and t = 2, thus altogether 2n
observations.
However, we do not observe zi .
Suppose further that we allow the possibility that intercept β0 may
be different at different time points, such that (5) can be written as
$$y_{it} = \beta_0 + \delta_0 D_t + \beta_1 x_{it} + \beta_2 z_i + u_{it},$$ (6)
where Dt = 0 for t = 1 and Dt = 1 for t = 2 (time dummy).
Then taking differences
∆yi = yi2 − yi1 ,
the model in (6) becomes
∆yi = δ0 + β1 ∆xi + ∆ui ,
(7)
i.e., the (unobserved) omitted variable disappears, and the OLS estimator
of the slope parameter β1 is unbiased provided that ∆ui is uncorrelated
with ∆xi.
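The differencing step can be written out explicitly. The following is only a worked restatement of (6) for the two periods, using the same symbols:

```latex
\begin{aligned}
y_{i2} &= \beta_0 + \delta_0 + \beta_1 x_{i2} + \beta_2 z_i + u_{i2} \qquad (t = 2,\ D_t = 1),\\
y_{i1} &= \beta_0 + \beta_1 x_{i1} + \beta_2 z_i + u_{i1} \qquad (t = 1,\ D_t = 0),\\
\Delta y_i = y_{i2} - y_{i1} &= \delta_0 + \beta_1 (x_{i2} - x_{i1}) + (u_{i2} - u_{i1})
  = \delta_0 + \beta_1 \Delta x_i + \Delta u_i .
\end{aligned}
```

Both β0 and β2 zi cancel because they are identical in the two periods, and the coefficient of the time dummy becomes the intercept of the differenced equation.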
The above generalizes immediately such that if we denote
$$a_i = z_i'\gamma = \gamma_1 z_{i1} + \gamma_2 z_{i2} + \cdots + \gamma_q z_{iq}$$ (8)
and enhance (6) to
$$y_{it} = \beta_0 + \delta_0 D_t + \beta_1 x_{it} + a_i + u_{it},$$ (9)
taking differences reduces again to estimation model (7).
The above model is called the fixed effect (FE) model in which ai is fixed
over the time periods (ai can be a random variable, and can correlate
with the explanatory variable xit ).
If ai is not correlated with other explanatory variables, the model is called
random effect (RE) model and is estimated with different techniques that
are supposed to yield more efficient estimators to β-parameters than the
fixed effect methods (that are basically OLS methods). We will return to
the RE model later.
In the FE case the resulting estimators of the regression
parameters from the first-differenced equation with OLS are
called the first-differenced estimators (FD estimators).
We will deal with other fixed effect estimators later.
In summary:
Differencing eliminates all unobserved time invariant factors
from the model.
A major pitfall is that differencing also wipes out observed
time invariant variables (like gender) from the model!
FE cannot be used in these cases (if we want to estimate these
effects), or in cases where the explanatory variables change
very slowly across time (the difference is nearly zero).
In many cases the FD-method is useful, however.
The following example highlights the biasing effect of unobserved
factors and how panel estimation with the simple FD-method likely
solves the problem.
Example 2
Data set crime2.xls (Wooldridge) contains data on crime and
unemployment rates for 46 US cities for 1982 (t = 1) and 1987 (t = 2).
Running simple cross section regression of crmrte on unem by using only
1987 yields
. regress crmrte unem if year==87

      Source |       SS       df       MS        Number of obs =      46
-------------+------------------------------     F(  1,    44) =    1.48
       Model |  1775.90928     1  1775.90928     Prob > F      =  0.2297
    Residual |  52674.6428    44  1197.15097     R-squared     =  0.0326
-------------+------------------------------     Adj R-squared =  0.0106
       Total |  54450.5521    45  1210.01227     Root MSE      =    34.6

-----------------------------------------------------
      crmrte |      Coef.   Std. Err.      t    P>|t|
-------------+---------------------------------------
        unem |  -4.161134   3.416456    -1.22   0.230
       _cons |   128.3781   20.75663     6.18   0.000
-----------------------------------------------------
The coefficient of unem is negative, −4.16!
However, it is not statistically significant.
The regression likely suffers from an omitted variables problem (age
distribution, gender distribution, education levels, . . .).
Most of these can be expected to be fairly stable across time. Thus,
use of panel data techniques may be helpful.
Before proceeding to the panel data estimation, let us see what happens
if we simply pool the two years and estimate
$$\text{crmrte} = \beta_0 + \delta_0 D_{87} + \beta_1\,\text{unem} + u,$$ (10)
where D87 is the year 1987 dummy.
. regress crmrte d87 unem

      Source |       SS       df       MS        Number of obs =      92
-------------+------------------------------     F(  2,    89) =    0.55
       Model |  989.717314     2  494.858657     Prob > F      =  0.5788
    Residual |  80055.7864    89  899.503218     R-squared     =  0.0122
-------------+------------------------------     Adj R-squared = -0.0100
       Total |  81045.5037    91  890.609931     Root MSE      =  29.992

-----------------------------------------------------
      crmrte |      Coef.   Std. Err.      t    P>|t|
-------------+---------------------------------------
         d87 |   7.940413   7.975324     1.00   0.322
        unem |   .4265461   1.188279     0.36   0.720
       _cons |   93.42026   12.73947     7.33   0.000
-----------------------------------------------------
The situation does not change much qualitatively!
For example, Stata has very sophisticated panel data procedures.
We discuss some of them later.
The FD-method can be applied by using the regress routine by
first declaring the data as a panel data with the xtset command
(Menu: Statistics > Longitudinal/panel data > Setup
and utilities > Declare data set to be panel data).
In Eviews: Proc > Structure/Resize Current Page. . ., and
follow the instructions.
In SAS: proc panel data = crime2; id state year; model crmrte =
unemp; run; Before applying proc panel
the data must be sorted by the cross-section and time identifiers with proc sort.
Whichever software is used, identifiers for the individuals (in
particular) are needed to indicate the multiple measurements on an
individual.
After declaring the panel structure to the program, the model
$$\Delta\text{crmrte} = \delta_0 + \beta_1 \Delta\text{unem} + \Delta u$$ (11)
can be estimated with the FD method, e.g., in Stata as
(d.crmrte means crmrte87 − crmrte82):

. reg d.crmrte d.unem

      Source |       SS       df       MS        Number of obs =      46
-------------+------------------------------     F(  1,    44) =    6.38
       Model |  2566.43056     1  2566.43056     Prob > F      =  0.0152
    Residual |  17689.5426    44  402.035059     R-squared     =  0.1267
-------------+------------------------------     Adj R-squared =  0.1069
       Total |  20255.9732    45  450.132737     Root MSE      =  20.051

-----------------------------------------------------
    D.crmrte |      Coef.   Std. Err.      t    P>|t|
-------------+---------------------------------------
        unem |
         D1. |   2.217996   .8778657     2.53   0.015
             |
       _cons |   15.40219   4.702116     3.28   0.002
-----------------------------------------------------
In Eviews, after the data has been reshaped to panel data, the
FD-estimation can be worked out using Quick > Estimate
Equation. . . to open the Equation Estimation command
window and input d(crmrte) c d(unem) to get results similar
to those above.
The coefficient estimate β̂1 ≈ 2.22 is now statistically significant at the
5% level (p = 0.015) and of the expected sign.
The model predicts that a one percentage point increase in the unemployment
rate increases crimes by about 2.2 per 1,000 people.
The constant term indicates that even if the change in the unemployment
rate were zero, the crime rate has generally increased during the period
from 1982 to 1987 by about 15.4 crimes per 1,000 people.
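The contrast between a single cross section and first differencing can be reproduced on a tiny synthetic panel. The sketch below is not based on crime2.xls; all numbers and the helper `ols_line` are hypothetical, the errors are set to zero, and the unobserved effect a_i is deliberately constructed to offset x in period 1.

```python
def ols_line(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical parameters of model (9): y_it = b0 + d0*D_t + b1*x_it + a_i.
beta0, delta0, beta1 = 1.0, 0.5, 2.0
a = [6.0, 4.0, 2.0, 0.0]    # unobserved effects, negatively related to x
x1 = [1.0, 2.0, 3.0, 4.0]   # period 1 regressor
x2 = [2.0, 2.5, 5.0, 4.5]   # period 2 regressor

y1 = [beta0 + beta1 * x + ai for x, ai in zip(x1, a)]
y2 = [beta0 + delta0 + beta1 * x + ai for x, ai in zip(x2, a)]

# A single cross section ignores a_i and gives a misleading slope.
_, cross_section_slope = ols_line(x1, y1)

# First differencing removes a_i; the intercept estimates delta0.
dx = [b - c for b, c in zip(x2, x1)]
dy = [b - c for b, c in zip(y2, y1)]
fd_intercept, fd_slope = ols_line(dx, dy)
```

In this constructed case the cross-section slope is zero even though the true effect is 2, while the FD regression recovers both β1 and δ0.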
Note that the time dummy component δ0 in (11) captures all unobserved
time effects that are common to all cross-sectional individuals.
That is, we can consider δ0 to represent
$$\delta_0 = z_t'\delta = \delta_1 z_{1t} + \delta_2 z_{2t} + \cdots + \delta_p z_{pt},$$
where the zt's are common trend components affecting all individual crime
rates with the same intensity.
More than two time periods
Differencing can be used with more than two time periods to work
out fixed effect estimation.
As an example consider a three period model.
$$y_{it} = \delta_1 + \delta_2 D_{2t} + \delta_3 D_{3t} + \beta_1 x_{it1} + \cdots + \beta_k x_{itk} + a_i + u_{it}$$ (12)
for t = 1, 2, 3, where D2t = 1 for period t = 2 and zero otherwise,
D3t = 1 for t = 3 and zero otherwise, and ai is the unobserved
individual effect.
Differencing eliminates ai and yields
$$\Delta y_{it} = \delta_2 \Delta D_{2t} + \delta_3 \Delta D_{3t} + \beta_1 \Delta x_{it1} + \cdots + \beta_k \Delta x_{itk} + \Delta u_{it},$$ (13)
t = 2, 3.
Note: For t = 2, ∆D2t = 1 and ∆D3t = 0 = D3t ; for t = 3,
∆D2t = −1 and ∆D3t = 1 = D3t .
Again the model is simple to estimate with OLS.
Remark 1
The model in (13) is usually reparametrized into an equivalent form
$$\Delta y_{it} = \alpha_0 + \alpha_3 D_{3t} + \beta_1 \Delta x_{it1} + \cdots + \beta_k \Delta x_{itk} + \Delta u_{it},$$ (14)
where α0 = δ2 and α3 = δ3 − 2δ2.
This generalizes to T time periods with time dummies D1t , D2t , . . . , DTt
$$\Delta y_{it} = \alpha_0 + \alpha_3 D_{3t} + \cdots + \alpha_T D_{Tt} + \beta_1 \Delta x_{it1} + \cdots + \beta_k \Delta x_{itk} + \Delta u_{it}.$$ (15)
Fixed effects method
An alternative method, which works in certain cases better than
the FD-method, is called the fixed effects method.
Consider the simple case model of
yit = β1 xit + ai + uit ,
(16)
i = 1, . . . , n, t = 1, . . . , T .
Thus there are altogether n × T observations.
Define means over the T time periods
$$\bar y_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}, \qquad \bar x_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}, \qquad \bar u_i = \frac{1}{T}\sum_{t=1}^{T} u_{it}.$$ (17)
Then
$$\bar y_i = \beta_1 \bar x_i + a_i + \bar u_i.$$ (18)
Note that
$$\frac{1}{T}\sum_{t=1}^{T} a_i = \frac{1}{T}\, T a_i = a_i.$$
Thus, subtracting (18) from (16) eliminates ai and gives
$$y_{it} - \bar y_i = \beta_1 (x_{it} - \bar x_i) + (u_{it} - \bar u_i)$$ (19)
or
$$\dot y_{it} = \beta_1 \dot x_{it} + \dot u_{it},$$ (20)
where e.g., $\dot y_{it} = y_{it} - \bar y_i$ is the time demeaned data on y.
This transformation is also called the within transformation and
resulting (OLS) estimators of the regression parameters applied to
(20) are called fixed effect estimators or within estimators.
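The within transformation in (19) and (20) can be sketched in a few lines. The panel below is synthetic with zero errors, and the function name `within_slope`, the unit labels, and all numbers are illustrative assumptions rather than anything from the course data.

```python
def within_slope(panel):
    """Fixed effects (within) estimate of beta1 in y_it = beta1*x_it + a_i + u_it.

    `panel` maps each unit to its list of (x, y) observations over time.
    """
    num = den = 0.0
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            # Time-demeaned data: deviations from the unit's own mean.
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    return num / den


# Hypothetical panel: three units, T = 3, true beta1 = 1.5, no error term.
beta1 = 1.5
effects = {"A": 5.0, "B": -2.0, "C": 1.0}
xs = {"A": [1.0, 2.0, 4.0], "B": [2.0, 2.0, 5.0], "C": [0.0, 3.0, 3.0]}
panel = {u: [(x, beta1 * x + effects[u]) for x in xs[u]] for u in xs}

fe_slope = within_slope(panel)
```

Because each unit's mean absorbs its own a_i, the demeaned regression through the origin returns the true slope regardless of how a_i relates to x.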
In the two period case the FD method and FE lead to identical
results.
Remark 2
The slope coefficient β1 estimated from (18) is called the
between estimator. vi = ai + ūi is the error term. The estimator is
biased, however, if the unobserved component ai is correlated with x.
Remark 3
When estimating the unobserved effect by the fixed effect (FE) method,
it is unfortunately not clear how the goodness-of-fit R-square should be
computed. Stata produces three different R-squares: within, between,
and total.
Remark 4
Usually a full set of year dummies (i.e., year dummies for all years but the
first) are included in FE estimation to capture time variation. However,
then the effect of any variable whose change across time is constant
cannot be estimated (an example of such a variable is experience
measured by the number of years; experience increases every year by one).
Remark 5
Although time invariant variables cannot be included by themselves in an
FE model, their interactions with year dummies can. For example, in a
wage equation (year dummy) x (education) measures the change in the return
to education over time.
Dummy variable regression
Yet another method is to introduce dummy variables for the cross
section unit (N − 1 dummy variables) and (possibly) for the
periods (T − 1 dummies).
If N and T are large this is not very practical.
The dummy variable regression gives the same estimates of the regression
coefficients as the time demeaned (within) method, and the standard errors
and other major statistics are the same.
Example 3
Papke (1994), Journal of Public Economics 54, 37–49, studied the effect
of the Indiana enterprise zone program on unemployment, years 1980–1988
(Wooldridge's data base, file: ezunem.xls). Six zones were designated in 1984
and four more in 1985. Twelve cities did not receive a zone (control
group).
An evaluation model of the policy is
$$\log(\text{uclms}_{it}) = \theta_t + \beta_1 D_{it} + a_i + u_{it}$$ (21)
where θt indicates a time varying intercept, uclmsit is the number of
unemployment claims during year t in city i, and Dit = 1 if city i had
the zone in year t and zero otherwise.
First difference estimates for β1 :
. reg d.luclms d82 d83 d84 d85 d86 d87 d88 d.ez

      Source |       SS       df       MS        Number of obs =     176
-------------+------------------------------     F(  8,   167) =   34.50
       Model |  12.8826331     8  1.61032914     Prob > F      =  0.0000
    Residual |  7.79583815   167  .046681666     R-squared     =  0.6230
-------------+------------------------------     Adj R-squared =  0.6049
       Total |  20.6784713   175  .118162693     Root MSE      =  .21606

-----------------------------------------------------
    D.luclms |      Coef.   Std. Err.      t    P>|t|
-------------+---------------------------------------
(Year dummy variable estimates results deleted)
          ez |
         D1. |  -.1818775   .0781862    -2.33   0.021
       _cons |  -.3216319    .046064    -6.98   0.000
-----------------------------------------------------
The estimate β̂1 = −.182 indicates that the presence of an EZ
causes about a 16.6% (e^{−.182} − 1 = −.166) fall in unemployment claims,
which is both economically and statistically significant (t-value −2.33).
Fixed Effect estimation results
. xtreg luclms d82 d83 d84 d85 d86 d87 d88 ez, fe

R-sq:  within  = 0.8148
       between = 0.0002
       overall = 0.3415

corr(u_i, Xb)  = -0.0040            F(8,168)  =   92.36
                                    Prob > F  =  0.0000

-----------------------------------------------------
      luclms |      Coef.   Std. Err.      t    P>|t|
-------------+---------------------------------------
          ez |  -.1044148    .059753    -1.75   0.082
       _cons |   11.53358   .0325925   353.87   0.000
-------------+---------------------------------------
     sigma_u |  .55551522
     sigma_e |  .21619434
         rho |  .86846297   (fraction of variance due to u_i)
-----------------------------------------------------
F test that all u_i=0: F(21, 168) = 59.31   Prob > F = 0.0000
Dummy variable regression:
. reg luclms d82 d83 d84 d85 d86 d87 d88 ///
      c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 ///
      c14 c15 c16 c17 c18 c19 c20 c21 c22 ez

      Source |       SS       df       MS        Number of obs =     198
-------------+------------------------------     F( 29,   168) =   68.35
       Model |  92.6439601    29  3.19461932     Prob > F      =  0.0000
    Residual |  7.85231887   168  .046739993     R-squared     =  0.9219
-------------+------------------------------     Adj R-squared =  0.9084
       Total |  100.496279   197  .510133396     Root MSE      =  .21619

-----------------------------------------------------
      luclms |      Coef.   Std. Err.      t    P>|t|
-------------+---------------------------------------
(dummy variable results removed)
          ez |  -.1044148    .059753    -1.75   0.082
       _cons |   11.51534   .0799536   144.03   0.000
-----------------------------------------------------
The results show that the FE and DVRM estimates of the ez coefficient and
its standard error are exactly the same.
Using the FE results, the coefficient −0.104 implies roughly a 10 percent
drop in the unemployment claims due to the program (the exact figure is
e^{−.104} − 1 ≈ −.099, i.e., about 9.9 percent). The estimate is
significant at the 5% level in one-tailed testing but not in two-tailed testing.
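Since the dependent variable is in logs, a dummy coefficient b translates into an exact percentage effect of 100(e^b − 1). The short sketch below applies this to the two ez estimates reported in the outputs above; the function name `pct_effect` is a hypothetical helper.

```python
import math


def pct_effect(log_coef):
    """Exact percentage change in y implied by a dummy coefficient in a log(y) model."""
    return 100.0 * (math.exp(log_coef) - 1.0)


# ez coefficients copied from the FD and FE outputs in these notes.
fd_effect = pct_effect(-0.1818775)
fe_effect = pct_effect(-0.1044148)
```

For small coefficients the approximation 100·b is close, but the gap widens as |b| grows, which is why the FD estimate is usually reported as about a 16.6 percent fall rather than 18.2 percent.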
Fixed effects or first differencing?
If the number of periods is 2 (T = 2) FE and FD give
identical results.
When T ≥ 3 the FE and FD are not the same.
Both are unbiased under assumptions FE.1–FE.4
FE.1 For each i, the model is
yit = β1 xit1 + · · · + βk xitk + ai + uit , t = 1, . . . T .
FE.2 We have a random sample from the cross section.
FE.3 Each explanatory variable changes over time, and they are not
perfectly collinear.
FE.4 E[uit |Xi , ai ] = 0 for all time periods (Xi stands for all
explanatory variables).
FE.5 var[uit |Xi , ai ] = σ_u^2 for all t = 1, . . . , T .
FE.6 cov[uit , uis ] = 0 for all t ≠ s.
FE.7 uit |Xi , ai ∼ NID(0, σ_u^2).
Both are consistent under assumptions FE.1–FE.4 for fixed T
as n → ∞.
If uit is serially uncorrelated, FE is more efficient than FD (because
of this FE is more popular).
If uit is (highly) serially correlated, ∆uit may be less serially
correlated, which may favor FD over FE. However, typically T is
rather small, such that serial correlation is difficult to observe.
In sum, there are no clear cut guidelines to choose between these
two. Thus, good advice is to compute them both and, if there is a big
difference, try to determine why they differ.
Balanced and unbalanced panels
A data set is called a balanced panel if the same number of time
series observations is available for each cross section unit. That
is, T is the same for all individuals. The total number of
observations in a balanced panel is nT .
All the above examples are balanced panel data sets.
If some cross section units have missing observations, so that
for individual i there are Ti time period observations available,
i = 1, . . . , n, with Ti ≠ Tj for some i and j, we call the
data set an unbalanced panel. The total number of observations
in an unbalanced panel is T1 + · · · + Tn .
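Whether a panel is balanced can be checked by counting observations per cross section unit. The sketch below is a generic illustration; the function name `panel_summary` and the unit labels are hypothetical.

```python
from collections import Counter


def panel_summary(unit_ids):
    """Count units and observations, and report whether all T_i are equal."""
    counts = Counter(unit_ids)  # T_i for each unit i
    return {
        "n_units": len(counts),
        "n_obs": sum(counts.values()),  # T_1 + ... + T_n
        "balanced": len(set(counts.values())) == 1,
    }


# Hypothetical identifiers, one entry per observed (unit, period) pair.
balanced_ids = ["a", "a", "a", "b", "b", "b"]
unbalanced_ids = ["a", "a", "a", "b", "b"]

balanced_info = panel_summary(balanced_ids)
unbalanced_info = panel_summary(unbalanced_ids)
```

In the balanced case the observation count equals nT; in the unbalanced case it is the sum of the individual Ti.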
In most cases unbalanced panels do not cause major problems to
fixed effect estimation.
Modern software packages make appropriate adjustments to
estimation results.
Random effects models
Consider the simple unobserved effects model
yit = β0 + β1 xit + ai + uit ,
(22)
i = 1, . . . , n, t = 1, . . . , T .
Typically time dummies are also included in (22).
Using FD or FE eliminates the unobserved component ai .
However, if ai is uncorrelated with xit using random effect (RE)
estimation can lead to more efficient estimation of the regression
parameters.
Generally, we call the model in equation (22) the random effects
model if ai is uncorrelated with all explanatory variables, i.e.,
cov[xit , ai ] = 0, t = 1, . . . , T .
(23)
How to estimate β1 efficiently?
If (23) holds, β1 can be estimated consistently from a single cross
section.
Obviously this discards lots of useful information.
If the data set is simply pooled and the error term is denoted as
vit = ai + uit , we have the regression
$$y_{it} = \beta_0 + \beta_1 x_{it} + v_{it}.$$ (24)
Then
$$\mathrm{corr}[v_{it}, v_{is}] = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2}$$ (25)
for t ≠ s, where σ_a^2 = var[ai ] and σ_u^2 = var[uit ].
That is, the error terms vit are (positively) autocorrelated, which
biases the standard errors of the OLS β̂1 .
If σ_a^2 and σ_u^2 were known, optimal estimators (BLUE) would be
obtained by generalized least squares (GLS), which in this case
would reduce to estimating the regression slope coefficients from the
quasi demeaned equation
$$y_{it} - \lambda \bar y_i = \beta_0 (1 - \lambda) + \beta_1 (x_{it} - \lambda \bar x_i) + (v_{it} - \lambda \bar v_i),$$ (26)
where
$$\lambda = 1 - \left( \frac{\sigma_u^2}{\sigma_u^2 + T \sigma_a^2} \right)^{1/2}.$$ (27)
In practice σ_u^2 and σ_a^2 are unknown, but they can be estimated.
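The quasi demeaning weight in (27) is easy to compute once variance estimates are available. The sketch below is a hedged illustration: the function names are hypothetical, and the example expresses the variances through the error correlation in (25), using the value .464 and T = 8 that these notes report for the wage panel example (only the ratio of the two variances matters for λ).

```python
import math


def re_lambda(sigma_a2, sigma_u2, T):
    """Quasi demeaning weight from (27)."""
    return 1.0 - math.sqrt(sigma_u2 / (sigma_u2 + T * sigma_a2))


def quasi_demean(values, lam):
    """Subtract lam times the unit's time mean from each observation, as in (26)."""
    mean = sum(values) / len(values)
    return [v - lam * mean for v in values]


# With corr[v_it, v_is] = rho, scale the variances so that they sum to one.
rho, T = 0.464, 8
lam_wage = re_lambda(sigma_a2=rho, sigma_u2=1.0 - rho, T=T)

# Limiting cases: no unit effect gives 0; a dominant unit effect approaches 1.
lam_no_effect = re_lambda(sigma_a2=0.0, sigma_u2=1.0, T=T)
lam_large_effect = re_lambda(sigma_a2=1e9, sigma_u2=1.0, T=T)
```

A weight of 0 leaves the data untouched (pooled OLS), while a weight of 1 subtracts the full unit mean (the within transformation), so RE sits between the two.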
One method is to estimate (24) from the pooled data set and use
the OLS residuals v̂it to estimate σa2 and σu2 and plug them into
(27).
The resulting GLS estimators for the regression slope coefficients
are called random effects estimators (RE estimators).
Under the random effects assumptions [2] the estimators are
consistent, but not unbiased.
They are also asymptotically normal as n → ∞ for fixed T .
However, with small n and large T the properties of the RE estimator
are largely unknown.
[2] The ideal random effects assumptions include FE.1, FE.2, FE.4–FE.6.
FE.3 is replaced with
RE.3: There are no perfect linear relationships among the explanatory variables.
RE.4: In addition to FE.4, E[ai |Xi ] = 0.
It is notable that λ = 0 in (26) results in the pooled
regression, and FE is obtained with λ = 1.
RE estimation is available in modern statistical packages with
different options.
Example 4
Data set wagepan.xls (Wooldridge): n = 545, T = 8.
Is there a wage premium in belonging to labor union?
$$\log(\text{wage}_{it}) = \beta_0 + \beta_1 \text{educ}_{it} + \beta_2 \text{exper}_{it} + \beta_3 \text{exper}_{it}^2 + \beta_4 \text{married}_{it} + \beta_5 \text{union}_{it} + a_i + u_{it}$$
Year dummies for 1980–1987 are included.
Note that including a full set of year dummies implies that the FE
method cannot estimate the effects of variables that change by a constant
amount over time. Experience (exper) is such a variable.
(standard errors in parentheses; .. = not estimated)
--------------------------------------------------
   lwage |     Pooled       Random        Fixed
         |      OLS        Effects      Effects
---------+----------------------------------------
    educ |   .0989945     .0906150         ..
         |  (.0046227)   (.0105807)
   exper |   .0861696     .1027934         ..
         |  (.0101415)   (.0153853)
  exper2 |  -.0027349    -.0046859    -.0051855
         |  (.0007099)   (.0006896)   (.0007044)
 married |   .1230113     .0678821     .0466804
         |  (.0155714)   (.0167369)   (.0183104)
   union |   .1685243     .1031103     .0800019
         |  (.0170652)   (.0178388)   (.0193103)
--------------------------------------------------
It is notable that OLS standard errors tend to be smaller than in the RE
or FE cases.
OLS standard errors underestimate the true standard errors.
OLS coefficient estimates also suffer from the omitted variable problem
accounted in panel estimation.
Stata estimate of the correlation in (25) is .464.
Random effects or fixed effects
FE is widely considered preferable because it allows correlation
between ai and x variables.
Provided that the common effects, aggregated into ai, are not correlated
with the x variables, an obvious advantage of RE is that it also
allows estimation of the effects of factors that do not change over time
(like education in the above example).
Typically, the condition that the common effects ai are not correlated
with the regressors (x variables) should be considered the
exception rather than the rule, which favors FE.
Hausman specification test
Hausman (1978) devised a test for the orthogonality of the
common effects (ai) and the regressors.
The test compares the fixed effects (OLS) and random effects (GLS)
estimates using the Wald testing approach.
The basic idea of the test relies on the fact that under the null
hypothesis of orthogonality both OLS and GLS are consistent,
while under the alternative hypothesis GLS is not consistent.
Thus, under the null hypothesis the OLS and GLS estimates should
not differ much from each other.
The test compares these estimates using a Wald statistic.
In Stata, performing the Hausman test requires that both the OLS and GLS
regression results are stored so that they are available to the postestimation
test procedure.
Example 5
Applying the Hausman test to the case of Example 4 in Stata
yields:
* Estimate fixed effects
xtreg lwage y81 y82 y83 y84 y85 y86 y87 exper2 married union, fe
* store the results into "hfixed"
estimates store hfixed
* Estimate the random effects model
xtreg lwage y81 y82 y83 y84 y85 y86 y87 educ exper exper2 married union, re
* store the results into "hrandom"
estimates store hrandom
* Hausman test
hausman hfixed hrandom
                 ---- Coefficients ----
        |     (b)          (B)          (b-B)     sqrt(diag(V_b-V_B))
        |    hfixed      hrandom      Difference         S.E.
--------+-------------------------------------------------------------
    y81 |  .1511912     .0427498      .1084414            .
    y82 |  .2529709     .035577       .2173939            .
    y83 |  .3544437     .0270943      .3273494            .
    y84 |  .4901148     .052207       .4379078            .
    y85 |  .6174822     .0690524      .5484299            .
    y86 |  .7654965     .1053229      .6601736            .
    y87 |  .9250249     .1505464      .7744785            .
 exper2 | -.0051855    -.0046859     -.0004996         .000144
married |  .0466804     .0678821     -.0212017         .0074261
  union |  .0800019     .1031103     -.0231085         .0073935
----------------------------------------------------------------------
        b = consistent under Ho and Ha; obtained from xtreg
        B = inconsistent under Ha, efficient under Ho; obtained from xtreg

Test:  Ho:  difference in coefficients not systematic

        chi2(10) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                 =    26.77
        Prob>chi2 =   0.0028
        (V_b-V_B is not positive definite)
The test rejects the orthogonality condition. Thus, FE should be used.
In EViews the Hausman test is obtained by first estimating the model
as a random effects model and then selecting
View > Fixed/Random Effects Testing > Correlated
Random Effects - Hausman Test
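For a single coefficient the Wald form of the Hausman statistic reduces to a squared ratio of the difference to its standard error. The following Python sketch applies this to the union coefficient, using the FE and RE estimates and the S.E. of the difference reported in Example 5. It only illustrates the mechanics and is not a substitute for the joint chi2(10) test; the function name hausman_single is hypothetical.

```python
import math

def hausman_single(b_fe, b_re, se_diff):
    """Single-coefficient Hausman statistic and its chi-square(1) p-value.

    se_diff is sqrt(V_b - V_B) for this coefficient, as reported by Stata.
    """
    h = ((b_fe - b_re) / se_diff) ** 2
    # With one degree of freedom, P(chi2 > h) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value

# union coefficient: FE estimate, RE estimate, S.E. of the difference
h_union, p_union = hausman_single(0.0800019, 0.1031103, 0.0073935)
print(f"H = {h_union:.2f}, p-value = {p_union:.4f}")
```

Taken alone, the union coefficient already points in the same direction as the joint test: the FE and RE estimates differ by more than sampling error would suggest under the null.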
Policy analysis with panel data
Panel data is useful for policy analysis, in particular, program
evaluation.
Example 6
Continue Example 1.2, where the effect of a training program on worker productivity was evaluated.
The data include three years, 1987, 1988, and 1989.
The training program was first implemented in 1988.
We focus on the years 1987 (no program) and 1988 (program implemented) to see
whether the program benefits firms.
The panel model is
$$ \log(\text{scrap}_{it}) = \beta_0 + \delta_0\, y88_t + \beta_1 \text{grant}_{it} + a_i + u_{it}, \qquad (28) $$
where y88 is the year 1988 dummy (= 1 for year 1988 and = 0 otherwise) and ai
includes the unobserved firm effects (worker skill, etc.).
Ignoring the panel structure, the OLS results suggest no improvement.
Dependent Variable: LOG(SCRAP)
Method: Panel Least Squares
Sample: 1 471 IF YEAR < 1989
Periods included: 2
Cross-sections included: 54
Total panel (balanced) observations: 108
=====================================================
Variable    Coefficient  Std. Error  t-Statistic  Prob.
-----------------------------------------------------
C              0.523144    0.159783     3.274086  0.0014
GRANT         -0.058018    0.380949    -0.152299  0.8792
-----------------------------------------------------
R-squared            0.000219
Adjusted R-squared  -0.009213
S.E. of regression   1.507393
F-statistic          0.023195
Prob(F-statistic)    0.879241
=====================================================
The coefficient for grant is not statistically significant, suggesting that
the program does not help in reducing the scrap rate.
Accounting for possible firm effects and also including the year
dummy to account for a possible time effect yields
=====================================================
Variable    Coefficient  Std. Error  t-Statistic  Prob.
-----------------------------------------------------
C              0.568716    0.048603     11.70126  0.0000
GRANT         -0.317058    0.163875    -1.934753  0.0585
-----------------------------------------------------
Effects Specification
Cross-section fixed (dummy variables)
Period fixed (dummy variables)
R-squared            0.964308
Adjusted R-squared   0.926556
S.E. of regression   0.406642
F-statistic          25.54364
Prob(F-statistic)    0.000000
The estimated coefficient of grant is negative and close to
statistically significant in two-sided testing (p-value 0.0585). In one-sided
testing against the alternative
H1 : β1 < 0
(the program reduces the scrap rate) it is significant at the 5% level with p-value 0.0265.
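The one-sided p-value can be checked from the reported t-statistic. The sketch below assumes the figure 0.0265 comes from the standard normal approximation (an assumption about how it was obtained); halving the reported two-sided p-value, which rests on the t distribution, gives a slightly larger value. Either way the result stays below 5%.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

t_stat = -1.934753        # t-statistic of GRANT in the fixed effects output
p_two_sided = 0.0585      # two-sided p-value reported by EViews

# One-sided p-value for H1: beta1 < 0 under the normal approximation
p_one_sided_normal = normal_cdf(t_stat)
# Halving the two-sided p-value (appropriate here because the estimate is negative)
p_one_sided_t = p_two_sided / 2.0

print(f"normal approximation: {p_one_sided_normal:.4f}")
print(f"half of two-sided p:  {p_one_sided_t:.4f}")
```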
Dynamic Panel Models
Many economic relationships are dynamic.
These may be characterized by the presence of lagged dependent
variables
$$ y_{it} = \delta y_{i,t-1} + x_{it}'\beta + v_{it}, \qquad (29) $$
where
$$ v_{it} = a_i + u_{it} \qquad (30) $$
with $a_i \sim \text{iid}(0, \sigma_a^2)$ and $u_{it} \sim \text{iid}(0, \sigma_u^2)$ independent of each other,
i = 1, . . . , n, t = 1, . . . , T.
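Alternatively, the one-way error component model in (30) can be replaced by a
two-way specification such that
$$ v_{it} = a_i + b_t + u_{it}, \qquad (31) $$
where all the components are again assumed independent.
After differencing (29) with the error structure (30), the individual effect $a_i$ drops out and we have
$$ \Delta y_{it} = \delta \Delta y_{i,t-1} + \Delta x_{it}'\beta + \Delta u_{it}. \qquad (32) $$
The lagged term $y_{i,t-1}$, which enters the regressor $\Delta y_{i,t-1}$, is correlated with
$u_{i,t-1}$, which enters $\Delta u_{it}$; this causes problems in estimation.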
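The source of the problem can be made explicit. A short derivation, assuming that the $u_{it}$ are serially uncorrelated and uncorrelated with $a_i$, with $x_{it}$ and with earlier values of $y$:

```latex
\operatorname{cov}(\Delta y_{i,t-1}, \Delta u_{it})
  = \operatorname{cov}(y_{i,t-1} - y_{i,t-2},\; u_{it} - u_{i,t-1})
  = -\operatorname{cov}(y_{i,t-1}, u_{i,t-1})
  = -\sigma_u^2 \neq 0
```

Only the pair $(y_{i,t-1}, u_{i,t-1})$ contributes, because $y_{i,t-1}$ contains $u_{i,t-1}$ with coefficient one, while the other three covariances are zero under the stated assumptions.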
Once regressor variables are correlated with the error term, OLS or
GLS estimators become inconsistent.
A typical solution to the problem is to apply some kind of
instrumental variable estimation.
These are least squares (LS) or other types of methods in which
instrumental variables are used to remove the inconsistency caused
by the correlation of the error term with the regressors.
A variable is suitable as an instrument if it is not
correlated with the error term but is correlated with the regressors.
Thus, those regressors that are not correlated with the error term
can also be used as instruments.
Example 7
2SLS (two-stage least squares).
Consider a standard regression model
$$ y_i = x_i'\beta + u_i, \qquad (33) $$
where $x_i$ is a k-vector of regressors (including the constant term) with $\operatorname{cov}[x_i, u_i] \neq 0$,
i = 1, . . . , n.
Suppose we have m ≥ k additional variables in $z_i$ (an m-vector) such that $\operatorname{cov}[z_i, u_i] = 0$
but $\operatorname{cov}[z_i, x_i] \neq 0$.
The 2SLS solution to the problem is as follows. In the first stage, use OLS to regress
the x-variables on the z-variables.
In the second stage, replace the original regressors $x_i$ by the predicted values $\hat{x}_i$ from
the first stage, and estimate β from the regression
$$ y_i = \hat{x}_i'\beta + u_i. \qquad (34) $$
The estimator
$$ \hat{\beta}_{2SLS} = (\hat{X}'\hat{X})^{-1}\hat{X}'y \qquad (35) $$
is called the 2SLS estimator of β.
In particular, if m = k then (35) becomes
$$ \hat{\beta}_{IV} = (Z'X)^{-1}Z'y, \qquad (36) $$
which is called the Instrumental Variable estimator of β.
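The equivalence of the two-stage procedure and the IV formula when m = k can be checked numerically. The following Python sketch uses simulated data (all parameter values and the seed are arbitrary assumptions) with a constant, one endogenous regressor x and one instrument z. In this case the slope component of $(Z'X)^{-1}Z'y$ is the ratio of sample covariances cov(z, y)/cov(z, x).

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (divisor n; the divisor cancels in the ratios below)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def slope(y, x):
    """OLS slope of y on x in a regression that includes a constant."""
    return cov(x, y) / cov(x, x)

rng = random.Random(12345)                          # fixed seed, chosen arbitrarily
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]             # instrument
e = [rng.gauss(0, 1) for _ in range(n)]             # source of endogeneity
x = [zi + ei for zi, ei in zip(z, e)]               # regressor, correlated with u
u = [0.8 * ei + 0.6 * rng.gauss(0, 1) for ei in e]  # error term
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]   # true slope is 2

b_ols = slope(y, x)                                 # inconsistent: cov(x, u) != 0

# First stage: fitted values of x from an OLS regression on z (with constant)
g = slope(x, z)
mx, mz = mean(x), mean(z)
x_hat = [mx + g * (zi - mz) for zi in z]

b_2sls = slope(y, x_hat)                            # second stage, as in (34)
b_iv = cov(z, y) / cov(z, x)                        # slope from the IV formula

print(f"OLS {b_ols:.3f}  2SLS {b_2sls:.3f}  IV {b_iv:.3f}")
```

In this design the OLS slope is pulled away from 2 by the correlation between x and u, while the two-stage and IV slopes coincide up to rounding error.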
Example 8
(Data: http://eu.wiley.com/college/baltagi/ > Student companion site > datasets)
Demand for cigarettes in 46 US States [annual data, 1963–1992]. Estimated equation
$$ c_{it} = \alpha + \beta_1 c_{i,t-1} + \beta_2 p_{it} + \beta_3 y_{it} + \beta_4 pn_{it} + v_{it}, \qquad (37) $$
$$ v_{it} = a_i + b_t + u_{it}, \qquad (38) $$
where
$a_i$ and $b_t$ are fixed effects, $u_{it} \sim \text{NID}(0, \sigma_u^2)$, and all the observable variables are in
logarithms:
$c_{it}$ = real per capita sales of cigarettes by persons of smoking age (14 and older)
$p_{it}$ = real average retail price of a pack of cigarettes
$y_{it}$ = real per capita disposable income
$pn_{it}$ = the minimum real price of cigarettes in any neighboring state (proxy for casual
smuggling effect across state borders)
$c_{i,t-1}$ is very likely correlated with $u_{it}$.
For reference purposes, estimating with panel OLS (average of within
group regressions with time dummies) yields
Fixed-effects (within) regression            Number of obs      =      1334
Group variable: state                        Number of groups   =        46

R-sq:  within  = 0.9283                      Obs per group: min =        29
       between = 0.9859                                     avg =      29.0
       overall = 0.9657                                     max =        29

                                             F(32,1256)         =    508.07
corr(u_i, Xb)  = 0.4743                      Prob > F           =    0.0000

-----------------------------------------------------
          lc |      Coef.   Std. Err.      t    P>|t|
-------------+---------------------------------------
          lc |
         L1. |   .8302514   .0126242    65.77   0.000
             |
          lp |  -.2916822   .0230847   -12.64   0.000
          ly |   .1068698   .0233417     4.58   0.000
         lpn |   .0354559     .02656     1.33   0.182
       _cons |   .8204374   .2228775     3.68   0.000
-------------+---------------------------------------
     sigma_u |  .02738301
     sigma_e |  .03504776
         rho |  .37905103   (fraction of variance due to u_i)
-----------------------------------------------------
F test that all u_i=0: F(45, 1256) = 4.52    Prob > F = 0.0000
Several methods have been proposed for estimation when there is potential
correlation between the error term and (some) regressors.
GMM (Generalized Method of Moments) estimation has recently gained
much popularity, in particular when there are non-linear moment
restrictions.
Stata has the xtdpd procedure, which produces the Arellano and Bond or the
Arellano-Bover/Blundell-Bond estimator. These are GMM estimators
in which the instruments are defined in a particular way (the idea will be
discussed in the classroom).
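The simplest version of the instrument idea can be illustrated by simulation. The Python sketch below is not what xtdpd computes: it uses a single instrument, the level of y lagged twice, for the differenced lagged dependent variable (an approach commonly attributed to Anderson and Hsiao), on simulated data without x variables. The values of delta, n, T, the variances and the seed are arbitrary assumptions.

```python
import random

rng = random.Random(2016)          # fixed seed, chosen arbitrarily
delta, n, T, burn = 0.5, 3000, 8, 30

num_ols = den_ols = num_iv = den_iv = 0.0
for _ in range(n):
    a = rng.gauss(0, 0.5)                      # individual effect a_i
    y_prev = a / (1 - delta)                   # start at the individual mean
    path = []
    for t in range(burn + T):
        # y_it = delta * y_{i,t-1} + a_i + u_it with u_it ~ N(0, 1)
        y_prev = delta * y_prev + a + rng.gauss(0, 1)
        if t >= burn:                          # keep the last T periods only
            path.append(y_prev)
    # Differenced equation: dy_t = delta * dy_{t-1} + du_t
    for t in range(2, T):
        dy = path[t] - path[t - 1]
        dy_lag = path[t - 1] - path[t - 2]
        num_ols += dy_lag * dy
        den_ols += dy_lag * dy_lag
        num_iv += path[t - 2] * dy             # instrument: level lagged twice
        den_iv += path[t - 2] * dy_lag

delta_fd_ols = num_ols / den_ols               # OLS on the differenced equation
delta_fd_iv = num_iv / den_iv                  # IV on the differenced equation

print(f"true delta {delta}, FD-OLS {delta_fd_ols:.3f}, FD-IV {delta_fd_iv:.3f}")
```

OLS on the differenced equation is badly biased in this design because the differenced lag is correlated with the differenced error, whereas the IV estimate should land close to the true value. The Arellano and Bond estimator extends the idea by using all available lagged levels as instruments within GMM.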
xtdpd l(0/1).lc lp ly lpn y66-y92, div(lp ly lpn y66-y92) dgmmiv(lc)
Dynamic panel-data estimation                Number of obs      =      1334
Group variable: state                        Number of groups   =        46
Time variable: year
                                             Obs per group: min =        29
                                                            avg =        29
                                                            max =        29

Number of instruments =    437               Wald chi2(31)      =  13273.45
                                             Prob > chi2        =    0.0000
One-step results
-----------------------------------------------------
          lc |      Coef.   Std. Err.      z    P>|z|
-------------+---------------------------------------
          lc |
         L1. |   .8201729   .0161446    50.80   0.000
             |
          lp |  -.3607549   .0311244   -11.59   0.000
          ly |   .1871102   .0334027     5.60   0.000
         lpn |  -.0215713   .0399233    -0.54   0.589
-----------------------------------------------------
Instruments for differenced equation
        GMM-type: L(2/.).lc
        Standard: D.lp D.ly D.lpn D.y66 D.y67 D.y68
                  D.y69 D.y70 D.y71 D.y72 D.y73 D.y74 D.y75
                  D.y76 D.y77 D.y78 D.y79 D.y80 D.y81 D.y82
                  D.y83 D.y84 D.y85 D.y86 D.y87 D.y88 D.y89
                  D.y90 D.y91 D.y92
Instruments for level equation
        Standard: _cons
Test for the orthogonality conditions of the instruments
Sargan test of overidentifying restrictions
H0: overidentifying restrictions are valid
chi2(405)    =  561.5047
Prob > chi2  =    0.0000
The orthogonality conditions are rejected.
The reason may be that the errors uit are MA(1), which implies that
the GMM instruments (lct−2 , . . .) are correlated with the error term.
One can try to fix this by starting the instruments from lag t − 3 with the option
· · · dgmmiv(lc, lagrange(3 .)).
Doing this improved the situation slightly but still led to rejection of the
orthogonality conditions.
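The degrees of freedom of the Sargan test equal the number of instruments minus the number of estimated parameters. The Python sketch below reproduces the reported 437 instruments and 405 degrees of freedom, assuming the instruments are counted as listed in the xtdpd output: for the differenced equation in period t, the levels of lc dated t − 2 and earlier.

```python
T = 30                    # annual observations per state, 1963-1992

# GMM-type instruments L(2/.).lc: the differenced equation for period t
# (t = 3, ..., T) can use the levels lc_1, ..., lc_{t-2}, i.e. t - 2 of them.
gmm_type = sum(t - 2 for t in range(3, T + 1))

standard_diff = 3 + 27    # D.lp, D.ly, D.lpn and the year dummies D.y66-D.y92
level_eq = 1              # _cons
n_instruments = gmm_type + standard_diff + level_eq

n_params = 31 + 1         # 31 slope coefficients (Wald chi2(31)) plus _cons
sargan_df = n_instruments - n_params

print(n_instruments, sargan_df)   # 437 405
```

With 437 instruments and only 46 states, the instrument count is large relative to the cross-section dimension, which is worth keeping in mind when reading the test result.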