ch15paneldata - Memorial University of Newfoundland

ECON 6002
Econometrics
Memorial University of Newfoundland
Panel Data Models
Adapted from Vera Tabakova’s notes

15.1 Grunfeld’s Investment Data

15.2 Sets of Regression Equations

15.3 Seemingly Unrelated Regressions

15.4 The Fixed Effects Model

15.5 The Random Effects Model

Extensions: the random coefficients model (RCM); dealing with endogeneity when we have
static (time-invariant) variables
Principles of Econometrics, 3rd Edition
Slide 15-2
The different types of panel data sets can be described as:

“long and narrow,” with “long” time dimension and “narrow”, few
cross sectional units;

“short and wide,” many units observed over a short period of time;

“long and wide,” indicating that both N and T are relatively large.
$$INV_{it} = f(V_{it}, K_{it}) \tag{15.1}$$
The data consist of T = 20 years of data (1935-1954) for
N = 10 large firms.
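$V_{it}$: value of the firm's stock, a proxy for expected profits
$K_{it}$: capital stock, a proxy for the desired permanent capital stock
Let $y_{it} = INV_{it}$, $x_{2it} = V_{it}$ and $x_{3it} = K_{it}$:
$$y_{it} = \beta_{1it} + \beta_{2it} x_{2it} + \beta_{3it} x_{3it} + e_{it} \tag{15.2}$$
Notice the subindices: every coefficient carries both $i$ and $t$, so it may differ across firms and over time.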
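$$INV_{GE,t} = \beta_1 + \beta_2 V_{GE,t} + \beta_3 K_{GE,t} + e_{GE,t}, \quad t = 1, \ldots, 20$$
$$INV_{WE,t} = \beta_1 + \beta_2 V_{WE,t} + \beta_3 K_{WE,t} + e_{WE,t}, \quad t = 1, \ldots, 20 \tag{15.3a}$$
$$y_{it} = \beta_1 + \beta_2 x_{2it} + \beta_3 x_{3it} + e_{it}, \quad i = 1, 2;\ t = 1, \ldots, 20 \tag{15.3b}$$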
For simplicity we focus on only two firms, General Electric (i = 3) and Westinghouse (i = 8). In Stata:
keep if (i==3 | i==8)
$$INV_{GE,t} = \beta_{1,GE} + \beta_{2,GE} V_{GE,t} + \beta_{3,GE} K_{GE,t} + e_{GE,t}, \quad t = 1, \ldots, 20$$
$$INV_{WE,t} = \beta_{1,WE} + \beta_{2,WE} V_{WE,t} + \beta_{3,WE} K_{WE,t} + e_{WE,t}, \quad t = 1, \ldots, 20 \tag{15.4a}$$
$$y_{it} = \beta_{1i} + \beta_{2i} x_{2it} + \beta_{3i} x_{3it} + e_{it}, \quad i = 1, 2;\ t = 1, \ldots, 20 \tag{15.4b}$$
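$$E(e_{GE,t}) = 0, \quad \operatorname{var}(e_{GE,t}) = \sigma^2_{GE}, \quad \operatorname{cov}(e_{GE,t}, e_{GE,s}) = 0 \quad (t \neq s)$$
$$E(e_{WE,t}) = 0, \quad \operatorname{var}(e_{WE,t}) = \sigma^2_{WE}, \quad \operatorname{cov}(e_{WE,t}, e_{WE,s}) = 0 \quad (t \neq s) \tag{15.5}$$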
Assumption (15.5) says that the errors in both investment functions
(i) have zero mean,
(ii) are homoskedastic with constant variance, and
(iii) are not correlated over time; autocorrelation does not exist.
The two equations do have different error variances, $\sigma^2_{GE}$ and $\sigma^2_{WE}$.
reg inv v k if i==3
scalar sse_ge = e(rss)
reg inv v k if i==8
scalar sse_we = e(rss)

Let Di be a dummy variable equal to 1 for the Westinghouse
observations and 0 for the General Electric observations. If the
variances are the same for both firms then we can run:
$$INV_{it} = \beta_{1,GE} + \delta_1 D_i + \beta_{2,GE} V_{it} + \delta_2 (D_i \times V_{it}) + \beta_{3,GE} K_{it} + \delta_3 (D_i \times K_{it}) + e_{it} \tag{15.6}$$
* Create dummy variable
gen d = (i == 8)
gen dv = d*v
gen dk = d*k
* Estimate dummy variable model
reg inv d v dv k dk
test d dv dk
* Goldfeld-Quandt test
scalar GQ = sse_ge/sse_we
scalar fc95 = invFtail(17,17,.05)
di "Goldfeld-Quandt Test statistic = " GQ
di "F(17,17,.95) = " fc95
Goldfeld-Quandt Test statistic = 7.45338
F(17,17,.95) = 2.2718929
So we reject equality of the two error variances at the 5% level (7.45 > 2.27), so we cannot
really merge the two equations for now.
$$\operatorname{cov}(e_{GE,t}, e_{WE,t}) = \sigma_{GE,WE} \tag{15.7}$$
This assumption says that the error terms in the two equations, at the
same point in time, are correlated. This kind of correlation is called a
contemporaneous correlation.
Under this assumption, estimating the two equations jointly is more efficient than
running separate OLS regressions.
Econometric software includes commands for SUR (or SURE) that
carry out the following steps:
(i) Estimate the equations separately using least squares;
(ii) Use the least squares residuals from step (i) to estimate $\sigma^2_{GE}$, $\sigma^2_{WE}$ and $\sigma_{GE,WE}$;
(iii) Use the estimates from step (ii) to estimate the two equations jointly
within a generalized least squares framework.
* Open and summarize data (which is already in wide format!!!)
use grunfeld2, clear
summarize
* SUR
sureg ( inv_ge v_ge k_ge) ( inv_we v_we k_we), corr
* Test equality of the coefficients across the two equations
test ([inv_ge]_cons = [inv_we]_cons) ([inv_ge]_b[v_ge] = [inv_we]_b[v_we]) ///
     ([inv_ge]_b[k_ge] = [inv_we]_b[k_we])
There are two situations where separate least squares estimation is
just as good as the SUR technique :
(i) when the equation errors are not contemporaneously correlated;
(ii) when the same (the “very same”) explanatory variables appear in each equation.
If the explanatory variables in each equation are different, then a test
to see if the correlation between the errors is significantly different
from zero is of interest.
$$r^2_{GE,WE} = \frac{\hat\sigma^2_{GE,WE}}{\hat\sigma^2_{GE}\,\hat\sigma^2_{WE}} = \frac{(207.5871)^2}{(777.4463)(104.3079)} = 0.53139$$
$$\hat\sigma_{GE,WE} = \frac{1}{\sqrt{T - K_{GE}}\,\sqrt{T - K_{WE}}} \sum_{t=1}^{20} \hat e_{GE,t}\,\hat e_{WE,t} = \frac{1}{T - 3} \sum_{t=1}^{20} \hat e_{GE,t}\,\hat e_{WE,t}$$
In this case we have 3 parameters in each equation, so $K_{GE} = K_{WE} = 3$.
Testing for correlated errors for two equations:
$$H_0: \sigma_{GE,WE} = 0$$
$$LM = T\,r^2_{GE,WE} \sim \chi^2_{(1)} \text{ under } H_0$$
LM = 10.628 > 3.84 (Breusch-Pagan test of independence: chi2(1))
Hence we reject the null hypothesis of no correlation between the
errors and conclude that there are potential efficiency gains from
estimating the two investment equations jointly using SUR.
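The arithmetic behind this LM statistic can be checked with a few Stata scalars. This is only a sketch that reuses the figures reported on the slides (T = 20); the scalar names are illustrative and not part of the original do-file.

```stata
* Recompute the squared residual correlation and the LM statistic
* from the reported estimates (T = 20 observations per firm)
scalar s_gewe = 207.5871
scalar s2_ge  = 777.4463
scalar s2_we  = 104.3079
scalar r2     = s_gewe^2/(s2_ge*s2_we)
scalar LM     = 20*r2
* 5% critical value and p-value from the chi2(1) distribution
scalar crit   = invchi2tail(1, .05)
scalar pval   = chi2tail(1, LM)
display "r-squared = " r2
display "LM = " LM "   critical value = " crit "   p-value = " pval
```

The statistic exceeds the critical value, which matches the rejection reported above.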
Testing for correlated errors for three equations:
$$H_0: \sigma_{12} = \sigma_{13} = \sigma_{23} = 0$$
$$LM = T\left(r^2_{12} + r^2_{13} + r^2_{23}\right) \sim \chi^2_{(3)}$$
Testing for correlated errors for M equations:
$$LM = T \sum_{i=2}^{M} \sum_{j=1}^{i-1} r^2_{ij}$$
Under the null hypothesis that there are no contemporaneous
correlations, this LM statistic has a χ2-distribution with M(M–1)/2
degrees of freedom, in large samples.
$$H_0: \beta_{1,GE} = \beta_{1,WE}, \quad \beta_{2,GE} = \beta_{2,WE}, \quad \beta_{3,GE} = \beta_{3,WE} \tag{15.8}$$
Most econometric software will perform an F-test and/or a Wald χ2-test; in
the context of SUR equations both tests are large-sample approximate tests.
The F-statistic has J numerator degrees of freedom and (MT − K)
denominator degrees of freedom, where J is the number of hypotheses, M is
the number of equations, K is the total number of coefficients in the
whole system, and T is the number of time series observations per equation.
Here J = 3, M = 2, T = 20 and K = 6, so MT − K = 34.
The χ2-statistic has J degrees of freedom.

SUR is OK when the panel is long and narrow, not when it is short and wide.
Consider instead…
$$y_{it} = \beta_{1it} + \beta_{2it} x_{2it} + \beta_{3it} x_{3it} + e_{it} \tag{15.9}$$
We cannot consistently estimate the 3×N×T parameters in (15.9) with
only NT total observations. But we can impose some more
structure:
$$\beta_{1it} = \beta_{1i}, \quad \beta_{2it} = \beta_2, \quad \beta_{3it} = \beta_3 \tag{15.10}$$
We consider only one-way effects and assume common slope
parameters across cross-sectional units.
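As a quick worked count for the Grunfeld data (N = 10, T = 20), the following compares the number of parameters in (15.9) with the number left after imposing (15.10). It is only an illustration of the counting argument.

```latex
\begin{align*}
\text{(15.9):}\quad  & 3 \times N \times T = 3 \times 10 \times 20 = 600 \text{ parameters, but only } NT = 200 \text{ observations} \\
\text{(15.10):}\quad & N + 2 = 10 + 2 = 12 \text{ parameters, leaving } NT - (N + 2) = 188 \text{ degrees of freedom}
\end{align*}
```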
All behavioral differences between individual firms and over time are
captured by the intercept. Individual intercepts are included to
“control” for these firm specific differences.
$$y_{it} = \beta_{1i} + \beta_2 x_{2it} + \beta_3 x_{3it} + e_{it} \tag{15.11}$$
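$$D_{1i} = \begin{cases} 1 & i = 1 \\ 0 & \text{otherwise} \end{cases}, \quad D_{2i} = \begin{cases} 1 & i = 2 \\ 0 & \text{otherwise} \end{cases}, \quad D_{3i} = \begin{cases} 1 & i = 3 \\ 0 & \text{otherwise} \end{cases}, \quad \text{etc.}$$
$$INV_{it} = \beta_{11} D_{1i} + \beta_{12} D_{2i} + \cdots + \beta_{1,10} D_{10i} + \beta_2 V_{it} + \beta_3 K_{it} + e_{it} \tag{15.12}$$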
This specification is sometimes called the least squares dummy
variable model, or the fixed effects model.
$$H_0: \beta_{11} = \beta_{12} = \cdots = \beta_{1N}$$
$$H_1: \text{the } \beta_{1i} \text{ are not all equal} \tag{15.13}$$
These N–1= 9 joint null hypotheses are tested using the usual F-test
statistic. In the restricted model all the intercept parameters are equal.
If we call their common value β1, then the restricted model is:
$$INV_{it} = \beta_1 + \beta_2 V_{it} + \beta_3 K_{it} + e_{it}$$
So this is just the pooled model, estimated by OLS:
reg inv v k
$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(NT - K)} = \frac{(1749128 - 522855)/9}{522855/(200 - 12)} = 48.99$$
We reject the null hypothesis that the intercept parameters for all
firms are equal. We conclude that there are differences in firm
intercepts, and that the data should not be pooled into a single model
with a common intercept parameter.
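The F-statistic can be reproduced from the two sums of squared errors with a few Stata scalars. This is a sketch using the figures reported above; the scalar names are illustrative.

```stata
* F-test of equal intercepts: restricted (pooled) vs unrestricted (dummy variable) model
scalar sse_r = 1749128
scalar sse_u = 522855
* J = N - 1 restrictions; NT - K = 200 - 12 denominator degrees of freedom
scalar J     = 9
scalar df2   = 200 - 12
scalar F     = ((sse_r - sse_u)/J)/(sse_u/df2)
scalar fcrit = invFtail(J, df2, .05)
scalar pval  = Ftail(J, df2, F)
display "F = " F "   5% critical value = " fcrit "   p-value = " pval
```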
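$$y_{it} = \beta_{1i} + \beta_2 x_{2it} + \beta_3 x_{3it} + e_{it}, \quad t = 1, \ldots, T \tag{15.14}$$
$$\frac{1}{T} \sum_{t=1}^{T} \left( y_{it} = \beta_{1i} + \beta_2 x_{2it} + \beta_3 x_{3it} + e_{it} \right)$$
$$\bar y_i = \frac{1}{T} \sum_{t=1}^{T} y_{it} = \beta_{1i} + \beta_2 \frac{1}{T} \sum_{t=1}^{T} x_{2it} + \beta_3 \frac{1}{T} \sum_{t=1}^{T} x_{3it} + \frac{1}{T} \sum_{t=1}^{T} e_{it} = \beta_{1i} + \beta_2 \bar x_{2i} + \beta_3 \bar x_{3i} + \bar e_i \tag{15.15}$$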
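$$y_{it} = \beta_{1i} + \beta_2 x_{2it} + \beta_3 x_{3it} + e_{it}$$
$$-\,(\bar y_i = \beta_{1i} + \beta_2 \bar x_{2i} + \beta_3 \bar x_{3i} + \bar e_i)$$
$$y_{it} - \bar y_i = \beta_2 (x_{2it} - \bar x_{2i}) + \beta_3 (x_{3it} - \bar x_{3i}) + (e_{it} - \bar e_i) \tag{15.16}$$
$$\tilde y_{it} = \beta_2 \tilde x_{2it} + \beta_3 \tilde x_{3it} + \tilde e_{it} \tag{15.17}$$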
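$$\widetilde{INV}_{it} = .1098\,\tilde V_{it} + .3106\,\tilde K_{it} \tag{15.18}$$
(se*) (.0116) (.0169)
Least squares on the transformed data computes $\hat\sigma^2_{e*} = SSE/(NT - 2)$, which ignores the N estimated intercepts; the correct divisor is NT − N − 2. The standard errors (se*) must therefore be multiplied by the correction factor
$$\sqrt{\frac{NT - 2}{NT - N - 2}} = \sqrt{\frac{198}{188}} = 1.02625$$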
$$\bar y_i = b_{1i} + b_2 \bar x_{2i} + b_3 \bar x_{3i}$$
$$b_{1i} = \bar y_i - b_2 \bar x_{2i} - b_3 \bar x_{3i}, \quad i = 1, \ldots, N \tag{15.19}$$
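The equivalence between the within transformation, the dummy variable regression and Stata's built-in fixed effects estimator can be illustrated on simulated data. This is a minimal sketch: the data are generated here rather than taken from the Grunfeld file, and all variable names and parameter values (id, t, x, y, u, a true slope of 0.5) are assumptions made for the illustration.

```stata
* Simulated balanced panel: N = 10 units, T = 20 periods (illustrative values)
clear
set seed 12345
set obs 10
generate id = _n
generate double u = rnormal()
expand 20
bysort id: generate t = _n
* x is correlated with the individual effect u
generate double x = u + rnormal()
generate double y = 1 + u + 0.5*x + rnormal()

* (a) Within transformation: deviations from individual means, no constant
bysort id: egen double ybar = mean(y)
bysort id: egen double xbar = mean(x)
generate double ytil = y - ybar
generate double xtil = x - xbar
regress ytil xtil, noconstant
scalar b_within = _b[xtil]

* Individual intercepts recovered as in (15.19)
generate double b1i = ybar - b_within*xbar

* (b) Least squares dummy variable regression
regress y x i.id
scalar b_lsdv = _b[x]

* (c) Stata's built-in fixed effects estimator
xtset id t
xtreg y x, fe
scalar b_fe = _b[x]

display "within = " b_within "   LSDV = " b_lsdv "   xtreg, fe = " b_fe
```

The three slope estimates should coincide. The standard errors from (a) are too small because regress does not account for the N estimated means, whereas (b) and (c) use the correct degrees of freedom.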
ONE PROBLEM: Even with the trick of using the within estimator, we still
implicitly (even if no longer explicitly) include N-1 dummy variables in our
model (not N, since we remove the intercept), so we use up N-1 degrees of
freedom.
It might therefore not be the most efficient way to estimate the common slopes.
ANOTHER ONE: By using deviations from the means, the procedure wipes
out all the static (time-invariant) variables, whose effects might be of interest.
In order to overcome these problems, we can consider the random effects (or error
components) model.
$$\beta_{1i} = \bar\beta_1 + u_i \tag{15.20}$$
where $\bar\beta_1$ is the average intercept and $u_i$ captures the randomness of the intercept.
$$E(u_i) = 0, \quad \operatorname{cov}(u_i, u_j) = 0 \ (i \neq j), \quad \operatorname{var}(u_i) = \sigma^2_u \tag{15.21}$$
$$y_{it} = \beta_{1i} + \beta_2 x_{2it} + \beta_3 x_{3it} + e_{it} = (\bar\beta_1 + u_i) + \beta_2 x_{2it} + \beta_3 x_{3it} + e_{it} \tag{15.22}$$
where $e_{it}$ is the usual error.
$$y_{it} = \bar\beta_1 + \beta_2 x_{2it} + \beta_3 x_{3it} + (e_{it} + u_i) = \bar\beta_1 + \beta_2 x_{2it} + \beta_3 x_{3it} + v_{it} \tag{15.23}$$
where the composite error is
$$v_{it} = u_i + e_{it} \tag{15.24}$$
Because the random effects regression error has two components, one
for the individual and one for the regression, the random effects
model is often called an error components model.
v has zero mean:
$$E(v_{it}) = E(u_i + e_{it}) = E(u_i) + E(e_{it}) = 0 + 0 = 0$$
v has constant variance, if there is no correlation between the individual effects and the error term:
$$\sigma^2_v = \operatorname{var}(v_{it}) = \operatorname{var}(u_i + e_{it}) = \operatorname{var}(u_i) + \operatorname{var}(e_{it}) + 2\operatorname{cov}(u_i, e_{it}) = \sigma^2_u + \sigma^2_e \tag{15.25}$$
But now there are several correlations that can be considered.

The correlation between two individuals, i and j, at the same
point in time, t. The covariance for this case is given by
$$\operatorname{cov}(v_{it}, v_{jt}) = E(v_{it} v_{jt}) = E[(u_i + e_{it})(u_j + e_{jt})] = E(u_i u_j) + E(u_i e_{jt}) + E(e_{it} u_j) + E(e_{it} e_{jt}) = 0 + 0 + 0 + 0 = 0$$

The correlation between errors on the same individual (i) at
different points in time, t and s. The covariance for this case is
given by
$$\operatorname{cov}(v_{it}, v_{is}) = E(v_{it} v_{is}) = E[(u_i + e_{it})(u_i + e_{is})] = E(u_i^2) + E(u_i e_{is}) + E(e_{it} u_i) + E(e_{it} e_{is}) = \sigma^2_u + 0 + 0 + 0 = \sigma^2_u \tag{15.26}$$

The correlation between errors for different individuals in
different time periods. The covariance for this case is
$$\operatorname{cov}(v_{it}, v_{js}) = E(v_{it} v_{js}) = E[(u_i + e_{it})(u_j + e_{js})] = E(u_i u_j) + E(u_i e_{js}) + E(e_{it} u_j) + E(e_{it} e_{js}) = 0 + 0 + 0 + 0 = 0$$
$$\rho = \operatorname{corr}(v_{it}, v_{is}) = \frac{\operatorname{cov}(v_{it}, v_{is})}{\sqrt{\operatorname{var}(v_{it})\,\operatorname{var}(v_{is})}} = \frac{\sigma^2_u}{\sigma^2_u + \sigma^2_e} \tag{15.27}$$
The errors are correlated over time for a given individual, but are otherwise
uncorrelated.
This correlation does not dampen over time, as it does in the AR(1) model.
$$y_{it} = \beta_1 + \beta_2 x_{2it} + \beta_3 x_{3it} + e_{it}$$
$$\hat e_{it} = y_{it} - b_1 - b_2 x_{2it} - b_3 x_{3it}$$
$$LM = \frac{NT}{2(T - 1)} \left[ \frac{\sum_{i=1}^{N} \left( \sum_{t=1}^{T} \hat e_{it} \right)^2}{\sum_{i=1}^{N} \sum_{t=1}^{T} \hat e_{it}^2} - 1 \right]^2 \tag{15.28}$$
This is xttest0 in Stata. If H0 (no random effects, $\sigma^2_u = 0$) is not rejected, you can use pooled OLS.
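A minimal sketch of how this test is run in Stata, assuming simulated data rather than any data set used on these slides. The variable names (id, t, x, y, u) and parameter values are illustrative.

```stata
* Simulated balanced panel: N = 50 individuals, T = 5 periods (illustrative values)
clear
set seed 2024
set obs 50
generate id = _n
* Individual random effect, uncorrelated with x by construction
generate double u = rnormal(0, 1)
expand 5
bysort id: generate t = _n
generate double x = rnormal()
generate double y = 1 + 0.5*x + u + rnormal(0, 0.5)

* xttest0 is run after a random effects fit
xtset id t
xtreg y x, re
xttest0
```

Because the simulated individual effect has a large variance relative to the idiosyncratic error, the test should reject the null of no random effects here.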
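$$y^*_{it} = \bar\beta_1 x^*_{1it} + \beta_2 x^*_{2it} + \beta_3 x^*_{3it} + v^*_{it} \tag{15.29}$$
$$y^*_{it} = y_{it} - \alpha \bar y_i, \quad x^*_{1it} = 1 - \alpha, \quad x^*_{2it} = x_{2it} - \alpha \bar x_{2i}, \quad x^*_{3it} = x_{3it} - \alpha \bar x_{3i} \tag{15.30}$$
$$\alpha = 1 - \frac{\sigma_e}{\sqrt{T \sigma^2_u + \sigma^2_e}} \tag{15.31}$$
where $\alpha$ is the transformation parameter.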
$$\hat\alpha = 1 - \frac{\hat\sigma_e}{\sqrt{T \hat\sigma^2_u + \hat\sigma^2_e}} = 1 - \frac{.1951}{\sqrt{5(.1083) + .0381}} = .7437$$
is the estimated transformation parameter.
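The estimated transformation parameter, and the two limiting cases that link RE to pooled OLS and to FE, can be sketched with Stata scalars. The inputs are the estimates reported on the slide (T = 5); the scalar names and the choice of T = 1000 for the large-T case are illustrative assumptions.

```stata
* Estimates reported on the slide
scalar T    = 5
scalar s2_u = .1083
scalar s_e  = .1951
scalar s2_e = .0381
scalar alpha = 1 - s_e/sqrt(T*s2_u + s2_e)
* Implied within-individual error correlation, as in (15.27)
scalar rho   = s2_u/(s2_u + s2_e)
display "alpha-hat = " alpha "   rho-hat = " rho

* No individual heterogeneity (sigma_u^2 = 0): alpha = 0, i.e. pooled OLS
scalar alpha0 = 1 - s_e/sqrt(T*0 + s_e^2)
display "alpha with sigma_u^2 = 0: " alpha0

* Large T: alpha approaches 1, i.e. the within (FE) transformation
scalar alphaT = 1 - s_e/sqrt(1000*s2_u + s2_e)
display "alpha with T = 1000: " alphaT
```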

Pooled OLS vs different intercepts: test (use a Chow-type
F-test after FE, or run RE and test whether the variance of
the individual component of the error is zero (xttest0))

You cannot pool onto OLS? Then…

FE vs RE: test (Hausman type)

Different slopes too perhaps? => use SURE or RCM
and test for equality of slopes across units




Note that there is within variation versus
between variation
The OLS is an unweighted average of the
between estimator and the within estimator
The RE is a weighted average of the between
estimator and the within estimator
The FE is also a weighted average of the
between estimator and the within estimator
with zero as the weight for the between part



So now you see where the extra efficiency of
RE comes from!...


The RE uses information from both the cross-sectional variation in the panel and the time
series variation, so it mixes long-run (LR) and short-run (SR) effects
The FE uses only information from the time
series variation, so it estimates SR effects

With a panel, we can learn about dynamic
effects from a short panel, while we need a
long time series on a single cross-sectional
unit, to learn about dynamics from a time
series data set
If the random error $v_{it} = u_i + e_{it}$ is correlated with any of the right-hand side
explanatory variables in a random effects model, then the least squares and
GLS estimators of the parameters are biased and inconsistent.
This bias creeps in through the between variation, of course, so the FE model
will avoid it.
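$$y_{it} = \bar\beta_1 + \beta_2 x_{2it} + \beta_3 x_{3it} + (u_i + e_{it}) \tag{15.32}$$
$$\bar y_i = \frac{1}{T} \sum_{t=1}^{T} y_{it} = \bar\beta_1 + \beta_2 \frac{1}{T} \sum_{t=1}^{T} x_{2it} + \beta_3 \frac{1}{T} \sum_{t=1}^{T} x_{3it} + \frac{1}{T} \sum_{t=1}^{T} u_i + \frac{1}{T} \sum_{t=1}^{T} e_{it} = \bar\beta_1 + \beta_2 \bar x_{2i} + \beta_3 \bar x_{3i} + u_i + \bar e_i \tag{15.33}$$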
$$y_{it} = \bar\beta_1 + \beta_2 x_{2it} + \beta_3 x_{3it} + u_i + e_{it}$$
$$-\,(\bar y_i = \bar\beta_1 + \beta_2 \bar x_{2i} + \beta_3 \bar x_{3i} + u_i + \bar e_i)$$
$$y_{it} - \bar y_i = \beta_2 (x_{2it} - \bar x_{2i}) + \beta_3 (x_{3it} - \bar x_{3i}) + (e_{it} - \bar e_i) \tag{15.34}$$
$$t = \frac{b_{FE,k} - b_{RE,k}}{\left[ \operatorname{var}(b_{FE,k}) - \operatorname{var}(b_{RE,k}) \right]^{1/2}} = \frac{b_{FE,k} - b_{RE,k}}{\left[ se(b_{FE,k})^2 - se(b_{RE,k})^2 \right]^{1/2}} \tag{15.35}$$
We expect to find $\operatorname{var}(b_{FE,k}) - \operatorname{var}(b_{RE,k}) > 0$.
$$\operatorname{var}(b_{FE,k} - b_{RE,k}) = \operatorname{var}(b_{FE,k}) + \operatorname{var}(b_{RE,k}) - 2\operatorname{cov}(b_{FE,k}, b_{RE,k}) = \operatorname{var}(b_{FE,k}) - \operatorname{var}(b_{RE,k})$$
because Hausman proved that $\operatorname{cov}(b_{FE,k}, b_{RE,k}) = \operatorname{var}(b_{RE,k})$.
The test statistic for the coefficient of SOUTH is:
$$t = \frac{b_{FE,k} - b_{RE,k}}{\left[ se(b_{FE,k})^2 - se(b_{RE,k})^2 \right]^{1/2}} = \frac{-.0163 - (-.0818)}{\left[ (.0361)^2 - (.0224)^2 \right]^{1/2}} = 2.3137$$
Using the standard 5% large sample critical value of 1.96, we reject
the hypothesis that the estimators yield identical results. Our
conclusion is that the random effects estimator is inconsistent, and we
should use the fixed effects estimator, or we should attempt to
improve the model specification.
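The t-statistic for SOUTH can be recomputed with Stata scalars. This is a sketch under one assumption: the minus signs on the two coefficients did not survive on the slide, and the values −.0163 (FE) and −.0818 (RE) are read here as the ones consistent with the reported statistic of 2.3137. The scalar names are illustrative.

```stata
* Hausman-type t-test for a single coefficient (SOUTH)
* Assumed signs: both coefficients negative, consistent with t = 2.3137
scalar b_fe  = -.0163
scalar b_re  = -.0818
scalar se_fe = .0361
scalar se_re = .0224
scalar t = (b_fe - b_re)/sqrt(se_fe^2 - se_re^2)
* Two-sided p-value from the standard normal distribution
scalar p = 2*(1 - normal(abs(t)))
display "t = " t "   two-sided p-value = " p
```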
The Hausman test assumes that the RE estimator used in the comparison is fully
efficient, which requires that the unobserved effect and the idiosyncratic error
are both i.i.d. (Cameron & Trivedi MMA page 719)

This is often not the case, and then the hausman command yields an incorrect statistic.
Example: if the error terms are clustered (e.g. due to autocorrelation across time for an
individual), then the RE estimator is not efficient.
Solutions:
- do a panel bootstrap of the Hausman test
- or use the Wooldridge (2002) robust version of the Hausman test.

Test for gamma =0 in:

To run in Stata, generate the RE differences
and the mean differences manually

See an example here: pages 267-268 of
Cameron&Trivedi’s MUS book
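One common way to write the auxiliary regression behind this robust test is sketched below, following the general form used in Cameron & Trivedi's MUS. The notation is an assumption, since the equation itself did not survive on the slide: $\hat\theta$ is the estimated RE transformation parameter (called $\alpha$ in (15.31)), and $x_{1it}$ collects the time-varying regressors only.

```latex
y_{it} - \hat\theta\,\bar y_i
  = (1 - \hat\theta)\,\mu
  + (x_{it} - \hat\theta\,\bar x_i)'\beta
  + (x_{1it} - \bar x_{1i})'\gamma
  + v_{it}
```

The first regressor block is the "RE differences" and the second is the "mean differences"; the hypothesis $\gamma = 0$ is then tested with cluster-robust standard errors.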
If the random error $v_{it} = u_i + e_{it}$ is correlated with any of the right-hand side explanatory variables in a random effects model, then the
least squares and GLS estimators of the parameters are biased and
inconsistent.
Then we would have to use the FE model.
But with FE we lose the static (time-invariant) variables.
Solutions? Instrumental variables models such as Hausman-Taylor (HT), Amemiya-MaCurdy (AM) and Breusch-Mizon-Schmidt (BMS) could help.
Further issues
We can generalise the random effects idea and allow for different
slopes too: Random Coefficients Model
Now it is the slope parameters that differ across units but, as in the RE
model, they are drawn from a common distribution.
The RCM is, in a way, to the RE model what the SURE model is to the
FE model.
Further issues
Unit root tests and Cointegration in panels
Dynamics in panels
Further issues

Of course it is not necessary that one of the dimensions of the panel is time
as such

Example: i are students and t is for each quiz they take
Of course we could have a one-way effect model on the time dimension
instead

Or a two-way model

Or a three way model! But things get a bit more complicated there…
Further issues

Another way to have more fun with panel data is to consider
dependent variables that are not continuous

Logit, probit, count data can be considered

STATA has commands for these

Based on maximum likelihood and other estimation techniques we
have not yet considered
Further issues
Another extension is to consider mixed linear models (Cameron&Trivedi
MUS page 305)
Stata’s xtmixed fits linear mixed models. From Stata’s help:
Mixed models contain both fixed effects and random effects.
The fixed effects are analogous to standard regression coefficients and are
estimated directly.
The random effects are not directly estimated but are
summarized according to their estimated variances and
covariances.
- Although random effects are not directly estimated, you can
form best linear unbiased predictions (BLUPs) of them (and
standard errors) by using predict after xtmixed.
- Random effects may take the form of either random intercepts
or random coefficients, and the grouping structure of the data
may consist of multiple levels of nested groups.
- Mixed models are also known as multilevel models and
hierarchical linear models.
- Quite rare in the econometric literature.

Some particular specifications of the mixed linear
models result in more standard models

OLS, RE are special cases of mixed linear models

Another one is known as the Random Coefficients
Model

RCM also allows groupwise heteroskedasticity
rather than imposing homoskedasticity like its
mixed linear model equivalent
Example in Cameron & Trivedi MUS page 310
Data (available through Cameron & Trivedi’s MUS
textbook ancillary files):
- mus08psidextract.dta
- (PSID wage data 1976-82 from Baltagi and Khanti-Akom (1990))

I extracted the first 994 observations for you:
mus08psidextract994
. xtrc lwage exp wks, i(id)

Random-coefficients regression          Number of obs      =       994
Group variable: id                      Number of groups   =       142
                                        Obs per group: min =         7
                                                       avg =       7.0
                                                       max =         7
                                        Wald chi2(2)       =    553.05
                                        Prob > chi2        =    0.0000

       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         exp |   .0973225   .0041396    23.51   0.000     .0892091    .1054359
         wks |   .0032176   .0050025     0.64   0.520    -.0065872    .0130223
       _cons |   4.525957   .2825222    16.02   0.000     3.972224     5.07969

Test of parameter constancy:    chi2(423) = 68970.73    Prob > chi2 = 0.0000

Do we have a name for this test?
Further issues

You can understand the use of the FE model as a solution to omitted
variable bias

If the unmeasured variables left in the error term are not correlated
with the regressors in the model, we would not have a bias in OLS, so we
can safely use RE

If the unmeasured variables left in the error term are correlated with
the regressors in the model, we would have a bias in OLS, so we cannot
use RE; we should not leave them out and we should use FE, which
bundles them together in each cross-sectional dummy
Further issues

Another criterion to choose between FE and RE

If the panel includes all the relevant cross-sectional units, use FE, if
only a random sample from a population, RE is more appropriate (as
long as it is valid)
Readings
Wooldridge’s book on panel data
Baltagi’s book on panel data
Greene’s coverage is also good














Balanced panel
Breusch-Pagan test
Cluster corrected standard errors
Contemporaneous correlation
Endogeneity
Error components model
Fixed effects estimator
Fixed effects model
Hausman test
Heterogeneity
Least squares dummy variable model
LM test
Panel corrected standard errors
Pooled panel data regression
Pooled regression
Random effects estimator
Random effects model
Seemingly unrelated regressions
Unbalanced panel
$$y_{it} = \bar\beta_1 + \beta_2 x_{2it} + \beta_3 x_{3it} + (u_i + e_{it}) \tag{15A.1}$$
$$y_{it} - \bar y_i = \beta_2 (x_{2it} - \bar x_{2i}) + \beta_3 (x_{3it} - \bar x_{3i}) + (e_{it} - \bar e_i) \tag{15A.2}$$
$$\hat\sigma^2_e = \frac{SSE_{DV}}{NT - N - K_{slopes}} \tag{15A.3}$$
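$$\bar y_i = \bar\beta_1 + \beta_2 \bar x_{2i} + \beta_3 \bar x_{3i} + u_i + \bar e_i, \quad i = 1, \ldots, N \tag{15A.4}$$
$$\operatorname{var}(u_i + \bar e_i) = \operatorname{var}(u_i) + \operatorname{var}(\bar e_i) = \operatorname{var}(u_i) + \operatorname{var}\left( \sum_{t=1}^{T} e_{it} / T \right) = \sigma^2_u + \frac{1}{T^2} \operatorname{var}\left( \sum_{t=1}^{T} e_{it} \right) = \sigma^2_u + \frac{T \sigma^2_e}{T^2} = \sigma^2_u + \frac{\sigma^2_e}{T} \tag{15A.5}$$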
$$\widehat{\sigma^2_u + \frac{\sigma^2_e}{T}} = \frac{SSE_{BE}}{N - K_{BE}} \tag{15A.6}$$
$$\hat\sigma^2_u = \widehat{\sigma^2_u + \frac{\sigma^2_e}{T}} - \frac{\hat\sigma^2_e}{T} = \frac{SSE_{BE}}{N - K_{BE}} - \frac{SSE_{DV}}{T (NT - N - K_{slopes})} \tag{15A.7}$$