PANEL DATA
(Ch. 10)
The recommended exercise questions from the textbook:
• Chapter 10: All except (10.6), (10.10).
[1]
What are panel data?
• Panel data consist of observations on the same n entities at
two or more time periods T. If the data set contains observations
on the variables X and Y, then the data are denoted
$(X_{it}, Y_{it}), \quad i = 1, \ldots, n \ \text{and} \ t = 1, \ldots, T,$
where the first subscript, i, refers to the entity being observed, and
the second subscript, t, refers to the date at which it is observed.
• Balanced panel vs. unbalanced panel:
• Balanced panel: All variables are observed for each entity and
each time period.
• Unbalanced panel: Some data are missing for at least one entity in
at least one time period.
• We consider the analysis of balanced panels, but the extension to
unbalanced panels is straightforward.
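The balanced/unbalanced distinction above can be checked mechanically. The following is a minimal sketch, assuming the data are stored in "long" form as one (entity, period) pair per observation; the function name `is_balanced` and the small example panel are illustrative assumptions, not part of the course data.

```python
from collections import defaultdict

def is_balanced(rows):
    """Return True if every entity is observed in every time period.

    rows: iterable of (entity, period) pairs, one per observation.
    """
    periods_by_entity = defaultdict(set)
    all_periods = set()
    for entity, period in rows:
        periods_by_entity[entity].add(period)
        all_periods.add(period)
    return all(p == all_periods for p in periods_by_entity.values())

# Illustrative panel: 3 entities observed in 4 years (n = 3, T = 4).
balanced = [(c, y) for c in ("US", "Japan", "Korea") for y in range(2001, 2005)]
# Dropping a single observation makes the panel unbalanced.
unbalanced = [obs for obs in balanced if obs != ("Korea", 2003)]

print(is_balanced(balanced))
print(is_balanced(unbalanced))
```

A balanced panel has exactly n × T observations, which is why the first list has 12 rows.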
[2]
Revisiting Omitted Variables Biases
• Issue:
• Do alcohol taxes help decrease traffic deaths?
• Data: fatality.wf1
• 48 U.S. states (excluding Alaska and Hawaii): N = 48.
• 1982-1988: T = 7.
• FatalityRate = number of traffic accident deaths per 10,000 people.
• BeerTax = tax per case of beer ($).
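• Estimation results for the 1982 data (standard errors in parentheses):
$\widehat{FatalityRate} = \underset{(0.15)}{2.01} + \underset{(0.13)}{0.15}\, BeerTax$
• Estimation results for the 1988 data (standard errors in parentheses):
$\widehat{FatalityRate} = \underset{(0.11)}{1.86} + \underset{(0.13)}{0.44}\, BeerTax$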
• What is going on here?
• Consider a simple multiple regression model (for a given time t):
Yit = β0 + β1Xit + β2Zi + uit, i = 1, ... , N,
where Zi is a time-invariant regressor.
• What do β1 and β2 measure?
β1 measures the partial effect of Xit on Yit with Zi held constant.
Similarly, β2 measures the partial effect of Zi on Yit with Xit held
constant.
• What if you estimate Yit = α0 + α1Xit + errorit instead?
• The OLS estimator of α1 converges in probability to
$\hat{\alpha}_1 \xrightarrow{p} \beta_1 + \beta_2 \frac{\mathrm{cov}(X_{it}, Z_i)}{\mathrm{var}(X_{it})}$
• Each state could have a different level of preference for alcohol
(say, Zi = Pal).
• Pal (Z) and BeerTax (X) could be positively related: cov(Xit, Zi) > 0.
• Pal (Z) would have a positive partial effect on FatalityRate (β2 > 0).
• Thus, α̂1 could be positive even if the true β1 is negative.
• How could we control for Pal using panel data?
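The sign reversal described above can be reproduced with a few numbers. This is a minimal sketch, assuming made-up values for X, Z, and the coefficients (none of them come from fatality.wf1) and no error term, so that the omitted-variable formula holds exactly in the sample.

```python
def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (dividing by n; the divisor cancels in the ratios below)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

# Hypothetical values chosen for illustration only.
beta0, beta1, beta2 = 2.0, -1.0, 2.0   # the true beta1 is negative
x = [0.2, 0.5, 0.9, 1.4, 2.0]          # included regressor (think BeerTax)
z = [0.3, 0.4, 0.9, 1.1, 1.8]          # omitted variable with cov(x, z) > 0
y = [beta0 + beta1 * xi + beta2 * zi for xi, zi in zip(x, z)]  # no error term

alpha1_hat = cov(x, y) / cov(x, x)                 # slope from regressing y on x only
predicted = beta1 + beta2 * cov(x, z) / cov(x, x)  # omitted-variable formula

print(round(alpha1_hat, 4), round(predicted, 4))
```

The two printed numbers agree, and both are positive even though beta1 = -1: the positive term β2·cov(X, Z)/var(X) outweighs the negative true effect.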
[3]
Panel Data with Two Time Periods
• Two equations for 1982 and 1988:
FatalityRatei,1988 = β0 + β1BeerTaxi,1988 + β2Zi + ui,1988.
FatalityRatei,1982 = β0 + β1BeerTaxi,1982 + β2Zi + ui,1982.
• Subtracting the 1982 equation from the 1988 equation:
FatalityRatei,1988 − FatalityRatei,1982
= β1(BeerTaxi,1988 − BeerTaxi,1982) + (ui,1988 − ui,1982). (1)
• No Zi in (1)! OLS on (1) will yield a consistent estimator of β1.
• Actual estimation results for (1) (standard errors in parentheses):
$\widehat{FatalityRate_{1988} - FatalityRate_{1982}} = \underset{(0.065)}{-0.072} - \underset{(0.36)}{1.04}\,(BeerTax_{1988} - BeerTax_{1982})$
• Comments on the before-and-after estimation results.
• As the real beer tax increases by $1 per case, the traffic fatality
rate falls by 1.04 deaths per 10,000 people.
→ This is a big effect, because the mean traffic fatality rate is
approximately 2 deaths per 10,000 people.
• This before-and-after approach works well if T = 2. What should
we do if T > 2?
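The before-and-after idea for T = 2 can be sketched in a few lines. The example below assumes hypothetical data for four entities (not the actual state data) with no error term, so the regression on changes recovers β1 exactly; the helper `ols` is an illustrative name.

```python
def ols(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical values chosen for illustration only.
beta0, beta1, beta2 = 1.0, -1.0, 2.0
z = [0.5, 1.0, 1.5, 2.0]          # unobserved and constant over time
x_1982 = [0.3, 0.6, 1.0, 1.5]     # x is higher where z is higher
x_1988 = [0.5, 0.7, 1.4, 1.6]
y_1982 = [beta0 + beta1 * x + beta2 * zi for x, zi in zip(x_1982, z)]
y_1988 = [beta0 + beta1 * x + beta2 * zi for x, zi in zip(x_1988, z)]

# Cross-section regression for one year, with z omitted.
_, slope_levels = ols(x_1988, y_1988)

# Regression of the 1982-to-1988 change in y on the change in x.
dx = [b - a for a, b in zip(x_1982, x_1988)]
dy = [b - a for a, b in zip(y_1982, y_1988)]
_, slope_diff = ols(dx, dy)

print(round(slope_levels, 3), round(slope_diff, 3))
```

The single-year slope is positive because z is omitted, while the slope from the differenced regression equals the true β1, since z drops out of the differences.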
[4]
Fixed Effects Regression
(A) A simple regression model:
Yit = β0 + β1Xit + β2Zi + uit, i = 1, ... , N, t = 1, ... , T. (1)
• Set αi = β0 + β2Zi. Then, we have
Yit = β1Xit + αi + uit, (2)
which is called the “fixed effects regression model.”
• For the i’th cross-sectional entity, the regression line is (2). The
slope coefficient β1 is the same for all i, but the intercept terms αi
differ across entities (and are constant over time).
• Equivalently, using dummy variables for the entities:
Yit = β0 + β1Xit + γ2D2i + γ3D3i + ... + γnDni + uit, (3)
where i = 1, ... , n, t = 1, ..., T (nT observations),
$D2_i = \begin{cases} 1 & \text{if } i \text{ is the 2nd entity;} \\ 0 & \text{otherwise,} \end{cases}$
and the other dummy variables D3, ..., Dn are defined similarly.
• In (3), α1 = β0, α2 = β0 + γ2, ... , αn = β0 + γn.
• The slope coefficient β1 and the n other parameters (β0, γ2, ..., γn)
can be estimated by OLS on model (3).
• “Entity-demeaned” OLS algorithm:
Yit = β1Xit + αi + uit
$\bar{Y}_i = \beta_1 \bar{X}_i + \alpha_i + \bar{u}_i, \quad \text{where } \bar{Y}_i = \frac{1}{T}\sum_{t=1}^{T} Y_{it}.$
Subtracting the second equation from the first:
$(Y_{it} - \bar{Y}_i) = \beta_1 (X_{it} - \bar{X}_i) + (u_{it} - \bar{u}_i). \quad (4)$
• OLS estimator of β1 from (4) = OLS estimator of β1 from (3).
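The entity-demeaning step can be written out directly. Below is a minimal sketch for one regressor, assuming a small made-up panel with no error term; the names `within_slope` and `pooled_slope` and the values of αi are illustrative assumptions.

```python
def within_slope(panel):
    """Entity-demeaned (fixed effects) slope for a single regressor.

    panel: dict mapping entity -> (list of x values, list of y values).
    """
    num = den = 0.0
    for xs, ys in panel.values():
        x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
        for x, y in zip(xs, ys):
            num += (x - x_bar) * (y - y_bar)   # demeaned x times demeaned y
            den += (x - x_bar) ** 2            # demeaned x squared
    return num / den

def pooled_slope(panel):
    """OLS slope that ignores the entity intercepts (for comparison)."""
    x = [v for xs, _ in panel.values() for v in xs]
    y = [v for _, ys in panel.values() for v in ys]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

# Hypothetical panel: alpha_i is larger for entities with larger x.
beta1 = -0.5
alphas = {"A": 1.0, "B": 3.0, "C": 5.0}
xs = {"A": [1.0, 2.0, 3.0], "B": [3.0, 4.0, 6.0], "C": [6.0, 7.0, 9.0]}
panel = {i: (xs[i], [beta1 * x + alphas[i] for x in xs[i]]) for i in xs}

beta1_fe = within_slope(panel)
beta1_pooled = pooled_slope(panel)
print(round(beta1_fe, 4), round(beta1_pooled, 4))
```

Demeaning removes αi, so the within slope equals the true β1, whereas the pooled slope is positive because αi is correlated with Xit in this example.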
• Least Squares Assumptions for the fixed effects model:
(FEA.1) $E(u_{it} \mid X_{i1}, X_{i2}, \ldots, X_{iT}, \alpha_i) = 0$.
(FEA.2) The data, $(X_{i1}, \ldots, X_{iT}, Y_{i1}, \ldots, Y_{iT})$, i = 1, ..., n, are a
random sample.
(FEA.3) $(X_{it}, u_{it})$ have nonzero finite fourth moments: large
outliers are unlikely.
(FEA.4) There is no perfect multicollinearity.
(FEA.5) No autocorrelation: $\mathrm{cov}(u_{it}, u_{is} \mid X_{i1}, \ldots, X_{iT}, \alpha_i) = 0$ for all
t ≠ s.
For multiple regression, Xit should be replaced by the full list X1,it,
…, Xk,it.
• What happens if (FEA.5) is violated?
(B) Extension to multiple X’s.
• The fixed effects regression model is
Yit = β1X1,it + ... + βkXk,it + αi + uit, (5)
where i = 1, ... , n, and t = 1, ... , T.
• Equivalently, the fixed effects model can be written as
Yit = β0 + β1X1,it + ... + βkXk,it + γ2D2i + ... + γnDni + uit. (6)
• “Entity-demeaned” algorithm:
$(Y_{it} - \bar{Y}_i) = \beta_1 (X_{1,it} - \bar{X}_{1,i}) + \cdots + \beta_k (X_{k,it} - \bar{X}_{k,i}) + (u_{it} - \bar{u}_i). \quad (7)$
• OLS estimators of β1, ... , βk from (7) = OLS estimators of β1, ... ,
βk from (6).
(C) Application to Traffic Deaths.
• Fixed effects regression results (standard error in parentheses):
$\widehat{FatalityRate} = \underset{(0.20)}{-0.66}\, BeerTax + StateFixedEffects.$
[5]
Time and Entity Fixed Effects Model
(1) Motivation.
• Return to our FatalityRate example:
Yit = β0 + β1Xit + β2Zi + β3St + uit,
where Yit = FatalityRate; Xit = BeerTax;
Zi = time-invariant preferences for alcohol or driving of the
people in State i;
St = time-specific effects (common to all states), such as
overall automobile safety improvements.
• Let
$B1_t = \begin{cases} 1 & \text{if } t \text{ is the first time period;} \\ 0 & \text{otherwise.} \end{cases}$
Define dummy variables B2t, ... , BTt similarly.
(2) Time and Entity Fixed Effects Model:
Yit = β0 + β1X1,it + ... + βkXk,it + γ2D2i + ... + γnDni
+ δ2B2t + ... + δTBTt + uit.
• There are many regressors, but we can still get reasonably accurate
estimates of β1, ... , βk. The estimates of γ2, ... , γn and δ2, ... , δT,
however, are inaccurate.
(3) Application to traffic deaths (standard error in parentheses):
$\widehat{FatalityRate} = \underset{(0.25)}{-0.64}\, BeerTax + StateFixedEffects + TimeFixedEffects.$
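For a balanced panel, including both entity and time dummies is equivalent to subtracting entity means and time means and adding back the grand mean before running OLS. The sketch below shows this transformation for one regressor, assuming a small made-up balanced panel with no error term; the function name and all numbers are illustrative assumptions, and the shortcut does not carry over unchanged to unbalanced panels.

```python
def two_way_within_slope(x, y):
    """Slope after removing entity and time means (balanced panel, one regressor).

    x, y: lists of n rows (entities), each a list of T values (periods).
    """
    n, T = len(x), len(x[0])

    def transform(m):
        row = [sum(r) / T for r in m]                                # entity means
        col = [sum(m[i][t] for i in range(n)) / n for t in range(T)] # period means
        grand = sum(row) / n                                         # overall mean
        return [[m[i][t] - row[i] - col[t] + grand for t in range(T)]
                for i in range(n)]

    xt, yt = transform(x), transform(y)
    num = sum(xt[i][t] * yt[i][t] for i in range(n) for t in range(T))
    den = sum(xt[i][t] ** 2 for i in range(n) for t in range(T))
    return num / den

# Hypothetical values: entity effects alpha_i and time effects lam_t.
beta1 = -0.6
alpha = [1.0, -2.0, 3.0]
lam = [0.0, 0.5, -1.0]
x = [[1.0, 2.0, 4.0], [2.0, 2.0, 3.0], [0.0, 5.0, 1.0]]
y = [[beta1 * x[i][t] + alpha[i] + lam[t] for t in range(3)] for i in range(3)]

beta1_twoway = two_way_within_slope(x, y)
print(round(beta1_twoway, 4))
```

Because αi and the time effect are both wiped out by the transformation, the slope equals the true β1 in this noiseless example.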
[6]
Drunk Driving Laws and Traffic Death
• Would driving laws and economic conditions matter?
• Drinking-age and drunk-driving laws do not matter very much.
• Economic factors are important.
• Specification (4) in Table 10.1 is the base model.
• Average tax = $0.5/case,
and average fatality rate = 2 per 10,000 people.
• As the tax increases by $0.5, the fatality rate drops by
0.45×0.5 = 0.225 (per 10,000).
→ But this result is somewhat imprecise: the 95% confidence interval
for the effect of BeerTax is
−0.45 ± 1.96 × 0.22 → (−0.88, −0.02),
which is quite wide.
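The arithmetic above can be checked with a short calculation. This sketch uses only the numbers quoted in the bullets (coefficient −0.45, standard error 0.22, tax change $0.5); the helper name `conf_int` is an illustrative assumption.

```python
def conf_int(estimate, se, z=1.96):
    """Two-sided large-sample confidence interval: estimate ± z × se."""
    return estimate - z * se, estimate + z * se

coef, se = -0.45, 0.22            # BeerTax coefficient and its standard error
effect = coef * 0.5               # effect of a $0.5 tax increase on the fatality rate
lower, upper = conf_int(coef, se)

print(round(effect, 3))
print(round(lower, 2), round(upper, 2))
```

The interval excludes zero, but only barely, which is the sense in which the estimate is imprecise.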
[7]
Eviews Exercise
(1) Exercise with an artificial panel data set named “artificial_panel.xls.”
There are four variables in the Excel file: “country”, “year”, “y”, and “x”. Each
variable has 12 observations, from the 3rd row to the 14th row. The data are
artificial numbers for three countries: US, Japan, and Korea. Notice that the
variable “country” is alphabetic, not numeric.
STEP 1:
Open artificial_panel.xls using Excel. Then, using your mouse, select
the block of data and copy it.
STEP 2:
Open Eviews. Then, type the following in the Eviews command window (the
narrow white window below the File, Edit, Object buttons):
create u 12 (enter)
Then, a workfile window will pop up.
Type the following in the Eviews command window:
alpha country (enter)
data year y x (enter)
The command “alpha” is used to create alphabetic variables, while
“data” is for numeric variables.
Then, a spreadsheet will pop up.
Close the window by clicking on the X in its upper-right corner.
Eviews will ask you whether you want to delete the Untitled
Group. Click on the Yes button.
STEP 3:
On the workfile, click on the Show button. Then, a Show window
will pop up. Type in the window:
country year y x
Click on OK. Then, a spreadsheet will pop up.
Click on the Edit+/- button and place your cursor on the first cell of the
country column. Then click the right mouse button and paste the copied data.
Then, you will see that the data from the Excel file are pasted into the
spreadsheet.
Close the spreadsheet by clicking on the X in its upper-right corner.
Eviews will ask you whether you want to delete the Untitled Group.
Click on the Yes button.
STEP 4:
On the workfile, click the Save button. Choose the drive and
folder where you want to save the file. Choose the file name
“artificial_panel.wf1”.
Click on the Save button. Then, a “Workfile Save” window will pop
up. Just click on the OK button.
Then, you will be back at the workfile.
STEP 5:
On the workfile, click the Proc button. Choose Structure/Resize
Current Page…
Then you will have the Workfile Structure window. Choose Dated
Panel.
Type 2001 for Start date, 2004 for End date, country for Cross-section
ID series, and year for Date series. Then, click on OK.
Then, you will be back at the workfile. Save it!
STEP 6:
Choose Objects/New Object... Choose Equation and enter
art_pan as the name of the object. Then, an Equation Estimation
window will pop up. Type “y x” in the Equation specification box.
Then click on Panel Options.
Choose “Fixed” for Cross-section, “Fixed” for Period, and “White
(diagonal)” for Coef covariance method.
By choosing “Fixed” for Cross-section, you are running the regression with
dummy variables for the individual entities. By choosing “Fixed” for
Period, you are adding time dummy variables to the regression.
STEP 7:
Choose View/Fixed/Random Effects/Cross-section Effects to see the
estimated cross-section effects.
Choose View/Fixed/Random Effects/Period Effects to see the estimated
period effects.
Choose View/Fixed/Random Effects Testing/Redundant Fixed Effects to test
whether the fixed effects are jointly significant.
I found that the F and χ2 statistics for the individual dummy variables and the
time dummy variables are computed assuming that the error terms in the
regression model are homoskedastic over i and t. So, the results are not
reliable if the error terms are in fact heteroskedastic.
To test whether the time effects are statistically significant, I suggest
estimating your model choosing None for Period but including the time
dummy variables as regressors.
(2) Exercise with fatality.wf1.
------------------------------------------------------------------------
variable name    variable label
------------------------------------------------------------------------
state            State ID (FIPS) Code
year             Year
spircons         Spirits Consumption
unrate           Unemployment Rate
perinc           Per Capita Personal Income
emppop           Employment/Population Ratio
beertax          Tax on Case of Beer
sobapt           % Southern Baptist
mormon           % Mormon
mlda             Minimum Legal Drinking Age
dry              % Residing in Dry Counties
yngdrv           % of Drivers Aged 15-24
vmiles           Ave. Mile per Driver
vmilespd         Ave. Mile per 1,000 Driver
breath           Prelim. Breath Test Law
jaild            Mandatory Jail Sentence
comserd          Mandatory Community Service
jailcom          jaild + comserd
allmort          # of Vehicle Fatalities (#VF)
mrall            Vehicle Fatality Rate (VFR) = #VF/Population
vfrall           10,000*mrall = VFR per 10,000 people
allnite          # of Night-time VF (#NVF)
mralln           Night-time VFR (NVFR)
allsvn           # of Single VF (#SVF)
a1517            #NVF, 15-17 year olds
mra1517n         NVFR, 15-17 year olds
a1829            #VF, 18-20 year olds
a1820n           #NVF, 18-20 year olds
mra1820          VFR, 18-20 year olds
mra1820n         NVFR, 18-20 year olds
a2124            #VF, 21-24 year olds
mra2124          VFR, 21-24 year olds
a2124n           #NVF, 21-24 year olds
mra2124n         NVFR, 21-24 year olds
aidall           # of alcohol-involved VF
da18             Dummy variable for drinking age = 18
da19             Dummy variable for drinking age = 19
da20             Dummy variable for drinking age = 20
lincperc         Log of per capita real income
mraidall         Alcohol-Involved VFR
pop              Population
pop1517          Population, 15-17 year olds
pop1820          Population, 18-20 year olds
pop2124          Population, 21-24 year olds
miles            total vehicle miles (millions)
unus             U.S. unemployment rate
epopus           U.S. Emp/Pop Ratio
gspch            GSP Rate of Change
Dum1982, Dum1983, Dum1984, ..., DUM1988
------------------------------------------------------------------------
• Estimation of specification (4) in Table 10.1 (p. 368).
Dependent Variable: VFRALL
Sample: 1982 1988
Cross-sections included: 48
Total panel (balanced) observations: 336
White diagonal standard errors & covariance (d.f. corrected)

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C             -2.327171     1.316419     -1.767804   0.0782
BEERTAX       -0.450272     0.222005     -2.028203   0.0435
DA18           0.027509     0.065473      0.420158   0.6747
DA19          -0.019096     0.039510     -0.483315   0.6293
DA20           0.030875     0.045689      0.675767   0.4998
JAILD          0.012644     0.031940      0.395866   0.6925
COMSERD        0.034135     0.114820      0.297289   0.7665
VMILESPD       0.008226     0.008368      0.983073   0.3264
LINCPERC       1.814889     0.472220      3.843312   0.0002
UNRATE        -0.063043     0.011616     -5.427345   0.0000
DUM1982        0.533926     0.075931      7.031706   0.0000
DUM1983        0.435841     0.070418      6.189300   0.0000
DUM1984        0.246723     0.050392      4.896067   0.0000
DUM1985        0.155325     0.043688      3.555327   0.0004
DUM1986        0.189843     0.040808      4.652090   0.0000
DUM1987        0.087532     0.032452      2.697246   0.0074

Effects Specification
Cross-section fixed (dummy variables)

R-squared            0.939540    Mean dependent var   2.040444
Adjusted R-squared   0.925809    S.D. dependent var   0.570194
Log likelihood       183.8646    F-statistic          68.42532
Durbin-Watson stat   1.733929    Prob(F-statistic)    0.000000
• Testing the significance of the individual and time dummy variables:
[Estimation choosing “Fixed” for Period and not using dummy variables as
regressors.]
Redundant Fixed Effects Tests
Equation: MIN
Test cross-section and period fixed effects

Effects Test                       Statistic     d.f.       Prob.
Cross-section F                    44.772106     (47,273)   0.0000
Cross-section Chi-square           727.186063    47         0.0000
Period F                           19.685127     (6,273)    0.0000
Period Chi-square                  120.798386    6          0.0000
Cross-Section/Period F             40.398468     (53,273)   0.0000
Cross-Section/Period Chi-square    732.351587    53         0.0000
• Testing the significance of the time dummy variables:
[Estimation choosing “None” for Period and using dummy variables as
regressors.]
Wald Test:
Equation: MIN

Test Statistic   Value      df         Probability
F-statistic      11.46715   (6, 273)   0.0000
Chi-square       68.80287   6          0.0000
Comments on (FEA.5):
• What if (FEA.5) fails, so that corr(uit, uis | Xit, Xis, αi) ≠ 0?
• The OLS panel data estimators of β1 are still unbiased and consistent.
• The usual OLS standard errors will be wrong.
• Use “heteroskedasticity- and autocorrelation-consistent” standard
errors (clustered standard errors).
• The clustered SE formula is NOT the usual (heteroskedasticity-robust) SE
formula! [Appendix 10.2 (pp. 379-381)].
• The clustered SEs might not be very accurate if N is small.
• Eviews can compute these: choose “White period” instead of
“White (diagonal)”.
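To see how the clustered formula differs from the heteroskedasticity-robust one, the sketch below computes both for an entity-demeaned regression with one regressor. It is an illustration under stated assumptions, not the formula EViews uses verbatim: the data are made up, the function name `fe_with_se` is hypothetical, and no degrees-of-freedom correction is applied (the EViews output in these notes is "d.f. corrected", so its numbers would differ by a scaling factor).

```python
from math import sqrt

def fe_with_se(panel):
    """Within estimator for one regressor, with two kinds of standard errors.

    panel: dict mapping entity -> (list of x values, list of y values).
    Returns (beta, se_white, se_cluster), with no d.f. correction.
    """
    xt, yt, ids = [], [], []
    for i, (xs, ys) in panel.items():
        x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
        for x, y in zip(xs, ys):
            xt.append(x - x_bar)       # entity-demeaned x
            yt.append(y - y_bar)       # entity-demeaned y
            ids.append(i)
    sxx = sum(v * v for v in xt)
    beta = sum(a * b for a, b in zip(xt, yt)) / sxx
    # Score for each observation: demeaned x times the residual.
    scores = [a * (b - beta * a) for a, b in zip(xt, yt)]
    # Heteroskedasticity-robust: square each score, then sum.
    white = sum(s * s for s in scores)
    # Clustered: sum the scores within each entity first, then square.
    by_entity = {}
    for i, s in zip(ids, scores):
        by_entity[i] = by_entity.get(i, 0.0) + s
    cluster = sum(s * s for s in by_entity.values())
    return beta, sqrt(white) / sxx, sqrt(cluster) / sxx

# Hypothetical panel with 3 entities and 4 periods.
panel = {
    "A": ([1.0, 2.0, 3.0, 4.0], [2.1, 1.9, 1.2, 1.0]),
    "B": ([0.5, 1.0, 2.5, 3.0], [3.4, 3.3, 2.6, 2.1]),
    "C": ([2.0, 2.5, 3.0, 5.0], [1.5, 1.6, 1.0, 0.2]),
}
beta, se_white, se_cluster = fe_with_se(panel)
print(round(beta, 4), round(se_white, 4), round(se_cluster, 4))
```

The only difference between the two formulas is the order of summing and squaring: summing within an entity before squaring keeps the cross-products of errors from different periods, which is what allows for autocorrelation within an entity.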
• Estimation of specification (7) in Table 10.1 (p. 368).
Dependent Variable: VFRALL
Sample: 1982 1988
Cross-sections included: 48
Total panel (balanced) observations: 336
White period standard errors & covariance (d.f. corrected)

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C             -2.327171     1.915400     -1.214979   0.2254
BEERTAX       -0.450272     0.319805     -1.407961   0.1603
DA18           0.027509     0.075267      0.365483   0.7150
DA19          -0.019096     0.053288     -0.358351   0.7204
DA20           0.030875     0.054076      0.570957   0.5685
JAILD          0.012644     0.017699      0.714386   0.4756
COMSERD        0.034135     0.142797      0.239043   0.8113
VMILESPD       0.008226     0.007355      1.118432   0.2644
LINCPERC       1.814889     0.683535      2.655150   0.0084
UNRATE        -0.063043     0.013984     -4.508168   0.0000
DUM1982        0.533926     0.098541      5.418291   0.0000
DUM1983        0.435841     0.091540      4.761205   0.0000
DUM1984        0.246723     0.064103      3.848852   0.0001
DUM1985        0.155325     0.054832      2.832774   0.0050
DUM1986        0.189843     0.042774      4.438265   0.0000
DUM1987        0.087532     0.032445      2.697841   0.0074

Effects Specification
Cross-section fixed (dummy variables)

R-squared            0.939540    Mean dependent var   2.040444
Adjusted R-squared   0.925809    S.D. dependent var   0.570194
Durbin-Watson stat   1.733929    Prob(F-statistic)    0.000000
• Average tax = $0.5/case,
and average fatality rate = 2 per 10,000 people.
• As the tax increases by $0.5, the fatality rate drops by
0.45×0.5 = 0.225 (per 10,000).
→ With the “White period” standard errors, the 95% confidence interval
for the effect of BeerTax is
−0.45 ± 1.96 × 0.32 → (−1.08, 0.18),
which is wider than the interval (−0.88, −0.02) obtained with the
“White diagonal” standard errors, and now includes zero.