Uploaded by Awolt

Econometrics lecture 6

Lecture 6
Panel data: following the same individuals (units) over time.
Different from repeated cross-sectional data, where a new sample is drawn in each period.
Year dummies
Different intercept for students this year and students next year.
Makes intercept dependent on time.
Deflating monetary values to take out inflation effects.
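The deflating step can be illustrated with a small sketch. The wage and price-index numbers below are made up for illustration, and the base year (index = 100) is an assumption, not something taken from the lecture.

```python
# Hypothetical nominal wages and a price index with base year 2010 (= 100).
nominal_wage = {2010: 2000.0, 2011: 2100.0, 2012: 2200.0}
price_index = {2010: 100.0, 2011: 105.0, 2012: 110.0}


def deflate(nominal, index, base=100.0):
    """Express a nominal amount in base-year prices."""
    return nominal * base / index


# Real wage per year: the nominal wage with inflation taken out.
real_wage = {year: deflate(nominal_wage[year], price_index[year])
             for year in nominal_wage}

for year in sorted(real_wage):
    print(year, round(real_wage[year], 2))
```

In this made-up example the nominal wage rises every year, but only as fast as prices, so the real wage does not change.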
Panel data
Dimensions: N individuals observed over T periods
T: long/short panel (relative to the measurement frequency, e.g. a few days can be a long panel when
observations are taken every second)
N: narrow/wide panel
Microeconomic panels typically wide and short
Macroeconomic panels typically narrow and long
Unbalanced panel: number of time-series observations differs across individuals (e.g. households)
Balanced panel: same number of time-series observations for each individual (e.g. countries)
Both can be used for estimation, but check whether dropout (attrition) may involve self-selection
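A quick way to see whether a panel is balanced is to count observations per individual. The sketch below uses the definition from these notes (same number of observations for each individual). The function name and the example data are hypothetical.

```python
from collections import Counter


def describe_panel(rows):
    """rows: list of (individual_id, period) pairs, one per observation."""
    obs_per_id = Counter(unit for unit, _ in rows)
    min_t = min(obs_per_id.values())
    max_t = max(obs_per_id.values())
    return {
        "N": len(obs_per_id),        # number of individuals
        "min_T": min_t,              # fewest observations for any individual
        "max_T": max_t,              # most observations for any individual
        "balanced": min_t == max_t,  # same count for every individual
    }


# Hypothetical data: every country is observed in all three years.
countries = [("NL", 2019), ("NL", 2020), ("NL", 2021),
             ("DE", 2019), ("DE", 2020), ("DE", 2021)]
# Hypothetical data: households 2 and 3 drop out early.
households = [(1, 2019), (1, 2020), (1, 2021),
              (2, 2019), (2, 2020),
              (3, 2019)]

print(describe_panel(countries))
print(describe_panel(households))
```

If the panel turns out to be unbalanced, the next question is why individuals drop out, since the count alone says nothing about self-selection.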
Why use panel data?
Account for endogeneity bias: account for unobserved (time-constant) individual heterogeneity.
Study dynamics. E.g. how health evolves after a hospital visit, or whether a hospital visit gets people
back on their previous health trajectory.
Stata: xtset ID YEAR declares the panel, with ID as the individual (N) dimension and YEAR as the time (T) dimension
Pooled model
$y_{it} = \beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + u_{it}$
Subscript i denotes the i-th individual and t denotes the t-th time period
Indices i and t imply N * T observations in total (for a balanced panel)
Pooled model: why is it not a good idea to simply ignore the panel dimension and perform OLS?
OLS assumes a random sample => the errors of different observations are assumed to be uncorrelated
However, when the same individuals are observed over time, their values often depend strongly on their
previous values, so the errors are correlated within an individual
Cluster-robust standard errors
Allow some correlation over time
Consequences of the autocorrelation and heteroskedasticity inherent in panel data: least-squares
estimators are still consistent, but the standard errors are incorrect (typically too small)
Use panel-robust standard errors (= cluster-robust standard errors)
Stata: add the option vce(cluster id) to the regression command. This takes care of potential serial correlation at the ID level
Depending on data structure, you may also want to cluster on city or region
Clustering at a higher level automatically also allows for correlation within the lower-level clusters
nested in it. E.g. clustering at the region level also allows for correlation within individuals.
Clustering typically increases the standard errors, so inference becomes more conservative. So
there is a trade-off. Choose the cluster level at which the policy under study applies.
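To see why ignoring within-individual correlation makes standard errors too small, here is a minimal sketch for a regression with one regressor. It computes the conventional standard error of the slope and the cluster-robust (sandwich) one, which first sums the scores within each cluster. The data are hand-made so that x and the error are constant within an individual. The sketch leaves out the finite-sample adjustment that software such as Stata applies, so the numbers will not match vce(cluster id) output exactly.

```python
from collections import defaultdict
from math import sqrt


def slope_with_standard_errors(ids, x, y):
    """OLS of y on a constant and x: slope, conventional SE, cluster-robust SE."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x_dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in x_dev)
    slope = sum(d * (yi - y_bar) for d, yi in zip(x_dev, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Conventional SE: treats all n residuals as independent draws.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_conventional = sqrt(s2 / sxx)

    # Cluster-robust SE: sum (x deviation * residual) within each cluster,
    # then square the cluster totals. No finite-sample adjustment applied.
    cluster_score = defaultdict(float)
    for g, d, e in zip(ids, x_dev, resid):
        cluster_score[g] += d * e
    se_cluster = sqrt(sum(s * s for s in cluster_score.values())) / sxx
    return slope, se_conventional, se_cluster


# Hypothetical panel: 4 individuals, 3 periods each.
ids, x, y = [], [], []
for person, (x_i, u_i) in enumerate([(0.0, 1.0), (1.0, -1.0), (2.0, -1.0), (3.0, 1.0)]):
    for _ in range(3):
        ids.append(person)
        x.append(x_i)
        y.append(1.0 + 2.0 * x_i + u_i)  # error u_i repeats within a person

slope, se_conventional, se_cluster = slope_with_standard_errors(ids, x, y)
print(slope, round(se_conventional, 4), round(se_cluster, 4))
```

The slope estimate is the same either way. Only the standard error changes, and here the clustered one is clearly larger because the three observations per person carry much less independent information than twelve independent draws would.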
Having panel data allows us to address endogeneity issues
> individual fixed effects: constant over time but varying across individuals
> time fixed effects: vary over time but are the same for all individuals
Fixed-effect model
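Assume data for t = 1, 2 and a dummy t2 that equals 1 for t = 2 and 0 for t = 1
$y_{it} = \beta_0 + \delta_0 \, t2_t + \beta_1 x_{it} + a_i + u_{it}$
Here a_i is the unobserved individual effect (constant over time) and u_it is the idiosyncratic error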
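One standard way to use this two-period model is first differencing: subtracting the t = 1 equation from the t = 2 equation removes the individual effect a_i, so the slope can be estimated from changes within individuals. First differencing is not spelled out in these notes, so treat the sketch below as an illustration under that assumption. The data are hand-made without noise, and the true values (intercept 1, time effect 0.5, slope 2) are chosen for the example.

```python
def ols_slope(x, y):
    """Slope from a simple regression of y on a constant and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: a_i is larger for individuals with larger x (endogeneity).
a = [0.0, 2.0, 4.0, 6.0]    # unobserved individual effects
x1 = [1.0, 2.0, 3.0, 4.0]   # regressor in period 1
x2 = [2.0, 4.0, 4.0, 7.0]   # regressor in period 2
y1 = [1.0 + 2.0 * x + ai for x, ai in zip(x1, a)]         # period 1 (t2 = 0)
y2 = [1.0 + 0.5 + 2.0 * x + ai for x, ai in zip(x2, a)]   # period 2 (t2 = 1)

# Pooled OLS ignores a_i, which is correlated with x, so the slope is biased.
pooled_slope = ols_slope(x1 + x2, y1 + y2)

# First differences: a_i drops out of (y2 - y1); the constant picks up the time effect.
dx = [b - c for b, c in zip(x2, x1)]
dy = [b - c for b, c in zip(y2, y1)]
fd_slope = ols_slope(dx, dy)

print(round(pooled_slope, 3), round(fd_slope, 3))
```

The pooled slope overstates the effect because it also picks up the individual effects, while the first-difference slope recovers the value used to generate the data.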
Differences in differences
Estimating effect of natural experiment.
E.g. reg health hospital_visits
How to assess the causal effect of going to the hospital on health y_i for individual i?
Compare what happened when they went with what would have happened if they hadn't gone (the counterfactual).
Mimic the unobserved potential outcome by taking people who did and who didn't go to the hospital.
Measure the treatment and control group before and after the treatment (e.g. a reform).
We cannot just look at the same individual over time, since we never observe their other (counterfactual) state.
We also cannot just compare the two groups after treatment, because there may well be selection into treatment.
But we can compare the changes in health over time between these groups. E.g. the treatment group's
health increased by e and the control group's health increased by f. Then the treatment effect is e - f
Assumption (parallel trends): in the absence of going to the hospital, the treatment group would have
developed in the same way as the control group.
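The e - f calculation can be written out in a few lines. The health scores below are made up for illustration, and the function name is hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Change in the treatment group (e) minus change in the control group (f)."""
    e = mean(treat_after) - mean(treat_before)
    f = mean(control_after) - mean(control_before)
    return e, f, e - f


# Hypothetical health scores for the two groups, before and after.
treat_before = [50.0, 54.0, 52.0]
treat_after = [60.0, 62.0, 64.0]
control_before = [55.0, 57.0, 59.0]
control_after = [59.0, 61.0, 63.0]

e, f, effect = diff_in_diff(treat_before, treat_after, control_before, control_after)
print(e, f, effect)
```

Note that the control group starts at a higher level than the treatment group. That is not a problem here: only the changes are compared, which is why the method relies on parallel trends rather than on equal levels.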