Difference-in-Difference
Development Workshop
Typical problem in proving causal effects



Using differences to estimate causal effects in experimental data (treatment and control groups)
Wish: the ‘treatment’ and ‘control’ groups can be assumed to be similar in every way except receipt of treatment
This may be very difficult to achieve
A weaker assumption is…



In the absence of treatment, the difference between the ‘treatment’ and ‘control’ groups is constant over time
With this assumption, we can use observations on the treatment and control groups pre- and post-treatment to estimate the causal effect
Idea
– Difference pre-treatment is the ‘normal’ difference
– Difference post-treatment is the ‘normal’ difference + causal effect
– Difference-in-difference is the causal effect
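The idea can be written out in one line. This is only a sketch, in notation that does not appear on the slides: $\bar{y}_{g,t}$ is assumed to denote the mean outcome of group $g$ ($T$ = treatment, $C$ = control) in period $t$, $d$ the ‘normal’ difference and $\delta$ the causal effect.

```latex
\begin{align*}
\text{pre-treatment difference:}  \quad & \bar{y}_{T,\mathrm{pre}} - \bar{y}_{C,\mathrm{pre}} = d \\
\text{post-treatment difference:} \quad & \bar{y}_{T,\mathrm{post}} - \bar{y}_{C,\mathrm{post}} = d + \delta \\
\text{difference-in-difference:}  \quad & \hat{\delta} =
  \left(\bar{y}_{T,\mathrm{post}} - \bar{y}_{C,\mathrm{post}}\right)
  - \left(\bar{y}_{T,\mathrm{pre}} - \bar{y}_{C,\mathrm{pre}}\right)
\end{align*}
```

Rearranging the last line gives the equivalent ‘change in treatment group minus change in control group’ form.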
Graphically…
[Figure: outcome y plotted against time (pre- and post-treatment) for the treatment and control groups, with points A, C and B marked.]
What is the D-in-D estimate?






The standard differences estimator is AB
But the ‘normal’ difference is estimated as CB
Hence the D-in-D estimate is AC
Note: this assumes that trends in the outcome variable are the same for the treatment and control groups
This is not testable
Two periods (before and after) are crucial
The Grand Experiment (Snow)



Water supplied to households by competing private companies
Sometimes different companies supplied households in the same street
In south London two main companies:
– Lambeth Company (water supply from Thames Ditton, 22 miles upstream)
– Southwark and Vauxhall Company (water supply from Thames)
In 1853/54 cholera outbreak
Death rates per 10,000 people by water company
– Lambeth: 10
– Southwark and Vauxhall: 150
Might be water but perhaps other factors
Snow compared death rates in the 1849 epidemic
– Lambeth: 150
– Southwark and Vauxhall: 125
In 1852 the Lambeth Company had moved its supply away from Hungerford Bridge (to Thames Ditton, upstream)
What would be a good estimate of the effect of clean water?
Death rates per 10,000 people

                                            1849    1853/54    Difference (1853/54 - 1849)
Lambeth                                      150         10    -140
Southwark and Vauxhall                       125        150      25
Difference (Lambeth - Southwark & Vauxhall)   25       -140    -165
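The arithmetic behind the table can be checked in a few lines. This is a minimal sketch in Python (not Stata, so it runs without any dataset); the function name `did` and the dictionary layout are illustrative choices, not something from the slides, and Lambeth is taken as the ‘treatment’ group because its supply changed.

```python
# Death rates per 10,000 people, as reported for the two epidemics.
rates = {
    "Lambeth": {"1849": 150, "1853/54": 10},
    "Southwark and Vauxhall": {"1849": 125, "1853/54": 150},
}


def did(treated, control, pre, post):
    """Change in the treated group minus change in the control group."""
    change_treated = treated[post] - treated[pre]
    change_control = control[post] - control[pre]
    return change_treated - change_control


# Lambeth changed its water source between the epidemics; the other company did not.
snow_estimate = did(
    rates["Lambeth"], rates["Southwark and Vauxhall"], "1849", "1853/54"
)
print(snow_estimate)  # -165
```

The same number comes out if the two cross-company differences are subtracted instead (-140 - 25), provided both are taken in the same direction.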
Card and Krueger (1994)





Basic microeconomic theory of the firm: factor demand curves slope downwards.
Hence, if minimum wages are binding, we would expect employment to fall if the minimum wage is raised.
Natural experiment: New Jersey raised its minimum wage from $4.25 to $5.05 on 1 April 1992 while the minimum wage in neighbouring Pennsylvania remained unchanged.
Data: wages and employment in 65 fast-food restaurants in Pennsylvania and 284 in New Jersey in Feb/March 1992 (i.e. before the rise in the NJ minimum wage) and in Nov/Dec 1992 (i.e. after the rise).
Difference-in-difference design to investigate the impact of minimum wages on employment.
What data do we have?

698 observations
– sheet: an identifier for each restaurant (each has two observations, pre- and post-)
– nj: dummy for whether a NJ restaurant
– after: dummy for whether a post- observation
– njafter: nj*after
– fte: full-time equivalent employment
– dfte: change in full-time equivalent employment
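To make the two derived variables concrete, here is a small sketch in Python of how njafter and dfte relate to the other columns. The rows and their fte values are made up for illustration and are not taken from the Card and Krueger data; only the variable names come from the list above.

```python
# Hypothetical long-format rows (two per restaurant); the fte values are made up.
rows = [
    {"sheet": 1, "nj": 1, "after": 0, "fte": 20.0},
    {"sheet": 1, "nj": 1, "after": 1, "fte": 22.5},
    {"sheet": 2, "nj": 0, "after": 0, "fte": 15.0},
    {"sheet": 2, "nj": 0, "after": 1, "fte": 12.0},
]

# njafter is the interaction nj*after: 1 only for NJ restaurants observed post-rise.
for row in rows:
    row["njafter"] = row["nj"] * row["after"]

# Collect the pre- (after=0) and post- (after=1) fte for each restaurant.
fte_by_sheet = {}
for row in rows:
    fte_by_sheet.setdefault(row["sheet"], {})[row["after"]] = row["fte"]

# dfte is the within-restaurant change: post- fte minus pre- fte, one value per sheet.
dfte = {sheet: pair[1] - pair[0] for sheet, pair in fte_by_sheet.items()}

print([row["njafter"] for row in rows])  # [0, 1, 0, 0]
print(dfte)  # {1: 2.5, 2: -3.0}
```

With one dfte value per restaurant, 698 rows collapse to 349 changes, which matches the number of observations in the regression of dfte on nj.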
Tabulate command

Tabulate in Stata:
– tabulate var (or tab var): just a simple table
– tab var, g(newvar): generates a set of dummy variables (newvar1, newvar2, …), one per category of var
– tab var, su(othervar): summarises some other variable within each category of var
Let’s get our first DinD estimator

tabulate nj after, su(fte) means

Mean fte         Before    After    Diff
PA                20.3     18.3     -2.0
NJ                17.3     17.5     +0.2
Diff (PA - NJ)    +3.0     +0.8      ??
Going from means to statistics 

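reg dfte nj

      Source |       SS       df       MS              Number of obs =     349
-------------+------------------------------           F(  1,   347) =    3.91
       Model |  286.841779     1  286.841779           Prob > F      =  0.0489
    Residual |  25485.8728   347  73.4463192           R-squared     =  0.0111
-------------+------------------------------           Adj R-squared =  0.0083
       Total |  25772.7145   348  74.0595245           Root MSE      =  8.5701

------------------------------------------------------------------------------
        dfte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          nj |   2.328724   1.178371     1.98   0.049     .0110768    4.646372
       _cons |  -2.046154   1.062988    -1.92   0.055    -4.136864    .0445564
------------------------------------------------------------------------------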
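Why does regressing the change dfte on the nj dummy give a D-in-D estimate? With a single 0/1 regressor, the OLS intercept equals the mean of the outcome in the group coded 0 and the slope equals the difference in group means. The sketch below checks this in Python on made-up numbers; the data and the helper name `ols_one_regressor` are assumptions for illustration, not part of the workshop material.

```python
from statistics import mean

# Hypothetical changes in employment with a state dummy; the values are made up.
nj = [1, 1, 1, 0, 0]
dfte = [1.0, -0.5, 2.5, -2.0, -3.0]


def ols_one_regressor(x, y):
    """Closed-form OLS for y = cons + b*x, using b = cov(x, y) / var(x)."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    slope = cov_xy / var_x
    return y_bar - slope * x_bar, slope


cons, b_nj = ols_one_regressor(nj, dfte)

# Group means of the change, computed directly.
mean_change_nj = mean([d for d, s in zip(dfte, nj) if s == 1])
mean_change_pa = mean([d for d, s in zip(dfte, nj) if s == 0])

print(cons, mean_change_pa)                   # intercept vs mean change in PA
print(b_nj, mean_change_nj - mean_change_pa)  # slope vs difference in mean changes
```

Read this way, the _cons of -2.046 in the output is the mean change in employment in PA, and the nj coefficient of 2.329 is the NJ change minus the PA change.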
… and with robust standard errors


            reg dfte nj    reg dfte nj, robust
            OLS            Robust OLS
Coeff       2.329          2.329
SE          1.18           1.47
P-value     0.049          0.114
An alternative specification …

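reg fte nj after njafter, robust

Linear regression                                      Number of obs =     698
                                                       F(  3,   694) =    1.32
                                                       Prob > F      =  0.2682
                                                       R-squared     =  0.0089
                                                       Root MSE      =  8.9641

------------------------------------------------------------------------------
             |               Robust
         fte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          nj |  -2.998944   1.591452    -1.88   0.060    -6.123581    .1256939
       after |  -2.046154   1.788875    -1.14   0.253     -5.55841    1.466103
     njafter |   2.328724   1.930761     1.21   0.228     -1.46211    6.119558
       _cons |       20.3   1.501537    13.52   0.000      17.3519     23.2481
------------------------------------------------------------------------------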
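Because the model contains only the two dummies and their interaction, its coefficients are just a re-packaging of the four state-by-period mean employment levels. The Python sketch below rebuilds those means from the reported coefficients and recovers the D-in-D from them; the helper name `predicted_fte` is illustrative.

```python
# Coefficients copied from the regression of fte on nj, after and njafter.
b = {"_cons": 20.3, "nj": -2.998944, "after": -2.046154, "njafter": 2.328724}


def predicted_fte(nj, after):
    """Fitted mean fte for one state/period cell of the dummy-variable model."""
    return b["_cons"] + b["nj"] * nj + b["after"] * after + b["njafter"] * nj * after


cells = {
    ("PA", "before"): predicted_fte(0, 0),
    ("PA", "after"): predicted_fte(0, 1),
    ("NJ", "before"): predicted_fte(1, 0),
    ("NJ", "after"): predicted_fte(1, 1),
}
for key, value in cells.items():
    print(key, round(value, 2))

# Difference-in-difference rebuilt from the four cell means.
did_from_cells = (cells[("NJ", "after")] - cells[("NJ", "before")]) - (
    cells[("PA", "after")] - cells[("PA", "before")]
)
print(round(did_from_cells, 6))
```

The rebuilt D-in-D is the njafter coefficient, which is also the nj coefficient from the regression of dfte on nj; only the standard errors differ between the two specifications.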
Alternative specifications…




reg fte nj after njafter, cl(sheet)
xtreg fte nj after njafter, fe i(sheet)
Any key differences?
Should there be any?
Suppose we’d like to observe many estimations


Stata commands for results-sets
– estimates store
– outreg (works mostly with regressions)
– parmest/parmby (written by Roger Newson)
Summary




A very useful and widespread approach
Validity does depend on the assumption that trends would have been the same in the absence of treatment
Can use other periods to see if this assumption is plausible or not
Uses 2 observations on the same individual, the most rudimentary form of panel data