Solutions

102B - Introduction to Econometrics – Winter Term 2012/13
Paolo Pin
ppin@stanford.edu
Stanford, February 21st 2013
Problem Set 5
This problem set is based on lectures 12 and 13 (February 19th and 21st).
It must be turned in to the Economics Academic Office by 4 P.M. on Tuesday March 5th.
Late homework will be assigned a grade of 0 and the lowest grade will be dropped in computing grades. It
is entirely your responsibility to ensure that you complete the assignments and remember to turn them in
on time at the designated location. There will be no extensions for the problem sets. The only exception
to this rule is for death of a family member or illness requiring immediate attention of a physician. There
will be no exception for job interviews or other non-Stanford activities or for completed work that students
forget to turn in. Athletes on the road must still turn in the problem sets by the stated deadlines, although they
may do so by fax. See the course management policies (http://economics.stanford.edu/undergraduate/economics-common-syllabus)
for more details on these issues.
1 - Exercise with Stata
In the coursework you will find the dataset ‘fertil’. It includes, for women in Botswana during 1988,
information on number of children, years of education, age, and religious and economic status variables (this dataset is taken from J. M. Wooldridge (2012) “Introductory Econometrics”).
The variables that we are interested in for this exercise are:
children: number of living children
educ: years of education
age: age in years
mnthborn: month woman born
frsthalf: =1 if mnthborn ≤ 6
electric: =1 if has electricity
tv: =1 if has tv
bicycle: =1 if has bicycle
(a) Estimate this model by OLS
children = β0 + β1 educ + β2 age + β3 age² + u
and interpret the estimates. In particular, holding age fixed, what is the estimated effect
of another year of education on fertility? If 100 women receive another year of education,
how many fewer children are they expected to have?
. gen age2=age*age
. reg children educ age age2, robust

Linear regression                                      Number of obs =    4361
                                                       F(  3,  4357) = 1922.00
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5687
                                                       Root MSE      =  1.4597

------------------------------------------------------------------------------
             |               Robust
    children |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.0905755   .0060483   -14.98   0.000    -.1024332   -.0787178
         age |   .3324486   .0192071    17.31   0.000     .2947929    .3701043
        age2 |  -.0026308    .000352    -7.47   0.000    -.0033209   -.0019408
       _cons |  -4.138307   .2436211   -16.99   0.000    -4.615928   -3.660685
------------------------------------------------------------------------------
Holding age fixed, one more year of education reduces the expected number of children by about 0.09.
If 100 women each received one more year of education, they would be expected to have about 9 fewer
children in total (100 × 0.0906 ≈ 9.1).
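As a quick check of the arithmetic, the sketch below recomputes the effect for 100 women from the reported coefficient. It also evaluates the age profile implied by the quadratic term, which the answer above does not discuss. The helper names (`age_effect`, `turning_point`) are our own, and the coefficients are copied by hand from the output of part (a).

```python
# Coefficients copied by hand from the robust OLS output of part (a).
b_educ = -0.0905755
b_age = 0.3324486
b_age2 = -0.0026308

# Expected change in the total number of children if each of 100 women
# receives one more year of education, holding age fixed.
change_100_women = 100 * b_educ

def age_effect(age):
    """Marginal effect of one more year of age: b_age + 2 * b_age2 * age."""
    return b_age + 2 * b_age2 * age

# Age at which the fitted quadratic in age reaches its maximum.
turning_point = -b_age / (2 * b_age2)

print(f"100 women, one more year of education each: {change_100_women:.2f} children")
print(f"marginal effect of age at 30: {age_effect(30):.3f}")
print(f"fitted age profile peaks at about {turning_point:.1f} years")
```

With these numbers the total change is about -9.06 children, and the fitted number of children keeps rising with age, at a decreasing rate, until roughly age 63.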
(b) frsthalf is a dummy variable equal to one if the woman was born during the first six
months of the year. Assuming that frsthalf is uncorrelated with the error term from
part (a), show that frsthalf is a reasonable IV candidate for education.
. reg educ frsthalf age age2, robust

Linear regression                                      Number of obs =    4361
                                                       F(  3,  4357) =  201.72
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1077
                                                       Root MSE      =   3.711

------------------------------------------------------------------------------
             |               Robust
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    frsthalf |  -.8522854   .1132665    -7.52   0.000    -1.074345   -.6302254
         age |  -.1079504   .0402228    -2.68   0.007    -.1868076   -.0290932
        age2 |  -.0005056   .0006802    -0.74   0.457    -.0018392     .000828
       _cons |   9.692864   .5414317    17.90   0.000     8.631383    10.75435
------------------------------------------------------------------------------
frsthalf is a strong determinant of educ, even controlling for age: women born in the first half
of the year have on average about 0.85 fewer years of education (t = -7.52). The instrument is
therefore relevant. The reason is probably that all children start school in the same month of
the year, after having reached a certain age (say, the September after they turn 6), and most
drop out at a specific birthday (say, in the year in which they turn 10).
It is also reasonable to argue, as the exercise asks us to assume, that frsthalf is not correlated
with the part of children that is not explained by education and age, i.e. with the error term u.
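Relevance can also be summarized by the first-stage F-statistic. With a single excluded instrument, the F-statistic on that instrument equals the square of its t-statistic, so a minimal sketch using the robust estimates from the first-stage regression above is:

```python
# Robust first-stage estimate for frsthalf, copied from the regression of educ above.
coef_frsthalf = -0.8522854
se_frsthalf = 0.1132665

t_stat = coef_frsthalf / se_frsthalf
# With one excluded instrument, the first-stage F-statistic is t^2.
first_stage_F = t_stat ** 2

print(f"t = {t_stat:.2f}, first-stage F = {first_stage_F:.1f}")
```

The result, F of about 56.6, is well above the rule-of-thumb threshold of 10 that Stock and Watson use to flag weak instruments.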
(c) Estimate the model from part (a) by using frsthalf as an IV for educ. Compare the
estimated effect of education with the OLS estimate from part (a).
. ivreg children age age2 (educ=frsthalf), robust

Instrumental variables (2SLS) regression               Number of obs =    4361
                                                       F(  3,  4357) = 1838.43
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5502
                                                       Root MSE      =  1.4907

------------------------------------------------------------------------------
             |               Robust
    children |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.1714989   .0523859    -3.27   0.001    -.2742019    -.068796
         age |   .3236052   .0202371    15.99   0.000     .2839302    .3632802
        age2 |  -.0026723   .0003524    -7.58   0.000    -.0033631   -.0019815
       _cons |  -3.387805   .5451939    -6.21   0.000    -4.456663   -2.318948
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   age age2 frsthalf
------------------------------------------------------------------------------
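Now, with the help of the instrument, the estimated effect of educ is -0.171, almost twice as
large in absolute value as the OLS estimate of -0.091 from part (a). The IV estimate is, however,
much less precise: its standard error (0.052) is almost nine times the OLS one (0.006).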
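The comparison can be made explicit with a few lines of arithmetic on the reported coefficients and robust standard errors (values copied by hand from the outputs of parts (a) and (c)):

```python
# Coefficient on educ and its robust standard error, copied from parts (a) and (c).
ols_coef, ols_se = -0.0905755, 0.0060483
iv_coef, iv_se = -0.1714989, 0.0523859

ratio_coef = iv_coef / ols_coef   # how many times larger the IV effect is
ratio_se = iv_se / ols_se         # how many times larger the IV standard error is
iv_per_100_women = 100 * iv_coef  # IV analogue of the "100 women" calculation in part (a)

print(f"IV / OLS coefficient: {ratio_coef:.2f}")
print(f"IV / OLS standard error: {ratio_se:.1f}")
print(f"100 women, one more year each (IV): {iv_per_100_women:.1f} children")
```

The IV effect is about 1.9 times the OLS effect (about 17 rather than 9 fewer children per 100 women), while its standard error is roughly 8.7 times larger.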
(d) Add the binary variables electric, tv, and bicycle to the model and assume these are
exogenous. Estimate the equation by OLS and 2SLS and compare the estimated coefficients on educ.
. ivreg children age age2 electric tv bicycle (educ=frsthalf), robust

Instrumental variables (2SLS) regression               Number of obs =    4356
                                                       F(  6,  4349) =  939.15
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5577
                                                       Root MSE      =  1.4789

------------------------------------------------------------------------------
             |               Robust
    children |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.1639814   .0643804    -2.55   0.011    -.2901999    -.037763
         age |   .3281451   .0213264    15.39   0.000     .2863345    .3699556
        age2 |  -.0027222   .0003503    -7.77   0.000    -.0034089   -.0020354
    electric |  -.1065314   .1583542    -0.67   0.501    -.4169864    .2039236
          tv |   -.002555    .204425    -0.01   0.990    -.4033322    .3982222
     bicycle |   .3320724   .0506832     6.55   0.000     .2327074    .4314374
       _cons |  -3.591332    .639396    -5.62   0.000    -4.844874    -2.33779
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   age age2 electric tv bicycle frsthalf
------------------------------------------------------------------------------
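electric and tv have negative coefficients, which could reflect the causal channel that they reduce
the time married couples spend in intimacy; in this 2SLS regression, however, neither is statistically
significant (p = 0.501 and p = 0.990). bicycle has a positive and highly significant coefficient
(0.332, t = 6.55): it is probably a control variable for wealth and for the amount of time not spent
at home. The 2SLS estimate of the effect of educ (-0.164) is very close to the one in part (c) (-0.171),
and is still about 1.8 times as large in absolute value as the OLS estimate of part (a).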
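The significance statements above can be checked directly from the coefficients and robust standard errors in the 2SLS output. The sketch below uses the standard normal approximation to the t distribution (an assumption that is harmless with 4349 degrees of freedom); `two_sided_p` is a hypothetical helper written for this illustration, not a Stata command.

```python
import math

def two_sided_p(coef, se):
    """t-statistic and two-sided p-value, using the standard normal approximation."""
    t = coef / se
    # 2 * (1 - Phi(|t|)) equals erfc(|t| / sqrt(2)).
    return t, math.erfc(abs(t) / math.sqrt(2))

# Coefficients and robust standard errors copied from the 2SLS output of part (d).
estimates = {
    "electric": (-0.1065314, 0.1583542),
    "tv": (-0.002555, 0.204425),
    "bicycle": (0.3320724, 0.0506832),
}

pvals = {}
for name, (coef, se) in estimates.items():
    t, p = two_sided_p(coef, se)
    pvals[name] = p
    print(f"{name:9s} t = {t:6.2f}   p = {p:.3f}")
```

The p-values for electric and tv come out far above 0.05, while bicycle is significant at any conventional level, matching the Stata output.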
2 - One theoretical exercise
Consider the simple regression model
y = β0 + β1 x + u
and let z be a binary instrumental variable for x. Show that the IV estimator β̂1 can be
written as
$$\hat\beta_1 = \frac{\bar y_1 - \bar y_0}{\bar x_1 - \bar x_0},$$
where ȳ0 and x̄0 are the sample averages of yi and xi over the part of the sample with zi = 0,
and where ȳ1 and x̄1 are the sample averages of yi and xi over the part of the sample with
zi = 1. This estimator, known as a grouping estimator, was first suggested by Wald (1940).
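We know that
$$\hat\beta_1^{TSLS} = \frac{s_{ZY}}{s_{ZX}} = \frac{\sum_{i=1}^n (z_i-\bar z)(y_i-\bar y)}{\sum_{i=1}^n (z_i-\bar z)(x_i-\bar x)}.$$
Let $n_1 = \sum_{i:z_i=1} 1$ be the number of observations with $z_i = 1$, so that $n - n_1$ observations have $z_i = 0$, $\sum_{i:z_i=1} y_i = n_1 \bar y_1$ and $\sum_{i:z_i=0} y_i = (n-n_1)\bar y_0$. Then we can write
$$\sum_{i=1}^n (z_i-\bar z)(y_i-\bar y) = \sum_{i=1}^n (z_i-\bar z)\,y_i - \bar y \underbrace{\sum_{i=1}^n (z_i-\bar z)}_{=0} = \sum_{i:z_i=1} (1-\bar z)\,y_i + \sum_{i:z_i=0} (-\bar z)\,y_i = \bar y_1 n_1 - \bar z\,\bar y_1 n_1 - \bar z\,\bar y_0\,(n-n_1).$$
Now consider that $\bar z = n_1/n$, and so
$$n_1 - \bar z\,n_1 = n_1\Bigl(1 - \frac{n_1}{n}\Bigr) = \bar z\,(n-n_1).$$
The numerator is therefore $(\bar y_1-\bar y_0)\,\bar z\,(n-n_1)$, and the same steps applied to the denominator (with $x$ in place of $y$) give $(\bar x_1-\bar x_0)\,\bar z\,(n-n_1)$. So
$$\hat\beta_1^{TSLS} = \frac{s_{ZY}}{s_{ZX}} = \frac{(\bar y_1-\bar y_0)\,\bar z\,(n-n_1)}{(\bar x_1-\bar x_0)\,\bar z\,(n-n_1)} = \frac{\bar y_1-\bar y_0}{\bar x_1-\bar x_0}.$$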
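The identity can be checked numerically. The sketch below simulates hypothetical data in which x is endogenous (it shares the shock v with the error u) and z is a binary instrument, and compares s_ZY/s_ZX with the grouping estimator; the data-generating values (β0 = 1, β1 = -0.5, first-stage coefficient 1.5) are arbitrary assumptions chosen for illustration.

```python
import random

def mean(v):
    return sum(v) / len(v)

def iv_slope(y, x, z):
    """IV slope s_ZY / s_ZX (the 1/(n-1) factors cancel, so plain sums are used)."""
    zb, yb, xb = mean(z), mean(y), mean(x)
    s_zy = sum((zi - zb) * (yi - yb) for zi, yi in zip(z, y))
    s_zx = sum((zi - zb) * (xi - xb) for zi, xi in zip(z, x))
    return s_zy / s_zx

def ols_slope(y, x):
    """OLS slope s_XY / s_X^2, shown for comparison."""
    xb, yb = mean(x), mean(y)
    s_xy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    s_xx = sum((xi - xb) ** 2 for xi in x)
    return s_xy / s_xx

def wald_estimator(y, x, z):
    """Grouping estimator (y1_bar - y0_bar) / (x1_bar - x0_bar)."""
    y1 = mean([yi for yi, zi in zip(y, z) if zi == 1])
    y0 = mean([yi for yi, zi in zip(y, z) if zi == 0])
    x1 = mean([xi for xi, zi in zip(x, z) if zi == 1])
    x0 = mean([xi for xi, zi in zip(x, z) if zi == 0])
    return (y1 - y0) / (x1 - x0)

# Hypothetical data-generating process.
random.seed(0)
n = 5000
beta0, beta1 = 1.0, -0.5
z = [random.randint(0, 1) for _ in range(n)]
v = [random.gauss(0, 1) for _ in range(n)]
u = [0.8 * vi + random.gauss(0, 1) for vi in v]    # u is correlated with v ...
x = [2.0 + 1.5 * zi + vi for zi, vi in zip(z, v)]  # ... and therefore with x
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

b_iv = iv_slope(y, x, z)
b_wald = wald_estimator(y, x, z)
b_ols = ols_slope(y, x)
print(f"IV: {b_iv:.6f}  Wald: {b_wald:.6f}  OLS: {b_ols:.3f}")
```

The IV and Wald numbers coincide up to floating-point rounding, as the derivation implies, and both are close to β1 = -0.5, while OLS is pulled upward by the positive correlation between x and u.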
3 - Exercises from the book
Do the following exercises from: Introduction to Econometrics by James H. Stock and Mark
W. Watson (Addison-Wesley, 3rd Edition):
• empirical exercises, requiring Stata: exercises E12.1 and E12.2;
E12.1
. gen logprice=log(price)
. gen logquantity=log(quantity)
. reg logquantity logprice ice seas1 seas2 seas3 seas4 seas5 seas6 seas7 seas8 seas9 seas10 seas11

Linear regression                                      Number of obs =     328
                                                       F( 14,   313) =   11.77
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.3126
                                                       Root MSE      =  .39727

------------------------------------------------------------------------------
             |               Robust
 logquantity |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    logprice |  -.6388847   .0732804    -8.72   0.000    -.7830692   -.4947003
         ice |   .4477537   .1349288     3.32   0.001     .1822717    .7132358
       seas1 |  -.1328219   .0957944    -1.39   0.167    -.3213042    .0556604
       seas2 |   .0668882   .0907065     0.74   0.461    -.1115834    .2453599
       seas3 |   .1114365   .0970148     1.15   0.252    -.0794472    .3023201
       seas4 |   .1554219   .1324978     1.17   0.242    -.1052771     .416121
       seas5 |   .1096585   .1276572     0.86   0.391    -.1415162    .3608333
       seas6 |   .0468325   .1766425     0.27   0.791    -.3007243    .3943894
       seas7 |   .1225526   .1998661     0.61   0.540    -.2706984    .5158036
       seas8 |  -.2350078   .1749897    -1.34   0.180    -.5793126     .109297
       seas9 |   .0035607   .1723754     0.02   0.984    -.3356003    .3427217
      seas10 |   .1692469   .1729309     0.98   0.328    -.1710071    .5095009
      seas11 |   .2151845   .1728162     1.25   0.214    -.1248439    .5552128
      seas12 |   .2196331   .1700043     1.29   0.197    -.1148625    .5541287
       _cons |   8.861233    .177072    50.04   0.000     8.512831    9.209635
------------------------------------------------------------------------------

. ivregress 2sls logquantity ice seas1 seas2 seas3 seas4 seas5 seas6 seas7 seas8 seas9 seas10 sea
> ce(robust)
Instrumental variables (2SLS) regression               Number of obs =     328
                                                       Wald chi2(14) =  165.29
                                                       Prob > chi2   =  0.0000
                                                       R-squared     =  0.2959
                                                       Root MSE      =  .39279

------------------------------------------------------------------------------
             |               Robust
 logquantity |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    logprice |  -.8665865   .1307362    -6.63   0.000    -1.122825   -.6103483
         ice |    .422934   .1315104     3.22   0.001     .1651784    .6806896
       seas1 |  -.1309732   .1005382    -1.30   0.193    -.3280245     .066078
       seas2 |   .0909521   .0927421     0.98   0.327     -.090819    .2727233
       seas3 |    .135872   .0980894     1.39   0.166    -.0563797    .3281237
       seas4 |   .1525109   .1313612     1.16   0.246    -.1049523    .4099741
       seas5 |   .0735618   .1271374     0.58   0.563     -.175623    .3227465
       seas6 |  -.0060642   .1721703    -0.04   0.972    -.3435118    .3313834
       seas7 |   .0602324   .1964209     0.31   0.759    -.3247454    .4452102
       seas8 |  -.2935991   .1707606    -1.72   0.086    -.6282837    .0410855
       seas9 |  -.0583723   .1714096    -0.34   0.733    -.3943289    .2775844
      seas10 |   .0858109   .1738156     0.49   0.622    -.2548614    .4264832
      seas11 |   .1517912   .1716185     0.88   0.376     -.184575    .4881573
      seas12 |   .1786558   .1668587     1.07   0.284    -.1483813    .5056929
       _cons |   8.573535   .2106483    40.70   0.000     8.160672    8.986398
------------------------------------------------------------------------------
Instrumented:  logprice
Instruments:   ice seas1 seas2 seas3 seas4 seas5 seas6 seas7 seas8 seas9
               seas10 seas11 seas12 cartel
(a) The estimated elasticity is -0.639 with a standard error of 0.073.
(b) A positive demand “error” will shift the demand curve to the right. This will
increase the equilibrium quantity and price in the market. Thus ln(Price) is
positively correlated with the regression error in the demand model. This means
that the OLS coefficient will be positively biased.
(c) Cartel shifts the supply curve. As the cartel strengthens, the supply curve shifts
in, reducing supply and increasing price and profits for the cartel's members. Thus,
Cartel is relevant. For Cartel to be a valid instrument it must be exogenous, that
is, it must be unrelated to the factors affecting demand that are omitted from the
demand specification (i.e., those factors that make up the error in the demand
model.) This seems plausible.
(d) The first-stage F-statistic is 183.0, so Cartel is not a weak instrument.
(e) See the IV table. The estimated elasticity is -0.867 with a standard error of 0.131. Notice
that the estimate is more negative than the OLS estimate, which is consistent
with the OLS estimator having a positive bias.
(f) In the standard model of monopoly, a monopolist should increase price if the
demand elasticity is less than 1 in absolute value. (The increase in price will reduce quantity but
increase revenue and profits.) Here, the estimated elasticity is less than 1 in absolute value,
which suggests that the cartel is not pricing like a profit-maximizing monopolist.
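The logic of parts (b) to (e) can be illustrated with a simulated market. The sketch below assumes hypothetical linear demand and supply curves in logs (demand elasticity -0.9, supply elasticity 1.0) and a binary cartel dummy that shifts supply only; the equilibrium is obtained by equating the two curves. These parameter values are assumptions for illustration, not estimates from the data used above.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

random.seed(1)
n = 20000
elast_d, elast_s = -0.9, 1.0   # hypothetical demand and supply elasticities
cartel_shift = 0.5             # hypothetical inward supply shift when the cartel is active

log_p, log_q, cartel = [], [], []
for _ in range(n):
    c = random.randint(0, 1)      # cartel dummy: affects supply only
    u_d = random.gauss(0, 0.3)    # demand error (the error of the demand regression)
    u_s = random.gauss(0, 0.3)    # supply error
    # Demand: q = 5 + elast_d * p + u_d;  supply: q = 1 + elast_s * p - cartel_shift * c + u_s.
    # Equating the two and solving for the equilibrium log price p:
    p = (5 - 1 + cartel_shift * c + u_d - u_s) / (elast_s - elast_d)
    q = 5 + elast_d * p + u_d
    log_p.append(p)
    log_q.append(q)
    cartel.append(c)

ols = cov(log_p, log_q) / cov(log_p, log_p)  # biased: log_p is correlated with u_d
iv = cov(cartel, log_q) / cov(cartel, log_p)  # uses only price variation due to the cartel
print(f"true demand elasticity: {elast_d}, OLS: {ols:.3f}, IV: {iv:.3f}")
```

In this setup the OLS slope lies well above -0.9 (a positive bias, because a positive demand shock raises both price and quantity), while the IV estimate that uses the cartel dummy is close to -0.9.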
E12.2
. reg weeksm1 morekids, robust

Linear regression                                      Number of obs =  254654
                                                       F(  1,254652) = 3820.91
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0143
                                                       Root MSE      =   21.71

------------------------------------------------------------------------------
             |               Robust
     weeksm1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    morekids |  -5.386996   .0871491   -61.81   0.000    -5.557806   -5.216186
       _cons |   21.06843   .0560681   375.76   0.000     20.95854    21.17832
------------------------------------------------------------------------------
. ivregress 2sls weeksm1 (morekids = samesex), vce(robust)

Instrumental variables (2SLS) regression               Number of obs =  254654
                                                       Wald chi2(1)  =   24.53
                                                       Prob > chi2   =  0.0000
                                                       R-squared     =  0.0139
                                                       Root MSE      =  21.715

------------------------------------------------------------------------------
             |               Robust
     weeksm1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    morekids |  -6.313685   1.274681    -4.95   0.000    -8.812013   -3.815357
       _cons |   21.42109   .4872487    43.96   0.000      20.4661    22.37608
------------------------------------------------------------------------------
Instrumented:  morekids
Instruments:   samesex
. ivregress 2sls weeksm1 agem1 black hispan othrace (morekids = samesex), vce(robust)

Instrumental variables (2SLS) regression               Number of obs =  254654
                                                       Wald chi2(5)  = 6954.98
                                                       Prob > chi2   =  0.0000
                                                       R-squared     =  0.0437
                                                       Root MSE      =  21.384

------------------------------------------------------------------------------
             |               Robust
     weeksm1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    morekids |  -5.821051   1.246386    -4.67   0.000    -8.263923   -3.378179
       agem1 |   .8315975   .0226406    36.73   0.000     .7872228    .8759722
       black |   11.62327   .2317953    50.14   0.000     11.16896    12.07758
      hispan |   .4041802   .2607962     1.55   0.121     -.106971    .9153314
     othrace |   2.130962   .2109857    10.10   0.000     1.717438    2.544486
       _cons |  -4.791894   .3897868   -12.29   0.000    -5.555862   -4.027925
------------------------------------------------------------------------------
Instrumented:  morekids
Instruments:   agem1 black hispan othrace samesex
(a) The coefficient is -5.387, which indicates that women with more than 2 children
work, on average, 5.387 fewer weeks per year than women with 2 or fewer children.
(b) Both fertility and weeks worked are choice variables. A woman with a positive
labor supply regression error (a woman who works more than average) may also
be a woman who is less likely to have an additional child. This would imply
that Morekids is negatively correlated with the regression error, so that the OLS
estimator of β_Morekids is negatively biased.
(c) The linear regression of morekids on samesex (a linear probability model) yields
predicted morekids = 0.346 (SE: 0.001) + 0.066 (SE: 0.002) × samesex,
so that couples with samesex = 1 are 6.6 percentage points more likely to have an additional child
than couples with samesex = 0. The effect is highly significant (t-statistic = 35.2).
(d) Samesex is random and is unrelated to any of the other variables in the model,
including the error term in the labor supply equation. Thus, the instrument
is exogenous. From (c), the first-stage F-statistic is large (F = 1238), so the
instrument is relevant. Together, these imply that samesex is a valid instrument.
(e) No, see the answer to (d).
(f) See the first IV regression. The estimated value of β_Morekids is -6.314.
(g) See the second IV regression. The results do not change in an important way (the estimate
of β_Morekids becomes -5.821). The reason is that samesex is unrelated to agem1, black, hispan
and othrace, so that there is no omitted variable bias in the previous IV regression.
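With one binary instrument, the 2SLS estimate is also the Wald (grouping) estimator of Section 2: the reduced-form effect of samesex on weeks worked divided by the first-stage effect of samesex on morekids. The reduced-form regression is not reported above, so the sketch below only backs out its implied value from the rounded first-stage numbers in (c) and the IV estimate in (f); the result is approximate because 0.066 is rounded.

```python
# Numbers copied from parts (c) and (f); the first-stage coefficient is rounded.
first_stage_coef = 0.066    # effect of samesex on morekids
first_stage_t = 35.2        # its t-statistic
iv_coef = -6.313685         # 2SLS coefficient on morekids

# With a single excluded instrument the first-stage F-statistic is t^2.
first_stage_F = first_stage_t ** 2

# Wald / indirect least squares: IV = reduced form / first stage,
# so the implied reduced-form effect is IV * first stage.
implied_reduced_form = iv_coef * first_stage_coef

print(f"first-stage F = {first_stage_F:.0f}")
print(f"implied effect of samesex on weeks worked = {implied_reduced_form:.2f} weeks")
```

The implied F (about 1239) matches the reported F = 1238 up to the rounding of the t-statistic, and the implied reduced-form effect is roughly -0.42 weeks per year.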
• comment on the differences and the analogies in the results between exercise “1 - Exercise with Stata” above and exercise E12.2: are we measuring the same causal
effects? Can we assume that the causal effects are the same?
In exercise E12.2 we check whether there is a causal effect from family (more or fewer children)
to work (employment, working time and wages; here, weeks worked), but we need an instrument to
control for the reverse causal effect. In exercise “1 - Exercise with Stata” we look at
the causal effect from education to family, which can operate also through work (but
not only: another channel could be cultural values), and again we need an instrument to
control for the reverse causal effect (in this case it is mostly due to work).