102B - Introduction to Econometrics – Winter Term 2012/13
Paolo Pin
ppin@stanford.edu
Stanford, February 21st 2013

Problem Set 5

This problem set is based on lectures 12 and 13 (February 19th and 21st). It must be turned in to the Economics Academic Office by 4 P.M. on Tuesday March 5th. Late homework will be assigned a grade of 0 and the lowest grade will be dropped in computing grades. It is entirely your responsibility to ensure that you complete the assignments and remember to turn them in on time at the designated location. There will be no extensions for the problem sets. The only exception to this rule is for death of a family member or illness requiring immediate attention of a physician. There will be no exception for job interviews or other non-Stanford activities or for completed work that students forget to turn in. Athletes on the road must still turn in the problem sets by the stated deadlines, although they may do so by fax. See the course management policies (http://economics.stanford.edu/undergraduate/economics-common-syllabus) for more details on these issues.

1 - Exercise with Stata

In the coursework you find the dataset ‘fertil’. It includes, for women in Botswana during 1988, information on number of children, years of education, age, and religious and economic status variables (this dataset is taken from J. M. Wooldridge (2012) “Introductory Econometrics”). The variables we are interested in for this exercise are:

children: number of living children
educ: years of education
age: age in years
mnthborn: month woman born
frsthalf: =1 if mnthborn ≤ 6
electric: =1 if has electricity
tv: =1 if has tv
bicycle: =1 if has bicycle

(a) Estimate this model by OLS

$children = \beta_0 + \beta_1 educ + \beta_2 age + \beta_3 age^2 + u$

and interpret the estimates. In particular, holding age fixed, what is the estimated effect of another year of education on fertility? If 100 women receive another year of education, how many fewer children are they expected to have?

. gen age2=age*age
. reg children educ age age2, robust

Linear regression                                      Number of obs =    4361
                                                       F(  3,  4357) = 1922.00
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5687
                                                       Root MSE      =  1.4597

------------------------------------------------------------------------------
             |               Robust
    children |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.0905755   .0060483   -14.98   0.000    -.1024332   -.0787178
         age |   .3324486   .0192071    17.31   0.000     .2947929    .3701043
        age2 |  -.0026308    .000352    -7.47   0.000    -.0033209   -.0019408
       _cons |  -4.138307   .2436211   -16.99   0.000    -4.615928   -3.660685
------------------------------------------------------------------------------

Holding age fixed, one more year of education reduces the expected number of children by about 0.09. If 100 women each received one more year of education, they would be expected to have about 9 fewer children in total.

(b) frsthalf is a dummy variable equal to one if the woman was born during the first six months of the year. Assuming that frsthalf is uncorrelated with the error term from part (a), show that frsthalf is a reasonable IV candidate for education.

. reg educ frsthalf age age2, robust

Linear regression                                      Number of obs =    4361
                                                       F(  3,  4357) =  201.72
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1077
                                                       Root MSE      =   3.711

------------------------------------------------------------------------------
             |               Robust
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    frsthalf |  -.8522854   .1132665    -7.52   0.000    -1.074345   -.6302254
         age |  -.1079504   .0402228    -2.68   0.007    -.1868076   -.0290932
        age2 |  -.0005056   .0006802    -0.74   0.457    -.0018392     .000828
       _cons |   9.692864   .5414317    17.90   0.000     8.631383    10.75435
------------------------------------------------------------------------------

frsthalf is a strong determinant of educ even after controlling for age (t = -7.52), so it is strongly correlated with educ: it is relevant.
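As a rough check on relevance, the first-stage F-statistic for the instrument can be compared with the usual rule of thumb of 10. With a single instrument this F-statistic is the square of the t-statistic reported above, roughly (-7.52)^2 ≈ 56.6. A minimal Stata sketch, assuming the fertil dataset is loaded and age2 has been generated as in part (a):

```stata
* First stage: educ on the instrument and the exogenous controls
reg educ frsthalf age age2, robust

* Robust Wald test that the coefficient on frsthalf is zero;
* with one instrument the reported F equals the squared t-statistic
test frsthalf
```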
The reason is probably that all women start school in the same month of the year, after having reached a certain age (say the September after they turn 6), and most drop out at a specific birthday (say in the year in which they turn 10). It is also safe to argue that frsthalf is not correlated with the part of children that is not explained by education.

(c) Estimate the model from part (a) by using frsthalf as an IV for educ. Compare the estimated effect of education with the OLS estimate from part (a).

. ivreg children age age2 (educ=frsthalf), robust

Instrumental variables (2SLS) regression               Number of obs =    4361
                                                       F(  3,  4357) = 1838.43
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5502
                                                       Root MSE      =  1.4907

------------------------------------------------------------------------------
             |               Robust
    children |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.1714989   .0523859    -3.27   0.001    -.2742019    -.068796
         age |   .3236052   .0202371    15.99   0.000     .2839302    .3632802
        age2 |  -.0026723   .0003524    -7.58   0.000    -.0033631   -.0019815
       _cons |  -3.387805   .5451939    -6.21   0.000    -4.456663   -2.318948
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   age age2 frsthalf
------------------------------------------------------------------------------

With the help of the instrument, the estimated effect of educ is almost twice as large in absolute value (-0.171 against -0.091 with OLS), although it is estimated less precisely (standard error 0.052 against 0.006).

(d) Add the binary variables electric, tv, and bicycle to the model and assume these are exogenous. Estimate the equation by OLS and 2SLS and compare the estimated coefficients on educ.

. ivreg children age age2 electric tv bicycle (educ=frsthalf), robust

Instrumental variables (2SLS) regression               Number of obs =    4356
                                                       F(  6,  4349) =  939.15
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5577
                                                       Root MSE      =  1.4789

------------------------------------------------------------------------------
             |               Robust
    children |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.1639814   .0643804    -2.55   0.011    -.2901999    -.037763
         age |   .3281451   .0213264    15.39   0.000     .2863345    .3699556
        age2 |  -.0027222   .0003503    -7.77   0.000    -.0034089   -.0020354
    electric |  -.1065314   .1583542    -0.67   0.501    -.4169864    .2039236
          tv |   -.002555    .204425    -0.01   0.990    -.4033322    .3982222
     bicycle |   .3320724   .0506832     6.55   0.000     .2327074    .4314374
       _cons |  -3.591332    .639396    -5.62   0.000    -4.844874    -2.33779
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   age age2 electric tv bicycle frsthalf
------------------------------------------------------------------------------

With the additional controls, the 2SLS coefficient on educ is -0.164, close to the -0.171 estimated in part (c). In the 2SLS output above, electric and tv are not statistically significant (p = 0.501 and p = 0.990), so these estimates do not show a sizable effect of either. To the extent that they matter, a possible causal interpretation is that they reduce the time married couples spend in intimacy. bicycle, which is significant, is a control variable for wealth and for the amount of time not spent at home.

2 - One theoretical exercise

Consider the simple regression model

\[ y = \beta_0 + \beta_1 x + u \]

and let $z$ be a binary instrumental variable for $x$. Show that the IV estimator $\hat\beta_1$ can be written as

\[ \hat\beta_1 = \frac{\bar y_1 - \bar y_0}{\bar x_1 - \bar x_0}, \]

where $\bar y_0$ and $\bar x_0$ are the sample averages of $y_i$ and $x_i$ over the part of the sample with $z_i = 0$, and where $\bar y_1$ and $\bar x_1$ are the sample averages of $y_i$ and $x_i$ over the part of the sample with $z_i = 1$. This estimator, known as a grouping estimator, was first suggested by Wald (1940).

We know that

\[ \hat\beta_1^{TSLS} = \frac{s_{ZY}}{s_{ZX}} = \frac{\sum_{i=1}^{n} (z_i - \bar z)(y_i - \bar y)}{\sum_{i=1}^{n} (z_i - \bar z)(x_i - \bar x)}. \]

But then we can write

\begin{align*}
\sum_{i=1}^{n} (z_i - \bar z)(y_i - \bar y)
&= \sum_{i: z_i = 1} (1 - \bar z)(y_i - \bar y) + \sum_{i: z_i = 0} (-\bar z)(y_i - \bar y) \\
&= \sum_{i: z_i = 1} (z_i - \bar z) y_i + \sum_{i: z_i = 0} (z_i - \bar z) y_i - \bar y \underbrace{\sum_{i=1}^{n} (z_i - \bar z)}_{=0} \\
&= \sum_{i: z_i = 1} z_i y_i - \sum_{i: z_i = 1} \bar z y_i + \underbrace{\sum_{i: z_i = 0} z_i y_i}_{=0} - \sum_{i: z_i = 0} \bar z y_i \\
&= \bar y_1 \Big( \sum_{i: z_i = 1} 1 \Big) - \bar z \bar y_1 \Big( \sum_{i: z_i = 1} 1 \Big) - \bar z \bar y_0 \Big( n - \sum_{i: z_i = 1} 1 \Big).
\end{align*}

Now consider that $\bar z = \big( \sum_{i: z_i = 1} 1 \big) / n$, and so

\[ \Big( \sum_{i: z_i = 1} 1 \Big) - \bar z \Big( \sum_{i: z_i = 1} 1 \Big) = \bar z \Big( n - \sum_{i: z_i = 1} 1 \Big). \]

Substituting, the numerator equals $(\bar y_1 - \bar y_0) \bar z \big( n - \sum_{i: z_i = 1} 1 \big)$, and the same steps with $x$ in place of $y$ give the denominator. So

\[ \hat\beta_1^{TSLS} = \frac{s_{ZY}}{s_{ZX}} = \frac{(\bar y_1 - \bar y_0) \bar z \big( n - \sum_{i: z_i = 1} 1 \big)}{(\bar x_1 - \bar x_0) \bar z \big( n - \sum_{i: z_i = 1} 1 \big)} = \frac{\bar y_1 - \bar y_0}{\bar x_1 - \bar x_0}. \]

3 - Exercises from the book

Do the following exercises from: Introduction to Econometrics by James H. Stock and Mark W. Watson (Addison-Wesley, 3rd Edition):

• empirical exercises, requiring Stata: exercises E12.1 and E12.2;

E12.1

. gen logprice=log(price)
. gen logquantity=log(quantity)
. reg logquantity logprice ice seas1 seas2 seas3 seas4 seas5 seas6 seas7 seas8 seas9 seas10 seas11

Linear regression                                      Number of obs =     328
                                                       F( 14,   313) =   11.77
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.3126
                                                       Root MSE      =  .39727

------------------------------------------------------------------------------
             |               Robust
 logquantity |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    logprice |  -.6388847   .0732804    -8.72   0.000    -.7830692   -.4947003
         ice |   .4477537   .1349288     3.32   0.001     .1822717    .7132358
       seas1 |  -.1328219   .0957944    -1.39   0.167    -.3213042    .0556604
       seas2 |   .0668882   .0907065     0.74   0.461    -.1115834    .2453599
       seas3 |   .1114365   .0970148     1.15   0.252    -.0794472    .3023201
       seas4 |   .1554219   .1324978     1.17   0.242    -.1052771     .416121
       seas5 |   .1096585   .1276572     0.86   0.391    -.1415162    .3608333
       seas6 |   .0468325   .1766425     0.27   0.791    -.3007243    .3943894
       seas7 |   .1225526   .1998661     0.61   0.540    -.2706984    .5158036
       seas8 |  -.2350078   .1749897    -1.34   0.180    -.5793126     .109297
       seas9 |   .0035607   .1723754     0.02   0.984    -.3356003    .3427217
      seas10 |   .1692469   .1729309     0.98   0.328    -.1710071    .5095009
      seas11 |   .2151845   .1728162     1.25   0.214    -.1248439    .5552128
      seas12 |   .2196331   .1700043     1.29   0.197    -.1148625    .5541287
       _cons |   8.861233    .177072    50.04   0.000     8.512831    9.209635
------------------------------------------------------------------------------

. ivregress 2sls logquantity ice seas1 seas2 seas3 seas4 seas5 seas6 seas7 seas8 seas9 seas10 sea
> ce(robust)

Instrumental variables (2SLS) regression               Number of obs =     328
                                                       Wald chi2(14) =  165.29
                                                       Prob > chi2   =  0.0000
                                                       R-squared     =  0.2959
                                                       Root MSE      =  .39279

------------------------------------------------------------------------------
             |               Robust
 logquantity |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    logprice |  -.8665865   .1307362    -6.63   0.000    -1.122825   -.6103483
         ice |    .422934   .1315104     3.22   0.001     .1651784    .6806896
       seas1 |  -.1309732   .1005382    -1.30   0.193    -.3280245     .066078
       seas2 |   .0909521   .0927421     0.98   0.327     -.090819    .2727233
       seas3 |    .135872   .0980894     1.39   0.166    -.0563797    .3281237
       seas4 |   .1525109   .1313612     1.16   0.246    -.1049523    .4099741
       seas5 |   .0735618   .1271374     0.58   0.563     -.175623    .3227465
       seas6 |  -.0060642   .1721703    -0.04   0.972    -.3435118    .3313834
       seas7 |   .0602324   .1964209     0.31   0.759    -.3247454    .4452102
       seas8 |  -.2935991   .1707606    -1.72   0.086    -.6282837    .0410855
       seas9 |  -.0583723   .1714096    -0.34   0.733    -.3943289    .2775844
      seas10 |   .0858109   .1738156     0.49   0.622    -.2548614    .4264832
      seas11 |   .1517912   .1716185     0.88   0.376     -.184575    .4881573
      seas12 |   .1786558   .1668587     1.07   0.284    -.1483813    .5056929
       _cons |   8.573535   .2106483    40.70   0.000     8.160672    8.986398
------------------------------------------------------------------------------
Instrumented:  logprice
Instruments:   ice seas1 seas2 seas3 seas4 seas5 seas6 seas7 seas8 seas9 seas10
               seas11 seas12 cartel

(a) The estimated elasticity is -0.639 with a standard error of 0.073.

(b) A positive demand “error” will shift the demand curve to the right. This will increase the equilibrium quantity and price in the market. Thus ln(Price) is positively correlated with the regression error in the demand model. This means that the OLS coefficient will be positively biased.

(c) Cartel shifts the supply curve.
As the cartel strengthens, the supply curve shifts in, reducing supply and increasing price and profits for the cartel's members. Thus, Cartel is relevant. For Cartel to be a valid instrument it must be exogenous, that is, it must be unrelated to the factors affecting demand that are omitted from the demand specification (i.e., those factors that make up the error in the demand model). This seems plausible.

(d) The first stage F-statistic is 183.0. Cartel is not a weak instrument.

(e) See the table. The estimated elasticity is -0.867 with a standard error of 0.131. Notice that the estimate is more negative than the OLS estimate, which is consistent with the OLS estimator having a positive bias.

(f) In the standard model of monopoly, a monopolist should increase price if the demand elasticity is less than 1 in absolute value. (The increase in price will reduce quantity but increase revenue and profits.) Here, the estimated elasticity is less than 1 in absolute value.

E12.2

. reg weeksm1 morekids, robust

Linear regression                                      Number of obs =  254654
                                                       F(  1,254652) = 3820.91
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0143
                                                       Root MSE      =   21.71

------------------------------------------------------------------------------
             |               Robust
     weeksm1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    morekids |  -5.386996   .0871491   -61.81   0.000    -5.557806   -5.216186
       _cons |   21.06843   .0560681   375.76   0.000     20.95854    21.17832
------------------------------------------------------------------------------

. ivregress 2sls weeksm1 (morekids = samesex), vce(robust)

Instrumental variables (2SLS) regression               Number of obs =  254654
                                                       Wald chi2(1)  =   24.53
                                                       Prob > chi2   =  0.0000
                                                       R-squared     =  0.0139
                                                       Root MSE      =  21.715

------------------------------------------------------------------------------
             |               Robust
     weeksm1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    morekids |  -6.313685   1.274681    -4.95   0.000    -8.812013   -3.815357
       _cons |   21.42109   .4872487    43.96   0.000      20.4661    22.37608
------------------------------------------------------------------------------
Instrumented:  morekids
Instruments:   samesex

. ivregress 2sls weeksm1 agem1 black hispan othrace (morekids = samesex), vce(robust)

Instrumental variables (2SLS) regression               Number of obs =  254654
                                                       Wald chi2(5)  = 6954.98
                                                       Prob > chi2   =  0.0000
                                                       R-squared     =  0.0437
                                                       Root MSE      =  21.384

------------------------------------------------------------------------------
             |               Robust
     weeksm1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    morekids |  -5.821051   1.246386    -4.67   0.000    -8.263923   -3.378179
       agem1 |   .8315975   .0226406    36.73   0.000     .7872228    .8759722
       black |   11.62327   .2317953    50.14   0.000     11.16896    12.07758
      hispan |   .4041802   .2607962     1.55   0.121     -.106971    .9153314
     othrace |   2.130962   .2109857    10.10   0.000     1.717438    2.544486
       _cons |  -4.791894   .3897868   -12.29   0.000    -5.555862   -4.027925
------------------------------------------------------------------------------
Instrumented:  morekids
Instruments:   agem1 black hispan othrace samesex

(a) The coefficient is -5.387, which indicates that women with more than 2 children work on average 5.387 fewer weeks per year than women with 2 or fewer children.

(b) Both fertility and weeks worked are choice variables. A woman with a positive labor supply regression error (a woman who works more than average) may also be a woman who is less likely to have an additional child. This would imply that Morekids is negatively correlated with the regression error, so that the OLS estimator of $\beta_{Morekids}$ is negatively biased.
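The sign argument in (b) rests on the standard large-sample expression for the OLS slope in a regression with a single regressor. It is stated here as a textbook result, not derived:

```latex
\[
\hat\beta_{Morekids}^{OLS} \;\xrightarrow{\;p\;}\; \beta_{Morekids}
  + \frac{\operatorname{cov}(Morekids_i, u_i)}{\operatorname{var}(Morekids_i)}
\]
```

Since the variance in the denominator is positive, the OLS bias has the same sign as the covariance between Morekids and the regression error.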
(c) The linear regression of morekids on samesex (a linear probability model) yields

\[ \widehat{morekids} = \underset{(0.001)}{0.346} + \underset{(0.002)}{0.066}\, samesex \]

(standard errors in parentheses), so that couples with samesex = 1 are 6.6 percentage points more likely to have an additional child than couples with samesex = 0. The effect is highly significant (t-statistic = 35.2).

(d) Samesex is random and is unrelated to any of the other variables in the model, including the error term in the labor supply equation. Thus, the instrument is exogenous. From (c), the first stage F-statistic is large (F = 1238), so the instrument is relevant. Together, these imply that samesex is a valid instrument.

(e) No, see the answer to (d).

(f) See the first IV regression. The estimated value of $\beta_{Morekids}$ is -6.314.

(g) See the second IV regression. The results do not change in an important way. The reason is that samesex is unrelated to agem1, black, hispan, othrace, so that there is no omitted variable bias in the previous IV regression.

• comment on the differences and the analogies in the results between exercise “1 - Exercise with Stata” above and exercise E12.2: are we measuring the same causal effects, and can we assume that the causal effects are the same?

In Exercise E12.2 we check whether there is a causal effect from family (more or fewer children) to work (unemployment, working time and wage), but we need an instrument to control for the inverse causal effect. In exercise “1 - Exercise with Stata” we look at the causal effect from education to family, which can also operate through work (but not only: another channel could be cultural values), and we again need an instrument to control for the inverse causal effect (which in this case is mostly due to work).
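One concrete link between the theoretical exercise and E12.2: samesex is a binary instrument and the first IV regression has no other regressors, so by the grouping-estimator result the 2SLS coefficient on morekids should equal the ratio of the differences in group means. A minimal Stata sketch, assuming the E12.2 dataset is loaded; the scalar names ybar1, ybar0, xbar1 and xbar0 are illustrative and not part of the original solution:

```stata
* Group means of the outcome (weeksm1) by value of the instrument
quietly summarize weeksm1 if samesex == 1
scalar ybar1 = r(mean)
quietly summarize weeksm1 if samesex == 0
scalar ybar0 = r(mean)

* Group means of the endogenous regressor (morekids) by value of the instrument
quietly summarize morekids if samesex == 1
scalar xbar1 = r(mean)
quietly summarize morekids if samesex == 0
scalar xbar0 = r(mean)

* Wald (grouping) estimator: ratio of the two differences in means
display "Wald estimate: " (ybar1 - ybar0) / (xbar1 - xbar0)
```

If both computations use the same estimation sample, the displayed value should match the coefficient on morekids in the first IV regression up to rounding. The same shortcut does not apply to the second IV regression or to Exercise 1, where additional exogenous regressors enter the model.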