Econ 4211 – Principles of Econometrics
Final Exam
December 20, 2010

Instructions

This exam contains 3 (three) problems:

1. Problem 1 is worth 30 points,
2. Problem 2 is worth 55 points, and
3. Problem 3 is worth 40 points.

The total points on this exam thus sum to 125. However, you can only earn a maximum of 100 points, i.e. anything you earn in excess of 100 points will be capped at 100. You have 2 (two) hours to complete the exam. Calculators are allowed and, in fact, strongly encouraged. Cell phone calculators are fine, laptops are not. You are not allowed to use any notes, and you are not allowed to consult with other students during the test. Feel free to use the back sides of test pages for notes if you need more writing space. Good luck!

Problem 1. Estimating Production Functions (30 points)

Consider a standard Cobb-Douglas production function:

    Y = A L^β1 K^β2,    (1)

where Y is output, L is labor input, K is capital input, and A is a measure of technology. The primary interest is in estimating β1 and β2, the factor shares. In particular, we are interested in testing the constant returns to scale hypothesis: β1 + β2 = 1. Do the following:

1. You obtain data on i = 1, ..., n firms for t = 1, ..., T years for each firm. Your boss suggests that you estimate the following model:

    y_it = β0 + β1 l_it + β2 k_it + u_it,    (2)

where the data are pooled together across firms and across time periods. This essentially treats firm i at time t1 and the same firm i at time t2 as different observations. Spell out the precise relationships between the lowercase variables in equation (2) and their uppercase counterparts in equation (1). How do you interpret β0? (5 points)

Answer. Take the natural log of equation (1) and it becomes clear that y_it = log[Y_it], l_it = log[L_it], k_it = log[K_it], and β0 = log[A], so β0 is the log of the technology level.

2. Suppose you have the estimation results for equation (2), i.e. you know β̂0, β̂1, β̂2, along with their variance-covariance matrix. Explain how to construct a t-test for H0: β1 + β2 = 1 against the two-sided alternative H1: β1 + β2 ≠ 1. (Hint: don't forget about the covariance term between β̂1 and β̂2.) (5 points)

Answer. The formula for the t-statistic is very simple:

    t = (β̂1 + β̂2 − 1) / s.e.[β̂1 + β̂2],

and the only problem is to compute the standard error in the denominator. Here is where you should keep the covariance term in mind:

    s.e.[β̂1 + β̂2] = sqrt( Var[β̂1 + β̂2] ) = sqrt( s.e.[β̂1]² + s.e.[β̂2]² + 2 Cov[β̂1, β̂2] ).

Since the alternative is two-sided, one has to compare the absolute value of the t-statistic with the prespecified critical value to draw the needed inference.

3. You estimate equation (2) via OLS and obtain the following output:

    y_it = 0.256 + 0.36 l_it + 0.68 k_it + û_it
          (0.11)  (0.09)      (0.23)

    Number of obs. = 745,    Cov[β̂1, β̂2] = −0.03,

where the small numbers in parentheses represent standard errors. Using all this information, construct the test statistic that tests the constant returns to scale hypothesis. (Hint: you need to use a t-test here.) Compute the value of this statistic and compare it with the critical value of 1.647. What do you conclude? (5 points)

Answer. The t-statistic is:

    t = (0.36 + 0.68 − 1) / sqrt( 0.09² + 0.23² + 2(−0.03) ) = 1.26,

and since |1.26| = 1.26 < 1.647, we fail to reject the constant returns to scale null hypothesis.
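As a quick aside for anyone who wants to verify the arithmetic, here is a small Python sketch that reproduces the t-statistic above from the reported coefficients, standard errors, and covariance. (Python is used purely for illustration; the exam itself requires only a calculator.)

    import numpy as np

    # Estimates from the OLS output in part 3 (non-robust standard errors)
    b1, b2 = 0.36, 0.68          # coefficient estimates on l and k
    se1, se2 = 0.09, 0.23        # their standard errors
    cov12 = -0.03                # Cov[beta1_hat, beta2_hat]

    # s.e. of (b1 + b2) uses the variance of a sum, including the covariance term
    se_sum = np.sqrt(se1**2 + se2**2 + 2 * cov12)
    t_stat = (b1 + b2 - 1) / se_sum
    print(round(t_stat, 2))      # 1.26, below the critical value of 1.647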
4. You become suspicious that your boss may not know econometrics well enough. In particular, you are worried that the data may exhibit heteroscedasticity and serial correlation. For each of those potential problems:

• State the problem formally.
• Say what particular feature of the data can make you suspect such a problem.
• Discuss which of the MLR assumptions will be violated under each of those problems.
• Say what the effect on the OLS estimates will be.

Be precise yet concise. (10 points)

Answer. Heteroscedasticity:

(a) Var[u_it | l, k] is not constant, i.e. Var[u_it | l, k] ≠ σ² for some i, t.
(b) Since we have observations on many different firms in the data, the variances of their error terms may differ across firms.
(c) This is a direct violation of MLR5.
(d) The OLS estimates will not suffer, but their standard errors will be wrong, and our inference procedures won't be valid.

Serial correlation:

(a) Cov[u_it, u_js | l, k] ≠ 0 for some (i, t) ≠ (j, s); the natural worry here is correlation over time within a firm, i.e. Cov[u_it, u_is | l, k] ≠ 0 for t ≠ s.
(b) Since we have multiple observations over time on each firm, there may be firm-specific factors in the unobserved error term. These can cause the error terms to be correlated across time for a given firm.
(c) This is a direct violation of MLR2: our sample is not completely random. You could also say this violates TS5', which is fine.
(d) Again, the OLS estimates will not suffer, but their standard errors will be wrong, and our inference procedures won't be valid.

5. You estimate equation (2) yet again, this time computing standard errors that are robust to both heteroscedasticity and serial correlation. Here is the output:

    y_it = 0.256 + 0.36 l_it + 0.68 k_it + û_it
          (0.18)  (0.26)      (0.43)

Redo part 3 of this problem using the new regression output and the fact that now Cov[β̂1, β̂2] = −0.12624. Does your conclusion change now? (5 points)

Answer. The t-statistic is:

    t = (0.36 + 0.68 − 1) / sqrt( 0.26² + 0.43² + 2(−0.12624) ) = 8.94,

and since |8.94| = 8.94 > 1.647, we now reject the constant returns to scale null hypothesis. So our conclusion did change once we accounted for the possible econometric pitfalls.
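As another aside, the robust standard errors used in part 5 can be obtained in practice with a cluster-robust (by firm) covariance estimator, which guards against both heteroscedasticity and arbitrary within-firm serial correlation in a pooled regression. The Python/statsmodels sketch below illustrates the idea on simulated firm-year data; the data, variable names, and parameter values are all hypothetical, since the exam's data set is not provided.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated stand-in for the pooled firm-year panel (hypothetical data)
    rng = np.random.default_rng(0)
    n_firms, n_years = 100, 8
    firm = np.repeat(np.arange(n_firms), n_years)
    firm_effect = rng.normal(0, 0.5, n_firms)[firm]      # induces within-firm correlation
    l = rng.normal(0, 1, n_firms * n_years)
    k = rng.normal(0, 1, n_firms * n_years)
    y = 0.25 + 0.4 * l + 0.6 * k + firm_effect + rng.normal(0, 1, n_firms * n_years)
    df = pd.DataFrame({"y": y, "l": l, "k": k, "firm": firm})

    model = smf.ols("y ~ l + k", data=df)
    res_usual = model.fit()                                   # classical standard errors
    res_cluster = model.fit(cov_type="cluster",               # robust to heteroscedasticity and
                            cov_kwds={"groups": df["firm"]})  # to serial correlation within firms

    # The constant-returns test uses the whole covariance matrix, so it changes too
    print(res_usual.t_test("l + k = 1"))
    print(res_cluster.t_test("l + k = 1"))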
Problem 2. Effect of Financial Aid on College Enrollment (55 points)

Suppose you work for a very selective private liberal arts college as an admissions officer, and your goal is to pick the best students from the pool of applicants. Tuition rates at your college are very high, and thus you suspect that awarding financial aid to the best applicants may impact their decisions to accept the offer from your school. Since you cannot award aid to everyone, you need to estimate the effect of financial aid on college acceptance. As an econometrician, you start with the following simple model:

    y_i = β0 + β1 f_i + u_i,

where i = 1, ..., n are potential students, y_i = 1 if student i accepts the offer to join your college (y_i = 0 otherwise), and f_i = 1 if student i was offered a fellowship (f_i = 0 otherwise). As usual, u_i contains all other factors that affect students' acceptance decisions. You have access to a random sample of students who applied to your college last year. Do the following:

1. State and briefly discuss our standard assumptions MLR1–MLR4 with respect to this particular model. (5 points)

Answer. Here are the assumptions:

• MLR1 states that the true model is linear. This is actually true here because the only explanatory variable is an indicator. So unless there are other explanatory variables, the only possible functional dependence of y on f is the one that's given. This is a fairly subtle consideration, though; I don't think anyone got it right.
• MLR2 states that we have a random sample. The problem explicitly states such a sample is available. Some people noted that the sample of students who apply to a selective liberal arts college can't be random. That would be correct if we wanted the effect of financial aid on the whole population of college applicants, not just on people who apply to our college. But as the problem goes, the applicants to our college are our whole population of interest.
• MLR3 says we cannot have linear dependencies in the explanatory variables. As long as the sample contains at least one person who was offered financial aid and at least one person who was not, it holds.
• MLR4 says E[u_i | f_i] = 0, i.e. the effect of financial aid on offer acceptance is completely captured by the model. This is highly unlikely, since financial aid can be correlated with other factors that influence a student's decision to enroll, and these factors are captured by u.

2. Write down the standard OLS minimization problem that yields the estimates β̂0 and β̂1 as the solution. (5 points)

Answer. Very simple:

    min over (β0, β1) of  Σ_{i=1}^{n} (y_i − β0 − β1 f_i)².

3. Suppose you inspect your data and find out that n = 1000. Moreover, since both y_i and f_i are indicator variables, the data can be split into four groups:

(a) people with f_i = y_i = 0 (there are n00 observations like this),
(b) people with f_i = 1 and y_i = 0 (there are n10 observations like this),
(c) people with f_i = 0 and y_i = 1 (there are n01 observations like this),
(d) people with y_i = f_i = 1 (there are n11 observations like this).

Use this information to rewrite the OLS problem you've set up above as a minimization problem with four sums instead of one. You will be able to substitute both f_i and y_i into each of those four sums. Simplify your expressions as much as possible (this will help later on). (Hint: once you've done all the simplifications, there should be no more sum signs (Σ) left in your objective function.) (10 points)

Answer. Ordering the observations by group,

    min over (β0, β1) of  Σ_{i=1}^{n} (y_i − β0 − β1 f_i)²
    = min over (β0, β1) of  [ Σ over group (a) (−β0)² + Σ over group (c) (1 − β0)² + Σ over group (b) (−β0 − β1)² + Σ over group (d) (1 − β0 − β1)² ]
    = min over (β0, β1) of  [ n00 (−β0)² + n01 (1 − β0)² + n10 (−β0 − β1)² + n11 (1 − β0 − β1)² ].

4. You take a closer look at the data and find out that n00 = 400, n01 = 200, n10 = 100, and n11 = 300. Use this information and your simplified objective function from part 3 to obtain exact numbers for β̂0 and β̂1. (Hint: they should both be between 0 and 1.) (10 points)

Answer. Dividing the objective through by 100,

    min over (β0, β1) of  4(−β0)² + 2(1 − β0)² + (−β0 − β1)² + 3(1 − β0 − β1)².

The first-order conditions (each divided by −2) are:

    4(−β̂0) + 2(1 − β̂0) + (−β̂0 − β̂1) + 3(1 − β̂0 − β̂1) = 0,
    (−β̂0 − β̂1) + 3(1 − β̂0 − β̂1) = 0.

Subtracting the second equation from the first, and expanding the second, gives

    2 − 6β̂0 = 0,
    3 − 4β̂0 − 4β̂1 = 0,

so that

    β̂0 = 1/3,    β̂1 = 3/4 − β̂0 = 5/12.
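As an aside, the closed-form answer is easy to check numerically: rebuild the 1,000 observations from the four cell counts and run OLS, or use the fact that with a single dummy regressor β̂0 is the mean of y in the f = 0 group and β̂0 + β̂1 is the mean in the f = 1 group. A small Python sketch:

    import numpy as np

    # Rebuild the 1000 observations from the four cell counts in part 4
    n00, n01, n10, n11 = 400, 200, 100, 300          # (f, y) = (0,0), (0,1), (1,0), (1,1)
    f = np.concatenate([np.zeros(n00 + n01), np.ones(n10 + n11)])
    y = np.concatenate([np.zeros(n00), np.ones(n01), np.zeros(n10), np.ones(n11)])

    # OLS of y on a constant and f
    X = np.column_stack([np.ones_like(f), f])
    b0_hat, b1_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    print(b0_hat, b1_hat)                            # 0.3333... and 0.4166..., i.e. 1/3 and 5/12

    # Equivalently, group means: b0 = mean(y | f = 0), b0 + b1 = mean(y | f = 1)
    print(n01 / (n00 + n01), n11 / (n10 + n11) - n01 / (n00 + n01))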
5. If you did part 4 right, you will have found β̂1 > 0. Explain why this may not necessarily be a true measure of the effect of financial aid on acceptance. (Hint: you should think about selection bias.) (5 points)

Answer. To argue for a causal interpretation of β̂1 we need to be able to claim that f_i is randomly assigned to applicants. Without this randomization, one cannot compare people with f_i = 0 to people with f_i = 1, since there can be other factors that make these two subsamples fundamentally different. Usually aid is offered to bright and promising students, and other colleges may also be willing to attract such students. Hence the mere fact that student i was awarded financial aid may "proxy" for the student's "quality" and may in fact turn out to be negatively correlated with his decision to join our college.

6. Suppose now that you have information on (x_1i, x_2i, ..., x_ki) for every student. Name at least 2 factors that you would like to observe in order to partially alleviate the problem from part 5 of this question. What should you do with these factors? (Hint: don't think too long.) (5 points)

Answer. The best two things to observe are students' abilities and the number of offers they got from other colleges. Controlling for these seems very important. It is not entirely unlikely to observe these two things – abilities can be measured by GPAs and SAT scores, and it is common for colleges to poll applicants on what other options they have. Other things like family income and distance to the college may also be of interest. These factors should be included in the regression equation together with f_i, obviously.

7. Now assume that you also have access to z_i – information on the SAT score that applicant i earned. Moreover, you know that your college has a peculiar mechanism for awarding financial aid: anyone with z_i > z̄ gets f_i = 1, and everyone with z_i ≤ z̄ ends up with f_i = 0. (To be concrete, let's say z̄ = 2100.) Do you think that z_i is a good instrument for f_i? Explain what makes an instrument "good" or "bad". (10 points)

Answer. A good instrument has to be relevant and valid. A relevant instrument is correlated with the endogenous variable: Cov[z_i, f_i] ≠ 0, and this instrument is clearly relevant. A valid instrument is uncorrelated with the error term: Cov[z_i, u_i] = 0, and this condition is likely to be violated. More able students are likely to have higher SAT scores, and such students can be expected to receive offers from more than one college. So the instrument is at best questionable, and most likely invalid altogether.

8. Finally, a colleague of yours comes up with an idea: let's only use data on students who scored at least 2050 and at most 2150 on the SAT, rather than all n data points. Explain why this may make the instrument from part 7 sound like a better idea than before. (Hint: your answer to part 5 may help.) (5 points)

Answer. By limiting the sample to people with SAT scores in the interval [2050, 2150], we are trying to look only at students with fairly similar levels of ability. This partially alleviates the concerns about the invalidity of the instrument. A student who received a 2090 on the SAT is unlikely to be considerably less able than one who scored 2110. The latter person, however, would be offered aid. Thus comparing such pairs of students is likely to be more informative.
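As an aside, the logic of part 8 is easy to see in a small simulation. The data below are entirely made up (the column names sat, aid, accept and all parameter values are hypothetical), but they are built so that aid goes to high scorers and high scorers have better outside options; the full-sample comparison then mixes the aid effect with ability, while the comparison inside the [2050, 2150] window comes much closer to comparing like with like.

    import numpy as np
    import pandas as pd

    # Hypothetical applicant data: aid is awarded exactly when sat > 2100,
    # and higher-ability applicants are less likely to accept our offer
    rng = np.random.default_rng(1)
    n = 5000
    sat = rng.integers(1800, 2400, size=n)
    ability = (sat - 2100) / 300 + rng.normal(0, 0.3, n)
    aid = (sat > 2100).astype(int)
    accept = (rng.random(n) < 0.4 + 0.3 * aid - 0.3 * ability).astype(int)
    df = pd.DataFrame({"sat": sat, "aid": aid, "accept": accept})

    # Naive comparison on the full sample mixes the aid effect with ability
    print(df.groupby("aid")["accept"].mean())

    # Restricting to scores in [2050, 2150] compares near-identical applicants
    # on either side of the 2100 cutoff
    window = df[df["sat"].between(2050, 2150)]
    print(window.groupby("aid")["accept"].mean())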
Problem 3. Time Series for U.S. Unemployment Rate (40 points)

Suppose that you have data on a single time series y_t, t = 1, ..., T. For concreteness, we will think of y_t as the monthly unemployment rate in the U.S. Do the following:

1. Explain what it means for a time series process to be weakly stationary and to be weakly dependent. Provide an intuitive explanation of why we need these properties when dealing with time series data. (10 points)

Answer. A time series y_t is weakly stationary if:

(a) E[y_t] is constant for all t,
(b) Var[y_t] is constant for all t,
(c) Cov[y_t, y_{t+h}] is a function of h only, for all t.

A time series is weakly dependent if Cov[y_t, y_{t+h}] → 0 as h → ∞. We need these two conditions in order to be able to treat the historical observations on the time series as a quasi-random sample. Otherwise the whole series is just a single observation, and no sensible statistical work can be carried out.

2. You spend some time reading The New York Times and become convinced that y_t follows an MA(2) process:

    y_t = β0 + u_t + β1 u_{t−1} + β2 u_{t−2},

where u_t ~ iid(0, σ²). Demonstrate that this process is weakly stationary and weakly dependent. (15 points)

Answer. Weak stationarity:

    E[y_t] = E[β0 + u_t + β1 u_{t−1} + β2 u_{t−2}] = β0 + E[u_t] + β1 E[u_{t−1}] + β2 E[u_{t−2}] = β0.

    Var[y_t] = Var[β0 + u_t + β1 u_{t−1} + β2 u_{t−2}] = Var[u_t + β1 u_{t−1} + β2 u_{t−2}]
             = Var[u_t] + β1² Var[u_{t−1}] + β2² Var[u_{t−2}] = σ²(1 + β1² + β2²).

    Cov[y_t, y_{t+1}] = Cov[β0 + u_t + β1 u_{t−1} + β2 u_{t−2}, β0 + u_{t+1} + β1 u_t + β2 u_{t−1}]
                      = Cov[u_t, β1 u_t] + Cov[β1 u_{t−1}, β2 u_{t−1}]
                      = β1 Var[u_t] + β1 β2 Var[u_{t−1}] = σ²(β1 + β1 β2).

    Cov[y_t, y_{t+2}] = Cov[β0 + u_t + β1 u_{t−1} + β2 u_{t−2}, β0 + u_{t+2} + β1 u_{t+1} + β2 u_t]
                      = Cov[u_t, β2 u_t] = β2 Var[u_t] = σ² β2.

    Cov[y_t, y_{t+h}] = 0 for |h| > 2.

None of these moments depend on t, so the process is weakly stationary. Weak dependence follows immediately from the last line.
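As an aside, the moment calculations above are easy to confirm by simulation. The sketch below generates a long MA(2) series with illustrative parameter values (β1 = 0.6, β2 = 0.3, σ = 1, chosen arbitrarily) and checks that the sample autocovariances match σ²(β1 + β1β2), σ²β2, and zero beyond lag 2.

    import numpy as np

    # Simulate y_t = b0 + u_t + b1*u_{t-1} + b2*u_{t-2} with illustrative parameters
    rng = np.random.default_rng(2)
    b0, b1, b2, sigma = 0.0, 0.6, 0.3, 1.0
    T = 100_000
    u = rng.normal(0, sigma, T + 2)
    y = b0 + u[2:] + b1 * u[1:-1] + b2 * u[:-2]

    def autocov(x, h):
        """Sample autocovariance Cov[x_t, x_{t+h}]."""
        x = x - x.mean()
        return np.mean(x * x) if h == 0 else np.mean(x[:-h] * x[h:])

    for h in range(5):
        print(h, round(autocov(y, h), 3))
    # Theory: h=0 -> sigma^2 (1 + b1^2 + b2^2) = 1.45,  h=1 -> sigma^2 (b1 + b1*b2) = 0.78,
    #         h=2 -> sigma^2 b2 = 0.30,  and 0 for h > 2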
3. Following part 2, you decide to estimate the model using the available data. (You do a little extra reading to find out how to estimate the MA(2) model, since it cannot be done by OLS, but you prevail.) Here is the output from Stata, edited for readability:

    MA(2) Regression                          Number of obs =      93
                                              R-squared     =  0.3345
                                              F(2, 91)      =   45.52
                                              Prob > F      =  0.0000
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       beta0_hat |  -.0674973   .1993148    -0.34   0.735    -.4581472    .3231525
       beta1_hat |   .6215946   .0950987     6.54   0.000     .4352045    .8079847
       beta2_hat |   .3241549    .099799     3.25   0.001     .1285525    .5197574
    -------------+----------------------------------------------------------------
       sigma_hat |   .9508688   .0733175    12.97   0.000     .8071691    1.094569
    ------------------------------------------------------------------------------

Discuss whether this regression appears "sensible", i.e. whether β̂1, β̂2, or both β̂1 and β̂2, should be dropped from the model. How well does the model fit the data? (5 points)

Answer. Clearly β̂1 and β̂2 are individually significant, since their individual p-values are very low (the column P>|t|). The F-test also has a very low p-value, meaning that these two coefficients are jointly significant and should not be dropped. At the same time, β̂0 appears very insignificant. The R-squared is not terribly high, only 33.5%. Overall, the regression seems to be ok.

4. A friend suggests that reading The New York Times is a no-brainer, and that you should consider The Wall Street Journal instead. You suspect that perhaps AR(1) is a model that fits the unemployment data better than MA(2) does:

    y_t = α0 + α1 y_{t−1} + u_t,

where u_t is the same as before. You estimate the AR(1) model by OLS and obtain the following regression output:

    AR(1) regression                          Number of obs =      93
                                              R-squared     =  0.3296
                                              F(1, 92)      =   40.01
                                              Prob > F      =  0.0000
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
      alpha0_hat |  -.0488304   .2437348    -0.20   0.841    -.5265418     .428881
      alpha1_hat |   .5763899   .0911256     6.33   0.000      .397787    .7549928
    -------------+----------------------------------------------------------------
       sigma_hat |   .9635918   .0795041    12.12   0.000     .8077668    1.119417
    ------------------------------------------------------------------------------

Again, discuss whether this regression "makes sense", i.e. whether α̂0 or α̂1 should be dropped from the model. How well does this model fit the same data? (5 points)

Answer. Clearly α̂1 is individually significant, since its p-value is very low (the column P>|t|). The F-test also has a very low p-value, meaning that this coefficient should not be dropped. At the same time, α̂0 appears very insignificant. The R-squared is not terribly high, only 32.9%. Overall, this regression seems to fit the data almost equally well.

5. At this stage you are probably confused about which regression to pick and hopelessly scrambling for a solution to this dilemma. Then you recall that there is a major difference between AR and MA processes in terms of the inter-temporal covariances Cov[y_t, y_{t+h}]. You decide to compute these for h = 1, ..., 5 and obtain the following:

    h                     1        2        3        4        5
    Cov[y_t, y_{t+h}]   0.5617   0.2133  -0.0370   0.0284  -0.0445

Can you now decide which model fits the data better (and which newspaper hires authors better trained in econometrics)? Justify your answer (at least on the model choice; don't bother with the newspaper assessments). (5 points)

Answer. This pattern suggests that the MA(2) model fits the data better. In part 2 of this question we saw that Cov[y_t, y_{t+h}] = 0 for |h| > 2 for that process. The first two covariances appear to be far from zero, and the next three appear to be very close to zero. At the same time, we showed in class that for an AR(1) process the covariances are all nonzero but decrease exponentially. Given that the estimate α̂1 is positive, the AR(1) covariances cannot flip signs back and forth the way they seem to do at longer lags. Thus I would pick the MA(2) process as the answer.
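As a final aside, the cutoff-versus-decay argument can also be checked by simulation. The sketch below generates one MA(2) series and one AR(1) series with coefficients near the estimates above (the exact values are just illustrative) and prints their autocovariances at lags 1 through 5: the MA(2) covariances vanish after lag 2, while the AR(1) covariances stay positive and shrink geometrically, which is exactly the pattern the observed table does not show.

    import numpy as np

    # Compare autocovariance patterns of the two candidate models (illustrative parameters)
    rng = np.random.default_rng(3)
    T = 100_000

    u = rng.normal(0, 1, T + 2)
    y_ma = u[2:] + 0.62 * u[1:-1] + 0.32 * u[:-2]     # MA(2), coefficients near beta1_hat, beta2_hat

    e = rng.normal(0, 1, T)
    y_ar = np.zeros(T)                                # AR(1) with alpha1 near alpha1_hat
    for t in range(1, T):
        y_ar[t] = 0.58 * y_ar[t - 1] + e[t]

    def autocov(x, h):
        """Sample autocovariance Cov[x_t, x_{t+h}]."""
        x = x - x.mean()
        return np.mean(x[:-h] * x[h:])

    for h in range(1, 6):
        print(h, round(autocov(y_ma, h), 3), round(autocov(y_ar, h), 3))
    # MA(2): clearly nonzero at h = 1, 2, then essentially 0.
    # AR(1): positive at every lag, decaying geometrically, with no cutoff and no sign flips.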