Econ 526 / Manopimoke
Fall 2014
Econometrics Practice Final

Topics to review for the exam: Chapters 6, 7, 8, 9, 11, 12

The exam will be a TAKE HOME EXAM, which will be posted on the course website at 5 p.m. on Thursday, Dec 18. Please type up your answers or scan your written answers and submit them to me at pymm@ku.edu by 10 p.m. on Dec 19. Only PDF files will be accepted. NO LATE SUBMISSIONS OR ILLEGIBLE EXAMS WILL BE CONSIDERED.

Some practice questions (not comprehensive) are below; more will be covered during the review session on 12/11. Working on the back-of-the-book questions will also be useful (solutions are already on the website).

Question 1

Suppose that there are threshold effects when examining the relationship between class size and student test scores. In particular, when the class size is less than 20, students do well, and when the class size is greater than 25, students do poorly. How would you test this hypothesis in a linear regression? Suppose you have data from California that supports your hypothesis. Discuss whether these results would hold for Kansas in terms of external validity.

Answer: You would create three dummy variables (class size small = class size < 20; class size middle = class size 20-25; class size large = class size > 25). You would run the regression

Test score = constant + β1(class size small) + β2(class size large) + error,

leaving out the middle category as the reference group to avoid perfect multicollinearity. Then test the null hypotheses β1 = 0 and β2 = 0 against the one-sided alternatives β1 > 0 and β2 < 0.

In order for these results from California to be externally valid for Kansas, the makeup of students and the characteristics of the schools would need to be similar in both states. California is a very urban state, has more minorities than Kansas, and spends more on education than Kansas. It is unlikely that the results for California will generalize (be externally valid) for Kansas.

Question 2

Refer to Table 8.3 on the following page to answer the following questions.

a.
Use specification (5) to determine the change in test scores from moving from a student-teacher ratio of 20 to 30 when students are in a high English Language Learner district, 50 percent are eligible for free or reduced lunch, and log income is equal to 2.2516.

Answer: Since this is a nonlinear model, you need to calculate the predicted test score at each student-teacher ratio and take the difference:

[252 + 64.33(30) - 3.42(30^2) + 0.059(30^3) - 5.47(1) - 0.42(50) + 11.75(2.2516)]
- [252 + 64.33(20) - 3.42(20^2) + 0.059(20^3) - 5.47(1) - 0.42(50) + 11.75(2.2516)]

The intercept and the ELL, lunch, and income terms are identical in both brackets and cancel, so only the student-teacher ratio terms contribute to the difference.

b. Using the output, can you reject the null hypothesis that student-teacher ratio has a linear effect on test scores?

Answer: This refers to the second F-test. We can reject the null that student-teacher ratio has a linear effect on test scores, since the coefficients on the squared and cubed terms are jointly significantly different from zero.

c. What is the interpretation of the coefficient on log income in Model 5? Is this a large or small effect?

Answer: Because income enters in logs, a 1% increase in income is associated with an increase in test scores of 0.01 × 11.75 = 0.1175 points. This is a small effect: as income increases, test scores barely move.

d. What are the threats to internal validity in Model 1?

Answer: Omitted variable bias: some variables that explain test scores are missing (e.g., income). Functional form: there are nonlinearities in student-teacher ratio that are not included.

e. Explain the rationale behind Model 6. Why are the interaction terms included in this model? F-test (c) tests the null hypothesis that the coefficients on the interaction terms are all zero. Explain what this means in terms of the model.

Answer: Model 6 allows the coefficients on student-teacher ratio to differ by whether the school district has a high percentage of English Language Learners. It could be that the effect of student-teacher ratio is different in districts with a higher percentage of ELL students because those districts have smaller classes to accommodate the ELL students.
The null hypothesis is that the effect of student-teacher ratio is the same in districts with a high percentage of ELL students as in other districts. This hypothesis is rejected at the 5% level but not at the 1% level.

Question 3

A study investigated the impact of house price appreciation on household mobility. The underlying idea was that if a house were viewed as one part of the household's portfolio, then changes in the value of the house, relative to other portfolio items, should result in investment decisions altering the current portfolio. Using 5,162 observations, the logit equation was estimated as shown in the table, where the dependent variable is one if the household moved in 1978 and is zero if the household did not move:

Logit regression

Regressor          Coefficient   (Std. error)
Constant           -3.323        (0.180)
Male               -0.567        (0.421)
Black              -0.954        (0.515)
Married78           0.054        (0.412)
Marriage change     0.764        (0.416)
A7983              -0.257        (0.921)
PNRN               -4.545        (3.354)

Pseudo-R² = 0.016

Here male, black, married78, and marriage change are binary variables. They indicate, respectively, whether the household was male-headed, was black, was married, and experienced a change in marital status between 1977 and 1978. A7983 is the appreciation rate for each house from 1979 to 1983 minus the SMSA-wide rate of appreciation for the same time period, and PNRN is a predicted appreciation rate for the unit minus the national average rate.

(a) Interpret the results. Comment on the statistical significance of the coefficients. Do the slope coefficients lend themselves to easy interpretation?

Answer: Since the logit model is nonlinear, the slope coefficients cannot be easily interpreted. However, the signs of the coefficients indicate the direction of the relationship between the regressors and the binary dependent variable. Accordingly, being married or having experienced a marriage change increases the probability of moving. A male-headed household or a black household is less likely to move.
If the predicted appreciation rate relative to the national average increases, the household is less likely to move. The same holds for the actual appreciation rate from 1979 to 1983. None of the slope coefficients is statistically significant at the 5% level in a two-sided test. The black household and marriage change coefficients have t-statistics of -1.85 and 1.84, respectively, so they would be statistically significant at the 5% level in a one-sided hypothesis test.

(b) The mean values for the regressors are as shown in the accompanying table.

Variable           Mean
male               0.82
black              0.09
married78          0.78
marriage change    0.03
A7983              0.003
PNRN               0.007

Taking the coefficients at face value and using the sample means, calculate the probability of a household moving.

Answer: Evaluating the logit index at the sample means gives z = -3.323 - 0.567(0.82) - 0.954(0.09) + 0.054(0.78) + 0.764(0.03) - 0.257(0.003) - 4.545(0.007), which is about -3.841. The probability is 1/(1 + e^(-z)) = 0.021.

(c) Given this probability, what would be the effect of a decrease in the predicted appreciation rate of 20 percent, that is, PNRN lowered by 0.20 from its mean?

Answer: The index rises by (-4.545)(-0.20) = 0.909, to about -2.932. The resulting probability would be 0.051, i.e., more than twice the value in the previous result.

Question 4

Consider the following model of demand and supply of coffee:

Demand: Qcoffee_i = β1 Pcoffee_i + β2 [...] + u_i
Supply: Qcoffee_i = β3 Pcoffee_i + β4 [...] + β5 Weather_i + v_i

(variables are measured in deviations from means, so that the constant is omitted).

(a) Suppose you want to estimate the price elasticity of demand by running an OLS regression on the demand equation. Will your estimate of β1 be unbiased? If not, explain why.

Answer: No. The OLS estimate of β1 will be biased due to simultaneous causality: price and quantity are determined jointly by demand and supply, so the regressor Pcoffee is correlated with the error term ui.

(b) Suppose you have an exogenous variable Weather. Why can this variable be useful in estimating β1? Outline the steps that you would follow to estimate β1. Will this give you a consistent estimate of β1? Explain.

Answer: Weather is exogenous and enters only the supply equation, so changes in Weather shift the supply curve and thereby trace out the demand curve.
Weather is a valid instrumental variable as it is correlated with the endogenous regressor Pcoffee but not with the error term ui. To estimate β1 using the variable Weather, first regress Pcoffee on Weather and calculate the predicted value of Pcoffee. Then, in the second stage, regress the quantity of coffee on the predicted value obtained from the first stage. This will yield a consistent estimate of β1 (two stage least squares is consistent, though not unbiased in finite samples). The model is exactly identified, as you have the same number of endogenous regressors as instrumental variables.

Question 5

Do we care more about internal or external validity in cross-sectional estimation? What about time series estimation? We care about R² in time series estimation. Is this also what we care about most in cross-sectional estimation? Explain.

Answer: In cross-sectional estimation we care more about internal validity. In time series analysis, we care more about R² and external validity, since the goal of time series estimation is to forecast future outcomes of the time series. Thus it is important to have the model with the highest R² in order to get the best possible forecast. In cross-sectional estimation, however, we are interested in estimating causal effects, and R² does not tell us whether we have identified a causal effect that is free from threats to internal validity (omitted variable bias, etc.).

Question 6

Explain why, when testing joint hypotheses, testing them sequentially (the "one at a time" method) using a series of t-statistics gives unreliable results. What approach should you use instead to test joint hypotheses? (Hint: Use an example to help explain your answer. Suppose you are testing the joint hypotheses β1=0 and β2=0. What is the probability of rejecting the joint null hypothesis using the "one at a time" method? Is it too low or too high?)

Answer: Using the usual t-statistics to test the restrictions one at a time will give unreliable results and is not the same as testing the joint hypotheses using the F-statistic.
Suppose you are interested in testing the joint hypotheses β1=0 and β2=0, with t-statistics t1 and t2. Consider the special case in which the t-statistics are independent. The "one at a time" method fails to reject the joint null only if both |t1| < 1.96 and |t2| < 1.96. Because the t-statistics are independent, this happens with probability 0.95 × 0.95 = 0.9025 when the null is true, so the method rejects a true null with probability 1 - 0.9025 = 0.0975, or 9.75%, rather than the intended 5%. The one-at-a-time test therefore rejects the null too often. Intuitively, this is because it gives you too many chances: if you fail to reject using the first t-statistic, you can try again using the second. If the regressors are correlated, the situation is even more complicated and the result is still unreliable. Use the F-statistic instead.
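
The rejection probability of the "one at a time" method in Question 6 can be checked numerically. The sketch below is only an illustration under the same special case as the answer (independent t-statistics that are standard normal under the null); the function names one_at_a_time_size and simulated_size are hypothetical and not part of the course material.

```python
import math
import random


def one_at_a_time_size(num_tests: int, critical_value: float = 1.96) -> float:
    """Probability that at least one of num_tests independent t-tests
    rejects when every null is true (assumes standard-normal t-statistics)."""
    # For a standard normal Z, P(|Z| < c) = erf(c / sqrt(2)).
    prob_single_not_reject = math.erf(critical_value / math.sqrt(2))
    # The joint null survives only if no individual test rejects.
    return 1.0 - prob_single_not_reject ** num_tests


def simulated_size(num_tests: int, draws: int = 100_000,
                   critical_value: float = 1.96, seed: int = 0) -> float:
    """Monte Carlo estimate of the same rejection probability."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(draws):
        # Draw independent t-statistics under the null; reject if any is large.
        if any(abs(rng.gauss(0.0, 1.0)) > critical_value
               for _ in range(num_tests)):
            rejections += 1
    return rejections / draws


if __name__ == "__main__":
    print(f"analytic size with 2 tests:  {one_at_a_time_size(2):.4f}")
    print(f"simulated size with 2 tests: {simulated_size(2):.4f}")
```

With two restrictions the analytic value is about 0.0975, matching 1 - 0.95 × 0.95, and it grows as more restrictions are tested one at a time. The sketch does not cover correlated t-statistics, which is the case where the F-statistic is needed most.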