Econometrics Stata Quiz (2/28/07)

Name:

You have a dataset that covers 300 students in three different high schools. The variables in your dataset are as follows:

$Y_i$ = test score for student i
$X_{1i}$ = average class size for student i
$X_{2i}$ = parental income (in thousands of dollars) for student i
$HS_i$ = high school attended by student i

Your job is to figure out the true effect of class size on test scores in each high school using this data. The relationship may be the same for different high schools or it may be different. For the purposes of this quiz, suppose that class size and parental income are the only variables that can affect a student's test score.

1) Suppose you run the following regression:

. reg y x1

Is the estimated coefficient for x1 an unbiased estimate of the true effect of class size on test scores? If so, why? If not, why not? Write down the model that this regression assumes describes the relationship between class size and the other variables.

Answer: This regression assumes that the true model is:

$Y_i = \beta_0 + \beta_1 X_{1i} + u_i$

The estimated coefficient would be unbiased if class size were the only variable that affected test scores, or if class size and parental income were uncorrelated with each other. Unfortunately, the true model needs to include parental income, and parental income is correlated with class size, so this regression gives you a biased estimate of the effect of class size on test scores.

2) Now suppose you run the following regression:

. reg y x1 x2

Is the estimated coefficient for x1 an unbiased estimate of the true effect of class size on test scores? If so, why? If not, why not? Write down the model that this regression assumes describes the relationship between class size and the other variables.

Answer: This regression assumes that the true model is:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$

All the variables that could affect test scores are included in the model, so we get an unbiased estimate of the effect of class size on test scores.
The true effect of class size on test scores is -2 (and not the -5.4 you found in the previous problem). The regression is telling you that when average class size goes up by 1, test scores go down by 2 points. This effect is much smaller than the incorrect estimate you found in problem 1. In problem 1, we assigned some of the effect of higher parental income to class size, leading us to overestimate the importance of lower class size in causing better test scores.

3) Explain how you could figure out the relationship between class size and parental income. What is the effect of parental income on a student's average class size?

Answer: You could run a regression of class size on parental income. That regression would give you an estimate of this effect. It tells you that when parental income goes up by 1 (which represents 1,000 dollars), average class size goes down by 0.8.

4) You can use an if statement to look at the variables for any given high school. If you wanted to graph test scores against class size for just high school 1, you could type

. graph y x1 if hs==1

You can also run a regression using just the data for any given high school. Your job: Run regressions separately for the different high schools. Is the relationship between test scores and the other variables the same for different high schools? Is it different? Explain.

Answer: By running the following three regressions, you will notice that the relationship between test scores and the other variables is the same for all three high schools.

. reg y x1 x2 if hs==1
. reg y x1 x2 if hs==2
. reg y x1 x2 if hs==3

You should notice that, for each regression, $\hat{\beta}_0 = 14$, $\hat{\beta}_1 = -2$, $\hat{\beta}_2 = 3$.
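
The regressions from questions 3 and 4 could also be collected in a short do-file, sketched below. It uses the variable names from the quiz (y, x1, x2, hs) and assumes hs is coded 1, 2, 3. The pooled regression at the end is not part of the quiz; it is one possible way to check formally whether the coefficients differ across schools, and it assumes Stata 11 or later for the factor-variable syntax.

```stata
* Question 3: regress class size on parental income
regress x1 x2

* Question 4: one regression per high school (assumes hs takes the values 1, 2, 3)
forvalues s = 1/3 {
    display "High school `s'"
    regress y x1 x2 if hs == `s'
}

* Optional pooled check (an addition, not part of the quiz; needs Stata 11+):
* let the intercept and both slopes differ by school, then jointly test
* whether all the school-specific terms are zero.
regress y c.x1##i.hs c.x2##i.hs
testparm i.hs i.hs#c.x1 i.hs#c.x2
```

If the joint test does not reject, that is consistent with the answer above that the same relationship holds in all three high schools.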