Assignment
The assignment was to do the following book problems:
• 17.2, 17.6, 17.7
• C17.1, C17.2, C17.3
17.2
Let grad be a dummy variable for whether a student-athlete at a large university graduates in five years. Let hsGPA and SAT be high school grade point average and SAT score, respectively. Let study be the number of hours spent per week in an organized study hall. Suppose that, using data on 420 student-athletes, the following logit model is obtained:
P̂(grad = 1 | hsGPA, SAT, study) = Λ(−1.17 + 0.24 hsGPA + 0.00058 SAT + 0.073 study),
where Λ(z) = exp(z)/[1 + exp(z)] is the logit function. Holding hsGPA fixed at 3.0 and SAT fixed at 1,200, compute the estimated difference in the graduation probability for someone who spent 10 hours per week in study hall and someone who spent 5 hours per week.
This one is easy if you can translate the lingo properly. The question is asking us to use the estimated coefficients of the model (−1.17, 0.24, 0.00058, and 0.073) to compute two predictions ("p-hats" or "y-hats" or whatever you want to call them). The predicted probability of the student-athlete graduating, P̂(grad = 1), given a high school GPA of 3.0, an SAT of 1200, and 10 hours of study per week is given by
Λ(−1.17 + 0.24 × 3.0 + 0.00058 × 1200 + 0.073 × 10) = Λ(0.976).
To translate this function into a single number (a numerical prediction), we simply use the formula given to us by Wooldridge:
Λ(0.976) = exp(0.976) / [1 + exp(0.976)] = 0.7263.
So our prediction for the probability that a student-athlete who studies 10 hours per week graduates is 0.7263. If this same student-athlete (same GPA and SAT score) instead studied only 5 hours per week, we would instead predict their probability of graduating to be:
Λ(−1.17 + 0.24 × 3.0 + 0.00058 × 1200 + 0.073 × 5) = 0.6482.
Thus the difference (the gain in probability of graduation from an extra 5 hours of study per week) is 0.7263 − 0.6482 = 0.0781. That's a pretty healthy increase!
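If you would rather let Stata do the arithmetic, a quick sketch using Stata's built-in invlogit() function (which computes Λ(z)) reproduces both numbers:
display invlogit(-1.17 + 0.24*3.0 + 0.00058*1200 + 0.073*10)
display invlogit(-1.17 + 0.24*3.0 + 0.00058*1200 + 0.073*5)
The first line should print roughly 0.7263 and the second roughly 0.6482, so the difference is the 0.0781 computed above.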
17.6
Consider a family saving function for the population of all families in the United States:
sav = β₀ + β₁ inc + β₂ hhsize + β₃ educ + β₄ age + u,
where hhsize is household size, educ is years of education of the household head, and age is the age of the household head. Assume that E(u | inc, hhsize, educ, age) = 0.
(i) Suppose that the sample includes only families whose head is over 25 years old. If we use OLS on such a sample, do we get unbiased estimators of the βⱼ? Explain.
Sure! We would be concerned about bias if the selection of our sample were based on an unobservable (a variable that was omitted from our savings function). But since the selection is based on age, which we are assuming is exogenous (uncorrelated with u), selecting the sample this way causes no problem.
(ii) Now, suppose our sample includes only married couples without children. Can we estimate all of the parameters in the saving equation? Which ones can we estimate?
If we only include married couples without children, we are excluding families with single parents and families with children. This is a pretty serious restriction — we are restricting the sample to households made up of two people. If we do this, we have no variation whatsoever in hhsize . When we have no variation in a variable, we cannot estimate the coefficient associated with the variable.
(iii) Suppose we exclude from our sample families that save more than $25,000 per year. Does OLS produce consistent estimators of the βⱼ?
Yikes. This is a whole different ball game. Now, instead of selecting our sample based on the exogenous variables (the x-variables), we are selecting our sample based on outcomes.
This is a bad idea. How much can we learn about a problem if we only look at certain outcomes? Your intuition should be “not much.” OLS would not produce consistent estimators, since we are biasing the coefficients by selecting on y ( sav ). When we select on y , we are causing truncation in the dataset. When we have a truncated dependent variable, we don’t typically use OLS. See notes on truncation to remember why.
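If you want to see this bias in action, here is a small simulation sketch in a Stata do-file (the numbers are made up purely for illustration and are not from any particular dataset):
clear
set obs 1000
set seed 1234
* made-up data where the true slope on inc is 0.3
gen inc = rnormal(50, 10)
gen u = rnormal(0, 5)
gen sav = 2 + 0.3*inc + u
* full sample: the OLS slope should land near 0.3
reg sav inc
* keep only families saving 25 or less (selecting on the outcome)
reg sav inc if sav <= 25
In the truncated sample, only the high-income families with unusually low u survive the cut, so inc and u become correlated in the selected sample and the OLS slope is pulled toward zero.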
17.7
Suppose you are hired by a university to study the factors that determine whether students admitted to the university actually come to the university.
You are given a large random sample of students who were admitted the previous year. You have information on whether each student chose to attend, high school performance, family income, financial aid offered, race, and geographic variables. Someone says to you, "Any analysis of that data will lead to biased results because it is not a random sample of all college applicants, but only those who apply to this university." What do you think of this criticism?
I think that the person who offered this criticism should retake econometrics. The task we have been given is to study the factors that determine whether students admitted to the university come to the university. We were not asked to study the factors that determine whether a random sample of college applicants actually come to the university.
If we had been asked that (and it would be a silly question), then we would behave differently. This is an example of a selection issue that does not cause any problems. Your immediate reaction might be to suspect that all cases of selection cause bias, but as this example shows, sometimes selection (some restriction of the sample) is appropriate.
C17.1
Use the data in PNTSPRD for this exercise.
(i) The variable favwin is a binary variable equal to one if the team favored by the Las Vegas point spread wins. A linear probability model to estimate the probability that the favored team wins is
P(favwin = 1 | spread) = β₀ + β₁ spread.
Explain why, if the spread incorporates all relevant information, we expect β₀ = 0.5.
In every game, there is a favorite and an underdog. The spread is an expression of the gap between the favorite and the underdog. If the favorite is judged to be 5 points better than the underdog, then the spread is 5. The model is attempting to predict the probability that the favored team wins a game. If the spread incorporates all information relevant to whether or not a team wins the game, then the probability of the favorite winning by more or less than the spread should be 50/50. That is, β̂₀ should be equal to 0.5. Said another way, if the spread were 0 (if there were no favorite), then the probability of either team winning would be 0.5.
(ii) Estimate the model from part (i) by OLS. Test H₀: β₀ = 0.5 against a two-sided alternative. Use both the usual and heteroskedasticity-robust standard errors.
I run this model by using the command reg favwin spread in Stata. If I consult the t-statistic printed out by Stata, I can directly evaluate the hypothesis H₀: β₀ = 0. But I want to test a different hypothesis, so I have to construct my own t-stat. Luckily, all the necessary information is printed out for me. The relevant t-stat is given by (0.5769 − 0.5)/0.0282 = 2.73. Based on this t-statistic, I would reject the null at the α = 0.05 level. Performing the exact same procedure with the heteroskedasticity-robust standard errors does not change things much (you could use the command reg favwin spread, robust to do this in Stata).
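Alternatively, you can have Stata do the test directly with the test command (it reports an F statistic, which is just the square of the t-stat above):
reg favwin spread
test _cons = 0.5
reg favwin spread, vce(robust)
test _cons = 0.5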
(iii) Is spread statistically significant? What is the estimated probability that the favored team wins when spread = 10?
spread is statistically significant (t-stat = 8.28 or 10.08 depending on whether robust standard errors are used or not). The estimated probability that the favored team wins when spread = 10 is obtained by plugging the estimated coefficients back into the model and assuming a spread value of 10. Alternatively, we can have Stata do the work for us.
I issued the following commands to have Stata do the work for me (I did this with the regular, i.e. non-robust, standard errors):
reg favwin spread
predict yhat
sum yhat if spread == 10
I used these commands to determine that the predicted probability of the favorite winning when the spread is 10 points is equal to 0.7706.
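Equivalently, since the estimated coefficients are stored after reg, a one-line check is:
display _b[_cons] + _b[spread]*10
which should print (approximately) the same 0.7706.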
(iv) Now, estimate a probit model for P(favwin = 1 | spread). Interpret and test the null hypothesis that the intercept is zero. [Hint: remember that Φ(0) = 0.5.]
To run a probit model instead of an LPM (as we did in the previous step), I simply modify the Stata code a tiny bit.
probit favwin spread
This gives me a model that I can use to test the null hypothesis that the intercept is zero. I can do this simply by consulting the Stata regression output. The t-stat is -0.10, which leads us to not reject the null hypothesis that the intercept is equal to zero. Note that this is exactly the analog of testing that β₀ = 0.5 in the linear model (this is what the hint was about).
(v) Use the probit model to estimate the probability that the favored team wins when spread = 10. Compare this with the LPM estimate from part (iii).
Just as above, I use the following commands:
probit favwin spread
predict phat
sum phat if spread == 10
When I issue these commands, I can see that the predicted probability of the favorite winning when the spread is equal to 10 is 0.8196, which is a bit higher than the prediction I obtained using the LPM above.
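For the probit, the hand calculation uses the standard normal cdf instead of Λ; right after the probit command, this one-liner should give (approximately) the same answer:
display normal(_b[_cons] + _b[spread]*10)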
(vi) Add the variables favhome, fav25, and und25 to the probit model and test joint significance of these variables using the likelihood ratio test. (How many df are in the chi-square distribution?) Interpret this result, focusing on the question of whether the spread incorporates all observable information prior to a game.
This is material from last semester. I wouldn't emphasize it as much since we haven't spent any time on it this semester, but it's probably good to review anyway. To run the model and test using the LR test, I use the following commands in Stata:
probit favwin spread favhome fav25 und25
estimates store Unrestricted
probit favwin spread
estimates store Restricted
lrtest Unrestricted Restricted
The result of this test (df = number of restrictions = 3) suggests that we cannot reject the hypothesis that the coefficients associated with the 3 new variables are jointly 0. This suggests that the spread really does incorporate whatever information might otherwise be contained in these 3 variables.
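As a side note, an asymptotically equivalent alternative to the LR test is a Wald test of the same three restrictions, which you can run off the stored unrestricted estimates:
estimates restore Unrestricted
test favhome fav25 und25
This should lead to the same conclusion here.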
C17.2
Use the data in LOANAPP for this exercise; see also Computer Exercise C7.8.
(i) Estimate a probit model of approve on white. Find the estimated probability of loan approval for both whites and nonwhites. How do these compare with the linear probability estimates?
Running the probit model is straightforward:
probit approve white
predict phat
sum phat if white==0
sum phat if white==1
By doing this, we find that the estimated probability of loan approval for whites is 0.9084, while for nonwhites it is 0.7078. This is a very similar picture to what we see if we use an LPM instead (use regress instead of probit in the previous set of commands).
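For the LPM comparison mentioned above, the parallel commands would look something like this (lpmhat is just a name I chose for the new fitted values):
reg approve white
predict lpmhat
sum lpmhat if white==0
sum lpmhat if white==1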
(ii) Now, add the variables hrat, obrat, loanprc, unem, male, married, dep, sch, cosign, chist, pubrec, mortlat1, mortlat2, and vr to the probit model. Is there statistically significant evidence of discrimination against nonwhites?
When I run the suggested model I learn that there is still substantial evidence that being white influences the probability of loan approval. So yes, there is statistically significant evidence of discrimination against nonwhites.
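For reference, the command is just the part (i) probit with the new regressors tacked on (using the variable names from the problem):
probit approve white hrat obrat loanprc unem male married dep sch cosign chist pubrec mortlat1 mortlat2 vr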
(iii) Estimate the model from part (ii) by logit. Compare the coefficient on white to the probit estimate.
When I do this, I find that the coefficient is different, but the statistical significance is nearly identical.
(iv) Use equation (17.17) to estimate the sizes of the discrimination effects for probit and logit.
There are two ways to proceed here. You could consult equation (17.17) and find that multiplying the logit coefficient by 0.625 makes the two coefficient estimates appear similar. A more informative thing to do is probably to compare the predictions from the two models (for instance, we could look at the average predicted probabilities across the sample using both models). When I do this, I find that the average predicted probability of an approved loan is identical using both models (to the first 3 digits after the decimal).
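A sketch of that comparison (re-estimating each model so the right coefficients are in memory before each predict; p_probit and p_logit are just names I chose):
* probit predictions
probit approve white hrat obrat loanprc unem male married dep sch cosign chist pubrec mortlat1 mortlat2 vr
predict p_probit
* logit predictions
logit approve white hrat obrat loanprc unem male married dep sch cosign chist pubrec mortlat1 mortlat2 vr
predict p_logit
* compare the average predicted approval probabilities
sum p_probit p_logit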
C17.3
Use the data in FRINGE for this exercise.
(i) For what percentage of the workers in the sample is pension equal to zero? What is the range of pension for workers with nonzero pension benefits? Why is a Tobit model appropriate for modeling pension?
pension is equal to zero for 27.92 percent of the sample. (I determined that by using the command tabulate pension.) For workers with nonzero pension benefits, the minimum pension value is 7.28 and the maximum is 2880.27. A Tobit model is appropriate because the pension variable cannot be less than zero, takes on the value 0 for a significant portion of the population, and takes on a continuum of values greater than 0. The variable is roughly continuous over its positive values but piles up (has a mass point) at 0.
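A slightly more direct way to pull these numbers out of Stata (a sketch):
count if pension == 0
display r(N)/_N
sum pension if pension > 0
The display line gives the fraction of zeros, and the summarize line shows the minimum and maximum among workers with positive pensions.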
(ii) Estimate a Tobit model explaining pension in terms of exper , age , tenure , educ , depends , married , white , and male . Do whites and males have statistically significant higher expected pension benefits?
I use the following command to estimate this model: tobit pension exper age tenure educ depends married white male, ll(0) .
When I estimate the Tobit model I see that whites and males are both predicted to have higher expected pension benefits (positive coefficients) but only being male has an effect that is statistically significant at conventional levels.
(iii) Use the results from part (ii) to estimate the difference in expected pension benefits for a white male and a nonwhite female, both of whom are 35 years old, are single with no dependents, have 16 years of education, and have 10 years of experience.
There are several ways to do this. You could do it manually using the estimated coefficients, or you could do it a number of ways using Stata. Here's how I did it using Stata (the lazy way):
tobit pension exper age tenure educ depends married white male, ll(0)
set obs 618
replace white = 1 in 617
replace male = 1 in 617
replace age = 35 in 617
replace married = 0 in 617
replace depends = 0 in 617
replace educ = 16 in 617
replace exper = 10 in 617
replace tenure = 10 in 617
predict yhat
sum yhat if white==1 & male==1 & age==35 & married==0 & depends==0 ///
    & educ==16 & exper==10
Doing this lets me see that a white male with the appropriate description is predicted to have a pension of 941.1191. Using an identical procedure for a nonwhite female (placing the values for this individual in observation 618 instead of 617) tells me that a nonwhite female with the same description is predicted to have a pension of 488.8831, which is substantially less. This is enough to understand that there is a substantial difference in the predictions. To obtain the predictions from the Tobit model, we would take these linear predictions and plug them into equation (17.25; page 590 in the 4th ed.). Knowing the first part is sufficient for our purposes.
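A shortcut worth noting: the two hypothetical people differ only in white and male, so the gap between their linear predictions is just the sum of those two coefficients. After the tobit command, lincom reports that sum along with a standard error:
lincom white + male
Up to rounding (and any real observations accidentally caught by the sum conditions above), this should match the difference between the two predictions.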
(iv) Add union to the Tobit model and comment on its significance.
The union variable appears quite significant (t-stat = 7.03).
(v) Apply the Tobit model from part (iv) but with peratio , the pension-earnings ratio, as the dependent variable. (Notice that this is a fraction between zero and one, but, though it often takes on the value zero, it never gets close to being unity. Thus, a Tobit model is fine as an approximation.) Does gender or race have an effect on the pension-earnings ratio?
In this case, both gender and race appear not to have an impact. Whites and males do not seem to have higher pensions as a fraction of earnings. This suggests that differences in pension levels discovered in previous regressions were due to differences in earnings.
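For completeness, the command here is the same Tobit specification from part (iv) with the dependent variable swapped out:
tobit peratio exper age tenure educ depends married white male union, ll(0)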