Class 28 Assignment Answers

These questions refer to EMBS Case Problem 2, “Alumni Giving,” which concerns data for 48 US national universities (America’s Best Colleges, Year 2000 Edition). Both the University of Notre Dame and the University of Virginia are included. The following five variables are in the data set.

Variable                 Description
School                   The name of the University
Graduation Rate          Percentage of enrollees who graduate
% of Classes Under 20    Percentage of classes offered with <= 20 students
Student/Faculty Ratio    Number of students enrolled divided by total number of faculty
Alumni Giving Rate       Percentage of living alumni who gave to the University in 2000

                      Graduation   % of Classes   Student/Faculty   Alumni Giving
                      Rate         Under 20       Ratio             Rate
Mean                  83.042       55.729         11.542            29.271
Median                83.5         59.5           10.5              29
Mode                  92           65             13                13
Standard Deviation    8.607        13.194         4.851             13.441
Skewness              -0.282       -0.501         0.582             0.370
Minimum               66           29             3                 7
Maximum               97           77             23                67
Count                 48           48             48                48

1. Test the hypothesis that graduation rate and alumni giving rate are (linearly) independent. We expect universities with higher graduation rates to have higher mean giving rates. [15 points]

A regression of giving rate on graduation rate shows a positive linear relationship with a reported (two-tailed) p-value of 5.24E-10. For Ha: b>0, the p-value is half that, or 2.62E-10. We reject H0 in favor of Ha. The results are statistically significant.

                   Coefficients   Standard Error   t Stat   P-value
Intercept          -68.76         12.58            -5.46    1.82E-06
Graduation Rate    1.18           0.15             7.83     5.24E-10

2. If the graduation rate of school A is 5 percentage points higher than that of school B, how much higher do we expect school A’s giving rate to be? [10 points]

Using the above regression (graduation rate is all we know), the expected giving rate will be 1.18*5 = 5.9 percentage points higher for school A.

3. If you learn that A and B above have identical student to faculty ratios, what is your revised answer to question 2? Be certain to explain why it went up (if it went up) or why it went down (if it went down) or why it stayed the same.
Direct your response to a university administrator. [15 points]

For this question, we know both graduation rate and student/faculty ratio. Since the latter is also predictive of giving rate, we will use a multiple regression to answer this question.

                         Coefficients   Standard Error   t Stat      P-value
Intercept                -19.10631      15.55006         -1.22870    0.22557
Graduation Rate          0.75574        0.16023          4.71669     0.00002
Student/Faculty Ratio    -1.24595       0.28430          -4.38250    0.00007

(Note the p-value associated with student/faculty ratio is very low. Student/faculty ratio is an important variable which should not be ignored.)

The 5-point higher graduation rate leads us to expect a 0.756*5 = 3.8 percentage point higher giving rate for A. Our answer went down (from 5.9 to 3.8) because graduation rates and student/faculty ratios are negatively correlated in the sample. Schools with higher graduation rates tend to have lower student/faculty ratios, which in turn are associated with higher giving rates. The answer to question 2 reflected this reality: the higher graduation rate for A would also imply a lower student/faculty ratio, and the combination would lead us to expect 5.9 more percentage points in giving rate. When we learned that A did NOT have a lower student/faculty ratio than B, our expectation for its giving rate went down, and we now expect a smaller giving rate gap between the two schools.

4. Provide a point forecast of alumni giving rate for a university with graduation rate of 80, 65 percent of its classes with 20 or fewer students, and a student/faculty ratio of 20. [25 points] (To answer this question, I expect you will build a linear regression model. Do not try anything fancy. Just pick which subset of the three numerically scaled variables you think comprise the best model.)

From a modeling standpoint, the question is whether percent under 20 is needed. Does it add predictive power to the model given that we already have both graduation rate and student/faculty ratio? To see, we try the three-variable model.

                         Coefficients   Standard Error   t Stat     P-value
Intercept                -20.7201       17.5214          -1.1826    0.2433
Graduation Rate          0.7482         0.1660           4.5082     0.0000
% of Classes Under 20    0.0290         0.1393           0.2084     0.8358
Student/Faculty Ratio    -1.1920        0.3867           -3.0823    0.0035

The p-value associated with % of classes under 20 is 0.84, which is not significant. We do not need and should not use all three variables. The model used to answer Q3 should be used to come up with the point forecast. Using a sumproduct of the coefficients and the input values results in a point forecast of 16.4 for the alumni giving rate of the school in question. See below.

                Intercept    Graduation Rate   Student/Faculty Ratio   POINT FORECAST
Coefficients    -19.10631    0.75574           -1.24595
Values          1            80                20                      16.43

5. Of the 48 universities in the data set, which one has the most surprisingly low alumni giving rate? [10 points] (Hint: The answer is not U. of California-Davis. Its last-place giving rate is explained by its relatively low graduation rate and large classes.)

I will use our 2-variable regression to calculate predictions (expectations) for each of the 48 schools and then identify the school whose actual giving rate is furthest below its prediction. This is the same thing as finding the school with the most negative residual.

[Scatter plot: ERRORS or RESIDUALS (vertical axis) versus PREDICTED VALUES (horizontal axis)]

In the scatter plot of errors versus predicted values, the circled point is the one with the most negative error. It is school 35 (U. of Michigan-Ann Arbor), for which the regression prediction was 24.9 but the actual giving rate was 13, a full 11.9 points below expectation. I will leave it to you Notre Dame readers to draw your own conclusions. (You can also identify the most negative residual by asking EXCEL to give you the residuals, and then either eyeballing or sorting them.)

6. Bo notices that some of the 48 have “university” in their names, some have “college” and the rest have “institute”.
Bo wonders whether these names are predictive of student/faculty ratio. (Formulate and test a relevant hypothesis.) [25 points]

Let us use H0: mean S/F ratio is equal for the three names. Ha will be that the means are not all equal. We can use either ANOVA single factor or regression with 2 dummies to test this hypothesis.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.306267658
R Square             0.093799878
Adjusted R Square    0.053524317
Standard Error       4.719185001
Observations         48

ANOVA
              df    SS           MS         F         Significance F
Regression    2     103.7348     51.8674    2.3290    0.1090
Residual      45    1002.1818    22.2707
Total         47    1105.9167

              Coefficients   Standard Error   t Stat     P-value
Intercept     11.8636        0.7114           16.6754    0.0000
Dcollege      -0.3636        3.4120           -0.1066    0.9156
Dinstitute    -7.3636        3.4120           -2.1582    0.0363

Although the mean S/F ratio for institutes is significantly lower than for universities (the group not included in the model), with a p-value of 0.036, overall we cannot reject H0 (the p-value for our H0 is 0.1090). The differences among the three sample means are not statistically significant. Part of the reason is that there are only 2 colleges and 2 institutes, which makes our estimates of their means highly uncertain, a fact accounted for in our p-value.
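
The arithmetic in these answers can be rechecked from the reported regression output alone. What follows is a minimal sketch in Python, not the method used to produce the answers (those came from EXCEL). The function and variable names are my own, chosen for illustration. The closed-form p-value for the F test is a shortcut that holds only because the numerator degrees of freedom equal 2, and the "implied" slope relating the two predictors is backed out from rounded coefficients, so treat it as approximate.

```python
# Recheck of the arithmetic in the answers, using only the reported output.
# All names here are illustrative assumptions, not part of the assignment.

def linear_forecast(coefs, values):
    """Sum of coefficient * value, the same idea as a spreadsheet sumproduct."""
    return sum(c * v for c, v in zip(coefs, values))

# Q2: slope from the simple regression of giving rate on graduation rate.
simple_slope = 1.18
q2_gap = simple_slope * 5

# Q3: two-variable model (intercept, graduation rate, student/faculty ratio).
two_var = [-19.10631, 0.75574, -1.24595]
q3_gap = two_var[1] * 5

# Why the answer dropped: for least squares fit on the same sample,
#   simple slope = b_grad + b_sf * d,
# where d is the slope from regressing S/F ratio on graduation rate.
# Solving for d gives the implied (approximate, due to rounding) value.
implied_d = (simple_slope - two_var[1]) / two_var[2]

# Q4: point forecast for graduation rate 80 and S/F ratio 20.
q4_forecast = linear_forecast(two_var, [1, 80, 20])

# Q6: F statistic from the ANOVA sums of squares.
ss_reg, ss_res = 103.7348, 1002.1818
df_reg, df_res = 2, 45
f_stat = (ss_reg / df_reg) / (ss_res / df_res)

# Upper-tail probability of F(2, df_res); this closed form is valid
# only when the numerator degrees of freedom equal 2.
p_value = (1 + df_reg * f_stat / df_res) ** (-df_res / 2)

print(f"Q2 gap:       {q2_gap:.1f}")
print(f"Q3 gap:       {q3_gap:.1f}")
print(f"Implied d:    {implied_d:.2f}")
print(f"Q4 forecast:  {q4_forecast:.2f}")
print(f"Q6 F:         {f_stat:.4f}")
print(f"Q6 p-value:   {p_value:.4f}")
```

The printed values should agree, within rounding, with the 5.9, 3.8, 16.43, 2.3290 and 0.1090 reported in the answers. The negative implied slope is the numerical counterpart of the explanation in question 3: in this sample, schools with higher graduation rates tend to have lower student/faculty ratios.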