SOLUTIONS TO FINAL EXAM VERSION 2

1) A) The signs are all positive. This makes sense, as we would hope that the (estimated) expected salary would increase with years of employment, number of prior years' experience and years of education.
B) The coefficient for gender is −2929, indicating that the estimated expected salary for males is almost 3000 dollars less than for females having comparable years of employment, prior years' experience and education. Note that the coefficient is negative, indicating that as gender goes from 0 (female) to 1 (male) while holding all other variables fixed, the estimated response surface decreases.
C) Since the p-value for gender is .175/2 = .088, which is not very small, we do not have strong evidence to conclude that there is discrimination on the basis of gender. (We cannot reject the null hypothesis that the true coefficient for gender is zero.)
D) No. According to the model, salaries are random, so we cannot determine the exact impact of a one-year increase in prior experience, but the estimated expected increase in salary is $238.40.
E) Since the p-value corresponding to the F-statistic is .016, we can conclude (at the 1.6% significance level) that the model is reasonable in the sense that not all explanatory variables have a zero coefficient. The \(R^2\) seems reasonably high, at 88%, but this may not be as impressive as it seems, since we are using a total of five regression parameters to explain 10 data points. Furthermore, the only explanatory variable (besides the intercept) which has a statistically significant coefficient (at level .05 or less) is years of employment. It might be wise to consider deleting some or all of the other variables.

2) A) We can say that 88% of the variability in salary is “explained” by years of employment, prior years' experience, education and gender.
B) We have \(\alpha = .05\) and df = 5, so from Table 6 we get \(t_{\alpha/2} = 2.571\). The confidence interval is \(150.5 \pm (2.571)(424.8) = (-941.7,\ 1242.7)\).
This includes zero, so we can't reject the null hypothesis that the true coefficient of Educ is zero. Such an interval would contain the true coefficient in 95% of all random samples of this kind which could have been taken from the given population.

3) A) From the output, the regression and error degrees of freedom are 4 and 5, respectively. In other words, k = 4 and n − k − 1 = 10 − 4 − 1 = 5.
B) The ratio of MS (Regression) to MS (Error) is 63870378 / 6946683 = 9.19.
C) S is the square root of MS (Error), \(\sqrt{6946683} = 2636\).
D) The degrees of freedom for the t-statistic is the degrees of freedom for Error, which is 5.
E) \(\hat{y} = 24778 + (4)(617.4) + 0 + (3)(150.5) + 0 = 27699\).

4) A) We have \(n = 75\), \(\bar{x} = 25\), \(s = 80\). The expected difference between Harry's and Joanne's scores is \(\mu\) (unknown). Harry's claim is that \(\mu > 0\). We will test the null hypothesis that \(\mu = 0\) versus the alternative hypothesis that \(\mu > 0\). (This is a one-tailed test.) The t-statistic is \(t = \frac{25 - 0}{80/\sqrt{75}} = 2.71\). Since this is greater than 2.326, we reject the null hypothesis at level .01. Yes, there is evidence to support Harry's claim, at the 1% level of significance.
B) The answer is the p-value, which is the area to the right of 2.71 under a standard normal. This is .5 − .4966 = .0034.

5) A) Using the output for group 2, and a null hypothesis that \(\mu = 4\), we get t = (15.72 − 4) / 1.3 = 9.02. Since this exceeds 2.262 (df = 9), we reject the null hypothesis. The answer is “Yes”.
B) The p-value of 0.0000 was calculated by finding the area under a t-distribution (df = 16) which is either to the left of −5.72 or to the right of 5.72. The practical interpretation: if the expected prices for university and commercial publishers were the same, we would get such strong evidence of differences between them less than one time out of 10,000. Thus, there is overwhelming evidence that the mean prices are in fact different.

6) Answer is zero (C), since the residual is the actual y minus the fitted value.

7) False (B).
There is nothing random about the alternative hypothesis, so we can't talk about the probability that it is true.

8) True (A). If the p-value were greater than 5%, then we would be unable to reject the null hypothesis at level .05.

9) Since SST = SSR + SSE and all of these sums of squares are non-negative, SSR must be less than or equal to SST. Answer is B.

10) We reject the null hypothesis if \(t < -t_{.01}\). We have df = 14, so \(-t_{.01} = -2.624\). Answer is A.

11) True (A).

12) \(n = 15\), \(\bar{x} = 25\), \(s = 80\), \(\alpha = .01\), \(t_{.005} = 2.977\) (df = 14). The CI is \(25 \pm (2.977)(80)/\sqrt{15} = 25 \pm 61.49 = (-36.49,\ 86.49)\). Answer is D.

13) AB is the event that the die comes up 1, 2 or 3. The probability is 3/6. Answer is D.

14) Just focus on the second row of the table. The probability is 65/(65+103) = 65/168 = .39. Answer is E.

15) Let A = “Paid Cash” and B = “Purchase was $20 or more”. In 14, we found that P(A|B) = .39. This is different from P(A) = (51+65)/250 = .46, so the events are not independent. They are not mutually exclusive, since A and B can occur simultaneously. Answer is C.
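
The arithmetic in several of the answers above (2B, 3B, 3C, 3E, 4, 12, 14 and 15) can be re-checked with a short script. What follows is a minimal Python sketch and not part of the original solutions: the function and variable names are illustrative, and the critical values 2.571 and 2.977 are copied from the t table as quoted in the answers rather than computed.

```python
import math


def t_interval(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


# 2B: 95% interval for the Educ coefficient (table value for df = 5).
educ_ci = t_interval(150.5, 424.8, 2.571)

# 3B, 3C, 3E: F ratio, S, and the fitted value from the regression output.
f_ratio = 63870378 / 6946683
s_error = math.sqrt(6946683)
y_hat = 24778 + 4 * 617.4 + 0 + 3 * 150.5 + 0

# 4A: one-sample t-statistic for H0: mu = 0 with n = 75, mean 25, s = 80.
t_stat = (25 - 0) / (80 / math.sqrt(75))

# 4B: upper-tail area under the standard normal, P(Z > t_stat).
p_value = 0.5 * math.erfc(t_stat / math.sqrt(2))

# 12: 99% interval for the mean (table value for df = 14).
mean_ci = t_interval(25, 80 / math.sqrt(15), 2.977)

# 14 and 15: conditional vs. unconditional probability of paying cash.
p_cash_given_20_plus = 65 / (65 + 103)
p_cash = (51 + 65) / 250
independent = math.isclose(p_cash_given_20_plus, p_cash)

print("2B CI:", tuple(round(v, 1) for v in educ_ci))
print("3B F:", round(f_ratio, 2), " 3C S:", round(s_error), " 3E y-hat:", round(y_hat))
print("4A t:", round(t_stat, 2), " 4B p:", round(p_value, 4))
print("12 CI:", tuple(round(v, 2) for v in mean_ci))
print("14 P(A|B):", round(p_cash_given_20_plus, 2), " 15 P(A):", round(p_cash, 2),
      " independent:", independent)
```

The p-value in 4B is computed here from the unrounded t-statistic, so it can differ in the fourth decimal place from a table lookup at 2.71.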