Math 251, Practice Questions on Topics Since 3rd Test (Linear Regression, Goodness of Fit, Analysis of Variance)
“Quiz 14 Answers”

1. (From p. 594 #12) Let x be the age of a licensed driver in years. Let y be the percentage of all fatal accidents (for a given age) due to failure to yield the right of way. For example, the first data pair says that 5% of all fatal accidents of 37-year-olds are due to failure to yield the right of way. The Wall Street Journal reported the following data:

x (age in years)    y (percent)
37                  5
47                  8
57                  10
67                  16
77                  30
87                  43

Note: $\sum x = 372$, $\sum y = 112$, $\sum x^2 = 24{,}814$, $\sum y^2 = 3194$, $\sum xy = 8254$.

(a) Draw a scatter plot and find the equation of the least squares line for these data.
(b) Use your line from (a) to predict the percentage of fatal accidents due to failure to yield the right of way for 40-year-olds.
(c) Use the regression line in (a) to predict the age for which the percentage of fatal accidents due to failure to yield is 20%.
(d) Do the data appear to be positively or negatively correlated? Explain.
(e) Compute the correlation coefficient. Does its sign agree with your answer in (d)? Does it suggest that there is a good linear fit?
(f) Compute the coefficient of determination, and interpret what it means.

Answer. A scatter plot of the data, together with the predicted values from the line of best fit, is included below:

[Figure: Plot of Data & Predicted Values, showing y and predicted y against x.]

The rest of the answers to question 1 may be found in a PDF file, which displays the math formulas better, at the link: http://www.cs.lasierra.edu/~jonv/classes/m251a03/quiz14ans.pdf

2. (From p. 655 #3) The type of raw material used to construct stone tools found at the archaeological site Casa del Rito is shown below. A random sample of 1486 stone tools was obtained from a current excavation site.
Raw Material      Regional Percent of Stone Tools    Observed Number of Tools at Current Excavation Site
Basalt            61.3%                              906
Obsidian          10.6%                              162
Welded tuff       11.4%                              168
Pedernal chert    13.1%                              197
Other             3.6%                               53

(a) Use a 1% level of significance to test the claim that the regional distribution of raw materials fits the current excavation site.

Answer. We use a chi-square goodness-of-fit test (Section 11.2) with:
Null hypothesis: The distribution is as claimed.
Alternative hypothesis: The distribution is different from what is claimed.
There are 5 categories, so the test has $5 - 1 = 4$ degrees of freedom. At the 1% level of significance, the critical region (from Table 7, p. A29) is $\chi^2 \ge 13.28$.

We now complete the following table to compute the sample test statistic. Each expected count is the sample size times the regional percentage; for example, for basalt $E = 1486 \times 0.613 = 910.918$.

Raw Material      Observed (O)    Expected (E)    $(O-E)^2/E$
Basalt            906             910.918         0.026552
Obsidian          162             157.516         0.127646
Welded tuff       168             169.404         0.011636
Pedernal chert    197             194.666         0.027984
Other             53              53.496          0.004599

The observed value of the test statistic is $\chi^2 = \sum (O-E)^2/E = 0.1984$, the sum of the values in the right column. Because this does not fall in the critical region, we fail to reject the null hypothesis.

(b) Repeat (a) using a 5% level of significance.

Answer. The only difference is that the critical region is now $\chi^2 \ge 9.49$. The observed value of 0.1984 still does not fall in the critical region, so we fail to reject the null hypothesis.

3. (From p. 694 #9) A sociologist studying New York City ethnic groups wishes to determine whether there is a difference in income for immigrants from four different countries during their first year in the city. She obtained the data in the following table from a random sample of immigrants from these countries (incomes in thousands of dollars). Use a 0.05 level of significance to test the claim that there is no difference in the earnings of immigrants from the four different countries.

Country I    Country II    Country III    Country IV
12.7         8.3           20.3           17.2
9.2          17.2          16.6           8.8
10.9         19.1          22.7           14.7
8.9          10.3          25.2           21.3
16.4                       19.9           19.8

(a) State the null and alternative hypotheses.
Answer: The null hypothesis is $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$. The alternative hypothesis is $H_1$: at least two of the means $\mu_1, \mu_2, \mu_3, \mu_4$ are different.

(b) What assumptions should be made about the populations in order to conduct the test?

Answer: The populations should be normal (or approximately normal) with equal variances.

(c) State the critical region for the test.

Answer: First observe that $N = 19$ and $k = 4$. Thus the numerator has $k - 1 = 3$ degrees of freedom, and the denominator has $N - k = 15$ degrees of freedom. Using Table 8 on p. A32, we find that the critical region is $F \ge 3.29$.

(d) Given that $MS_{BET} = 79.408$ and $MS_W = 17.223$ for the given data, report the conclusion of the hypothesis test.

Answer: Using the formula for F, we find that $F = \frac{MS_{BET}}{MS_W} = \frac{79.408}{17.223} \approx 4.61$. Thus F is in the critical region, so we reject the null hypothesis; that is, there is evidence that the population means are not all the same.

(e) Repeat the test at the 1% level of significance.

Answer: For this, the critical region is $F \ge 5.42$, and the observed value $F = 4.61$ does not fall in the critical region, so we would not reject the null hypothesis at the 1% level of significance.
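For question 1, the following minimal Python sketch (standard library only) can be used to check hand calculations against the linked answer file. It works directly from the sums in the note, assuming the standard computational formulas $SS_{xy} = \sum xy - (\sum x)(\sum y)/n$, $SS_{xx} = \sum x^2 - (\sum x)^2/n$, $SS_{yy} = \sum y^2 - (\sum y)^2/n$, slope $b = SS_{xy}/SS_{xx}$, intercept $a = \bar{y} - b\bar{x}$, and $r = SS_{xy}/\sqrt{SS_{xx} SS_{yy}}$. The function name `regression_from_sums` is hypothetical.

```python
import math


def regression_from_sums(n, sx, sy, sxx, syy, sxy):
    """Least squares line y = a + b*x and correlation r from summary sums.

    sx, sy   : sum of x, sum of y
    sxx, syy : sum of x^2, sum of y^2
    sxy      : sum of x*y
    """
    ss_xy = sxy - sx * sy / n   # corrected sum of cross products
    ss_xx = sxx - sx ** 2 / n   # corrected sum of squares for x
    ss_yy = syy - sy ** 2 / n   # corrected sum of squares for y
    b = ss_xy / ss_xx                      # slope
    a = sy / n - b * sx / n                # intercept: y-bar minus b times x-bar
    r = ss_xy / math.sqrt(ss_xx * ss_yy)   # sample correlation coefficient
    return a, b, r


# Sums from the note in question 1 (n = 6 data pairs).
a, b, r = regression_from_sums(6, 372, 112, 24814, 3194, 8254)

pred_age_40 = a + b * 40     # part (b): predicted percentage at x = 40
age_for_20 = (20 - a) / b    # part (c): solve 20 = a + b*x for x
r_squared = r ** 2           # part (f): coefficient of determination

print(f"y-hat = {a:.3f} + {b:.4f} x")
print(f"prediction at age 40: {pred_age_40:.2f}%")
print(f"age for 20%: {age_for_20:.1f} years")
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```

Working from the sums mirrors how the problem is posed; computing the sums from the six raw (x, y) pairs first would give the same results.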
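For question 2, the expected counts, the per-category contributions $(O-E)^2/E$, and the chi-square statistic can be reproduced with a short Python sketch (standard library only). The critical values 13.28 and 9.49 are copied from the table lookup in the answer (4 degrees of freedom) rather than computed, and the helper name `chi_square_gof` is hypothetical.

```python
def chi_square_gof(observed, proportions):
    """Return expected counts, per-category contributions, and the chi-square statistic."""
    n = sum(observed)                                  # total sample size (1486 here)
    expected = [n * p for p in proportions]            # E = n * claimed proportion
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return expected, contributions, sum(contributions)


materials = ["Basalt", "Obsidian", "Welded tuff", "Pedernal chert", "Other"]
observed = [906, 162, 168, 197, 53]
proportions = [0.613, 0.106, 0.114, 0.131, 0.036]

expected, contributions, chi2 = chi_square_gof(observed, proportions)
df = len(observed) - 1   # 5 categories -> 4 degrees of freedom

for name, o, e, c in zip(materials, observed, expected, contributions):
    print(f"{name:15s} O={o:4d} E={e:8.3f} (O-E)^2/E={c:.6f}")
print(f"chi-square = {chi2:.4f} with df = {df}")

# Critical values taken from the table lookup in the answer (df = 4), not computed here.
critical = {0.01: 13.28, 0.05: 9.49}
for alpha, cv in critical.items():
    decision = "reject H0" if chi2 >= cv else "fail to reject H0"
    print(f"alpha = {alpha}: critical value {cv}, {decision}")
```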
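For question 3, a one-way ANOVA computed from the raw incomes should reproduce the given $MS_{BET}$ and $MS_W$ and the observed F. This is a minimal standard-library Python sketch: the function name `one_way_anova` is hypothetical, and the critical values 3.29 and 5.42 are the F-table values quoted in the answer (3 and 15 degrees of freedom), not computed here.

```python
def one_way_anova(groups):
    """Return (MS_BET, MS_W, F, df_between, df_within) for a list of samples."""
    k = len(groups)                                 # number of groups
    N = sum(len(g) for g in groups)                 # total number of observations
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: group size times squared deviation of each group mean.
    ss_bet = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups sum of squares: squared deviations from each group's own mean.
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_bet, df_w = k - 1, N - k
    ms_bet, ms_w = ss_bet / df_bet, ss_w / df_w
    return ms_bet, ms_w, ms_bet / ms_w, df_bet, df_w


incomes = [
    [12.7, 9.2, 10.9, 8.9, 16.4],    # Country I
    [8.3, 17.2, 19.1, 10.3],         # Country II
    [20.3, 16.6, 22.7, 25.2, 19.9],  # Country III
    [17.2, 8.8, 14.7, 21.3, 19.8],   # Country IV
]

ms_bet, ms_w, F, df_bet, df_w = one_way_anova(incomes)
print(f"MS_BET = {ms_bet:.3f}, MS_W = {ms_w:.3f}, F = {F:.2f}, df = ({df_bet}, {df_w})")

# Critical values from the F-table lookup in the answer (3 and 15 df), not computed here.
for alpha, cv in {0.05: 3.29, 0.01: 5.42}.items():
    decision = "reject H0" if F >= cv else "do not reject H0"
    print(f"alpha = {alpha}: critical value {cv}, {decision}")
```

Computing the mean squares from the data, rather than taking them as given, also confirms that Country II's smaller sample (4 values instead of 5) is handled through the group sizes in both sums of squares.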