LAST NAME (Please Print): KEY
FIRST NAME (Please Print):
HONOR PLEDGE (Please Sign):

Statistics 111 Midterm 3

• This is a closed book exam.
• You may use your calculator and a single page of notes.
• The room is crowded. Please be careful to look only at your own exam. Try to sit one seat apart; the proctors may ask you to randomize your seating a bit.
• Report all numerical answers to at least two correct decimal places or (when appropriate) write them as a fraction.
• All question parts count for 1 point.

1. For different kinds of murderers, you observe their favorite flavors of ice cream. Your data are as follows:

             parricide   infanticide   regicide
chocolate        10           20          30
vanilla          10           20           0
strawberry       10           20           0

In words specific to the problem, what is the appropriate null hypothesis?
There is no relationship between type of murderer and preferred flavor of ice cream.

[40] What is the value of your test statistic?
Each expected value is the row sum times the column sum divided by the total. The observed counts with their marginal totals are shown in the following table:

             parricide   infanticide   regicide   Total
chocolate        10           20          30        60
vanilla          10           20           0        30
strawberry       10           20           0        30
Total            30           60          30       120

This gives expected counts of 15, 30, 15 in the chocolate row and 7.5, 15, 7.5 in each of the vanilla and strawberry rows. Then one uses the usual formula:

\[ ts = \sum_{\text{all cells}} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 40. \]

[9.49] What is the critical value for a 0.05 level test?
It comes from the $\chi^2$ table with df equal to (number of rows - 1) × (number of columns - 1) = 4.

[< 0.001] What is your significance probability (if necessary, give bracketing values)?
It is below the smallest value in the table.

In words specific to the problem, what is the conclusion for a 0.05 level test?
Strongly reject the null: there is evidence that ice cream preference is associated with murder style.

2. You suspect that administration of electric shocks encourages telepathy. To test this, you have undergraduates attempt to guess the cards in a Rhine deck, administering small shocks when they guess wrong. (This is the experiment Bill Murray was conducting at the beginning of Ghostbusters.)
Under random chance, a person would expect to guess 5 cards correctly, with a standard deviation of 2.

[6] Suppose that electric shocks confer telepathy, and people who feel the pain stimulus guess, on average, 7.5 cards correctly. How many undergraduates do you need to shock in order for a 0.05 level test to have power 0.9?

\[
\begin{aligned}
0.9 &= \mathbb{P}[ts > 1.645] \\
    &= \mathbb{P}\left[ \frac{\bar{X} - 7.5 + 7.5 - 5}{2/\sqrt{n}} > 1.645 \right] \\
    &= \mathbb{P}\left[ Z > 1.645 - \frac{2.5}{2/\sqrt{n}} \right]
\end{aligned}
\]

so

\[ -1.28 = 1.645 - \frac{2.5}{2/\sqrt{n}}, \]

and solving gives n = 5.476, which one must round up to the nearest integer.

[0.92] You shock 9 students. What is the power of a 0.01 level test?

\[
\begin{aligned}
\text{power} &= \mathbb{P}[ts > 2.33] \\
    &= \mathbb{P}\left[ \frac{\bar{X} - 7.5 + 7.5 - 5}{2/\sqrt{n}} > 2.33 \right] \\
    &= \mathbb{P}\left[ Z > 2.33 - \frac{2.5}{2/\sqrt{9}} \right] \\
    &= \mathbb{P}[Z > -1.42] = 0.92.
\end{aligned}
\]

3. A Fox News reporter claims that at least 10% more women than men vote for the more handsome candidate. You want to prove him wrong. You draw a random sample of 100 men and 150 women, and ask them whether they would vote for Orlando Bloom if he ran against Newt Gingrich. (Assume Bloom is more handsome than Gingrich.) You find that 80 men would vote for Bloom, and so would 125 women.

In symbols, what is the null hypothesis?
$H_0 : p_w - p_m \ge 0.1$
Note to TA: Accept any mathematically equivalent statement of the null hypothesis.

[-1.39] What is the value of your test statistic?

\[
\begin{aligned}
ts &= \frac{\hat{p}_w - \hat{p}_m - 0.1}{\sqrt{\hat{p}_w (1 - \hat{p}_w)/n_w + \hat{p}_m (1 - \hat{p}_m)/n_m}} \\
   &= \frac{0.83 - 0.8 - 0.1}{\sqrt{0.83 \times 0.17/150 + 0.8 \times 0.2/100}} = -1.39
\end{aligned}
\]

(This uses $\hat{p}_w = 125/150$ rounded to 0.83; carrying it unrounded gives about -1.33, and the conclusion is the same.)

[-1.64 or -1.65] What is your critical value when α = 0.05?

[0.08] What is your significance probability? (If necessary, give a bracket.)

In words pertinent to the problem, what conclusion do you reach (at the 0.05 level)?
Sadly, we cannot conclude that the Fox reporter is wrong. (To avoid any perception of sexism, let me assure everyone that in the other version of the exam, the null hypothesis is rejected.)

4. Spin magazine published a formula for celebrity perversity (P), as measured by public opinion polls, based on 25 famous people.
The explanatory variables were the age difference between them and their primary partner (D), the number of partners in a year (N), the number of arrests (A), and whether they were homo/bi/heterosexual (H), coded as 1, 2, 3. Suppose Spin's regression equation was

\[ P = -7 + 4D^2 + 5N + 5A - 2H \]

where the standard errors on the coefficients are 0.5, 0.4, 1.2, 0.3, and 1.3, respectively.

[4711] What is the predicted perversity score for Rolling Stones bassist Bill Wyman, who, at 52, married 18 year-old Mandy Smith, claims to have had approximately 20 partners per year, has not been arrested, and is heterosexual?

\[ -7 + 4(52 - 18)^2 + 5(20) + 5(0) - 2(3) = 4711. \]

[H] Which explanatory variables should not be included in the model? Use α = 0.05.
The critical value comes from a $t_{20}$ table (25 observations minus 5 estimated coefficients), and for α = 0.05 this is 1.725. The test statistic for the coefficient on $D^2$ is 4/0.4 = 10; for the coefficient on N it is 5/1.2 = 4.17; for the coefficient on A it is 5/0.3 = 16.67; and for the coefficient on H it is 2/1.3 = 1.54 in absolute value. The only value less than 1.725 is the one for H.

5. Mendel crosses pea plants that are (YgUw) with themselves. (Here Y indicates dominant yellow peas, g indicates recessive green peas, U indicates dominant unwrinkled peas, and w indicates recessive wrinkled peas.) He obtains 160 offspring, and observes 15 green wrinkled plants, 28 yellow wrinkled plants, 28 green unwrinkled plants, and the rest are yellow and unwrinkled. (These traits are independent.)

In words pertinent to the problem, what is your null hypothesis?
Mendelian inheritance holds: the proportions should be 9/16, 3/16, 3/16, 1/16 for unwrinkled yellows, unwrinkled greens, wrinkled yellows, and wrinkled greens, respectively.

[2.78] What is the value of your test statistic?
Out of 160 crosses, one expects 10, 30, 30, and 90 green wrinkled, yellow wrinkled, green unwrinkled, and yellow unwrinkled plants, respectively. The corresponding observed counts are 15, 28, 28, and 89. The test statistic is

\[ \sum_{\text{all categories}} \frac{(O_i - E_i)^2}{E_i} = 2.777. \]

[7.81] What is the critical value for a 0.05 level test?
This is from the $\chi^2_3$ table; the df is the number of categories - 1.

[> 0.25] What is the significance probability for the experiment (if necessary, give bracketing values)?
From the table, the sig. prob. is bigger than 0.25.

6. I want to argue that my 2013 Focus class is smarter than the average Duke student. Suppose the average Duke IQ is 120, and a random sample of 10 Focus students out of 17 had a mean IQ of 125 with a sample sd of 10.

In symbols, what is your alternative hypothesis?
$H_A : \mu_F > 120$

[2.39] What is the value of your test statistic?
The trick here is to use the FPCF (finite population correction factor). The test statistic is

\[ ts = \frac{\bar{X} - \mu_0}{(10/\sqrt{10}) \sqrt{(17 - 10)/(17 - 1)}} = \frac{125 - 120}{(10/\sqrt{10}) \sqrt{7/16}} = 2.39. \]

[1.83] For a 0.05 level test, what is your critical value?
This comes from a $t_9$ table.

[0.02 to 0.025] What is my significance probability (if necessary, give a bracket)?

[Yes] Do I decide my Focus class is smarter?

7. You want to argue that Duke students are smarter than students at UNC. To reduce variance, you control for major. You observe:

        math   history   English   economics   statistics
Duke     130     110       115        120          150
UNC      125     106       110        114          145

In words pertinent to the problem, what is your alternative hypothesis?
The average IQ at Duke is larger than the average IQ at UNC.
TAs: Accept any equivalent wording.

[15.81] What is the value of your test statistic?
This is a paired difference test. The differences are 5, 4, 5, 6, and 5, so the standard deviation of the differences is $sd_D = \sqrt{0.5}$. The mean for Duke is 125 and the mean for UNC is 120. So the test statistic is

\[ ts = \frac{125 - 120}{\sqrt{0.5/5}} = 15.81. \]

[2.13] For a 0.05 level test, what is your critical value?
It comes from a $t_4$ table.

In words pertinent to the problem, what conclusion do you reach (at the 0.05 level)?
You reject the null; Duke is smarter.

[< 0.0005] What is your significance probability (if necessary, give bracketing values)?

8. You use linear regression to predict annual income (in thousands) from the number of pets someone owns.
The fitted regression equation is $Y = 80 - 3X$. The proportion of variance explained by knowing the number of pets is 0.36, the standard deviation of the residuals is 12, and the fit is based on 100 randomly chosen people.

[-0.6] What is the correlation coefficient?
This is the square root of the coefficient of determination, with sign matched to the slope.

[$50K] What is your predicted income for Ace Ventura, who owns ten pets?
$50K, since 50 = 80 - 3 × 10.

[$34.64K] About 90% of the people who own 10 pets will have at least what income?
Previously, I took a shortcut in teaching people how to place a confidence interval on a regression line and a regression prediction. But this semester I decided to teach the real formulae, which appear on page 14 of lecture 17. For the Fall 2014 exam, be prepared to use those formulae.
Under the regression assumptions, people with 10 pets have incomes that are normally distributed with mean 50 and standard deviation 12. From the z-table, 90% of the area is above -1.28, so the answer is 50 + (12)(-1.28) = $34.64K.

9. Write a nonlinear equation to predict the amount of lumber Y that can be harvested from a tree whose diameter at 5 feet is X and whose height is W.
Since trees are cylinders, the usable volume has the form $Y = \beta_0 + \beta_1 X^2 W$.

11. Your brother has two coins. One is fair, and the other has 2/3 chance of coming up heads. He wants to determine who washes the dishes with a coin toss, and assures you he is using a fair coin. To test this, you toss the coin five times, and get four heads.

[0.19] What is your exact significance probability?
Significance probability is the chance of getting results that support the alternative as or more strongly than the data seen, when the null is true. So compute the chance of getting 4 or 5 heads with a fair coin. This is (5 + 1)/32 = 0.1875.

[0.42] You are a Bayesian, and think there is a 60% chance that your brother is telling the truth.
After you make the five tosses, what do you think is the chance that he is truthful?
Bayes rule. Calculate

\[ \mathbb{P}[\text{fair} \mid 4] = \frac{\mathbb{P}[4 \mid \text{fair}] \times 0.6}{\mathbb{P}[4 \mid \text{fair}] \times 0.6 + \mathbb{P}[4 \mid \text{2/3 coin}] \times 0.4} \]

using the binomial formula, which comes to about 0.42.

10. List all, and only, the true statements (10 pts.)
[D, F, I, J]
A. As points cluster more tightly around a line, the correlation increases.
B. An ecological correlation occurs when there is a nonlinear relationship.
C. Galton first proposed euphonics.
D. Including irrelevant explanatory variables reduces predictive accuracy.
E. Correlation implies causation.
F. In regression, the errors are assumed to be independent and normal.
G. A frequentist can tell you the probability that the null hypothesis is correct.
H. A significance probability is the chance of observing data that are as or more supportive of the null than the data obtained, when the null hypothesis is true.
I. A P-value is the same thing as a significance probability.
J. Descartes was an artillery officer.
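
Numerical check for problem 11. The two answers to the coin problem (the exact significance probability and the Bayesian posterior) can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic and is not part of the original key; the helper name binom_pmf and the variable names are illustrative assumptions.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n independent tosses with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_tosses, n_heads = 5, 4
prior_fair = 0.6  # prior belief that the brother is using the fair coin

# Exact significance probability: chance of 4 or more heads with a fair coin.
p_value = sum(binom_pmf(k, n_tosses, 0.5) for k in range(n_heads, n_tosses + 1))

# Bayes' rule: weight each coin's likelihood of exactly 4 heads by its prior.
like_fair = binom_pmf(n_heads, n_tosses, 0.5)
like_biased = binom_pmf(n_heads, n_tosses, 2 / 3)
posterior_fair = (like_fair * prior_fair) / (
    like_fair * prior_fair + like_biased * (1 - prior_fair)
)

print(round(p_value, 4))         # 0.1875
print(round(posterior_fair, 2))  # 0.42
```

Note that the significance probability sums over 4 and 5 heads (results as or more extreme than the data), while the Bayesian calculation uses only the likelihood of the observed 4 heads.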