Introduction to the Practice of Statistics, Sixth Edition
Moore, McCabe
Section 5.1 Homework Answers

5.18 Attitudes toward drinking and behavior studies. Some of the methods in this section are approximations rather than exact probability results. We have given rules of thumb for safe use of these approximations.

(a) You are interested in attitudes toward drinking among the 75 members of a fraternity. You choose 30 members at random to interview. One question is "Have you had five or more drinks at one time during the last week?" Suppose that in fact 30% of the 75 members would say "Yes." Explain why you cannot safely use the B(30, 0.3) distribution for the count X in your sample who say "Yes."

The binomial distribution assumes independent trials. Here the trials are not independent: the 30 members are drawn without replacement from a population of only 75 (just 2.5 times the sample size), and the probabilities change too much for us to disregard that fact. Initially P(drink) = 0.3, but P(drink | drink, drink, not drink, not drink, not drink) = 20/70 ≈ 0.2857, and this is after sampling only five members. By the time 20 members have been sampled, the probability of success need not be close to 0.3.

(b) The National AIDS Behavioral Surveys found that 0.2% (that's 0.002 as a decimal fraction) of adult heterosexuals had both received a blood transfusion and had a sexual partner from a group at high risk of AIDS. Suppose that this national proportion holds for your region. Explain why you cannot safely use the Normal approximation for the sample proportion who fall in this group when you interview an SRS of 1000 adults.

The criteria are np ≥ 10 and n(1 − p) ≥ 10. The probability of success is 0.002, so np = 0.002(1000) = 2, which is less than 10.

5.22 The ideal number of children. "What do you think is the ideal number of children for a family to have?" A Gallup Poll asked this question of 1016 randomly chosen adults. Almost half (49%) thought two children was ideal. Suppose that p = 0.49 is exactly true for the population of all adults. Gallup announced a margin of error of ±3 percentage points for this poll.
What is the probability that the sample proportion p̂ for an SRS of size n = 1016 falls between 0.46 and 0.52? You see that it is likely, but not certain, that polls like this give results that are correct within their margin of error. We will say more about margins of error in Chapter 6.

A similar question was asked in Chapter 4, Section 3, Exercise 4.66.

n = 1016, p = 0.49. The expected number of successes is 1016(0.49) = 497.84 and the expected number of failures is 1016(0.51) = 518.16, so the distribution in this binomial situation resembles a Normal distribution.

Let X count the number of people who think 2 children is ideal.
Sample space for the count: {X : X = 0, 1, 2, …, 1016}
Sample space for the proportion: {p̂ : p̂ = 0, 1/1016, 2/1016, …, 1015/1016, 1}
Each sample space consists of 1017 values.

0.46(1016) = 467.36 and 0.52(1016) = 528.32. Here is what we want, expressed as either a proportion or a count:
P(0.46 ≤ p̂ ≤ 0.52) ≈ P(467 ≤ X ≤ 528)
All we can do here is approximate the answer to the question, since you cannot have a count of 467.36, for example.

Binomial calculation.
P(467 ≤ X ≤ 528) = P(X ≤ 528) − P(X ≤ 466) = 0.9728 − 0.02454 = 0.9483
=binomdist(528, 1016, 0.49, true) - binomdist(466, 1016, 0.49, true)
Why did I change from 467 to 466? Because I want to include 467 in the calculation, and since the distribution is discrete, I need to take away the probability of 466, 465, and so on.

Normal Approximation. In a scenario like this one, most researchers will typically use a Normal approximation rather than the binomial calculation. Again we meet the criteria np ≥ 10 and n(1 − p) ≥ 10: 1016(0.49) = 497.84 and 1016(0.51) = 518.16. As before, 0.46(1016) = 467.36 and 0.52(1016) = 528.32.

$$P(0.46 \le \hat{p} \le 0.52) \approx P(X \le 528.32) - P(X \le 467.36)$$
$$= P\left(Z \le \frac{528.32 - 1016(0.49)}{\sqrt{1016(0.49)(0.51)}}\right) - P\left(Z \le \frac{467.36 - 1016(0.49)}{\sqrt{1016(0.49)(0.51)}}\right)$$
$$= P(Z < 1.91) - P(Z < -1.91) \approx 0.9719 - 0.0281 = 0.9438$$

Notice that this value is smaller than the one using the binomial routine.
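As a cross-check on the two calculations above, here is a minimal Python sketch using only the standard library. The helper name `binom_cdf` is introduced for illustration and plays the role of the spreadsheet's cumulative `binomdist`; it sums the probabilities in log space, an implementation choice made here so that large n does not overflow. The bounds 466 and 528 are the same ones used in the spreadsheet formula.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), summed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    total = 0.0
    for i in range(min(k, n) + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i)
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_pmf)
    return total


n, p = 1016, 0.49
mean = n * p
sd = sqrt(n * p * (1 - p))

# Exact binomial with the spreadsheet's bounds: P(467 <= X <= 528).
exact = binom_cdf(528, n, p) - binom_cdf(466, n, p)

# Normal approximation without continuity correction, at 0.46n and 0.52n.
z = NormalDist()
approx = z.cdf((0.52 * n - mean) / sd) - z.cdf((0.46 * n - mean) / sd)

print(f"exact binomial: {exact:.4f}")
print(f"normal approx:  {approx:.4f}")
```

Changing `n` and the two bounds lets the same sketch be reused for other sample sizes.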
The same Normal approximation, written in terms of p̂:

$$P(0.46 \le \hat{p} \le 0.52) \approx P\left(Z \le \frac{0.52 - 0.49}{\sqrt{\frac{(0.49)(0.51)}{1016}}}\right) - P\left(Z \le \frac{0.46 - 0.49}{\sqrt{\frac{(0.49)(0.51)}{1016}}}\right)$$
$$= P(Z < 1.91) - P(Z < -1.91) \approx 0.9719 - 0.0281 = 0.9438$$

Notice that this value is smaller than the one using the binomial routine.

Using a Normal approximation: the sample size here is large and p is in the middle of the possible range of p values, [0, 1]. Thus the Normal approximation will be very close to the actual value. Below are the steps with continuity correction.

$$P(X \le 528.5) - P(X \le 466.5) \approx P\left(Z \le \frac{528.5 - 1016(0.49)}{\sqrt{1016(0.49)(0.51)}}\right) - P\left(Z \le \frac{466.5 - 1016(0.49)}{\sqrt{1016(0.49)(0.51)}}\right)$$
$$= P(Z < 1.924) - P(Z < -1.967) \approx 0.9728 - 0.0246 = 0.9482$$

5.24 How do the results depend on the sample size? Return to the Gallup Poll setting of Exercise 5.22. We are supposing that the proportion of all adults who think that two children is ideal is p = 0.49. What is the probability that a sample proportion p̂ falls between 0.46 and 0.52 (that is, within ±3 percentage points of the true p) if the sample is an SRS of size n = 300? Of size n = 5000? Combine these results with your work in Exercise 5.22 to make a general statement about the effect of larger samples in a sample survey.

Size n = 300

Crunch it (binomial calculation).
P(0.46 ≤ p̂ ≤ 0.52) = P(0.46(300) ≤ X ≤ 0.52(300)) = P(138 ≤ X ≤ 156) = P(X ≤ 156) − P(X ≤ 137)
= binomdist(156, 300, 0.49, true) - binomdist(137, 300, 0.49, true)
= 0.8637 − 0.1363 = 0.7275
See the answer to problem 5.22 for a pictorial representation.

Normal Approximation.
P(0.46 ≤ p̂ ≤ 0.52) = P(0.46(300) ≤ X ≤ 0.52(300)) ≈ P(X ≤ 156) − P(X ≤ 138)

$$= P\left(Z \le \frac{156 - 300(0.49)}{\sqrt{300(0.49)(0.51)}}\right) - P\left(Z \le \frac{138 - 300(0.49)}{\sqrt{300(0.49)(0.51)}}\right)$$

or, if using p-hats,

$$P\left(Z \le \frac{0.52 - 0.49}{\sqrt{\frac{(0.49)(0.51)}{300}}}\right) - P\left(Z \le \frac{0.46 - 0.49}{\sqrt{\frac{(0.49)(0.51)}{300}}}\right)$$
$$= P(Z < 1.039) - P(Z < -1.039) \approx 0.8506 - 0.1494 = 0.7012$$

Normal Approximation, continuity correction.
P(0.46 ≤ p̂ ≤ 0.52) = P(138 ≤ X ≤ 156) ≈ P(X ≤ 156.5) − P(X ≤ 137.5)

$$= P\left(Z \le \frac{156.5 - 300(0.49)}{\sqrt{300(0.49)(0.51)}}\right) - P\left(Z \le \frac{137.5 - 300(0.49)}{\sqrt{300(0.49)(0.51)}}\right)$$
$$= P(Z < 1.097) - P(Z < -1.097) \approx 0.8637 - 0.1363 = 0.7274$$

Size n = 5000

Normal Approximation.
P(0.46 ≤ p̂ ≤ 0.52) = P(0.46(5000) ≤ X ≤ 0.52(5000)) ≈ P(X ≤ 2600) − P(X ≤ 2300)

$$= P\left(Z \le \frac{2600 - 5000(0.49)}{\sqrt{5000(0.49)(0.51)}}\right) - P\left(Z \le \frac{2300 - 5000(0.49)}{\sqrt{5000(0.49)(0.51)}}\right)$$
$$= P(Z < 4.24) - P(Z < -4.24) \approx 0.999989 - 0.000011 \approx 0.99998 \approx 1$$

Binomial calculation.
P(0.46 ≤ p̂ ≤ 0.52) = P(0.46(5000) ≤ X ≤ 0.52(5000)) = P(X ≤ 2600) − P(X ≤ 2299)
= binomdist(2600, 5000, 0.49, true) - binomdist(2299, 5000, 0.49, true) = 0.999979

Combining these results with Exercise 5.22: the probability that p̂ falls within ±3 percentage points of p is about 0.73 for n = 300, about 0.95 for n = 1016, and essentially 1 for n = 5000. Larger samples make it more likely that the sample proportion is close to the true p.

5.28 Admitting students to college. A selective college would like to have an entering class of 950 students. Because not all students who are offered admission accept, the college admits more than 950 students. Past experience shows that about 75% of the students admitted will accept. The college decides to admit 1200 students. Assuming that students make their decisions independently, the number who accept has the B(1200, 0.75) distribution. If this number is less than 950, the college will admit students from its waiting list.

(a) What are the mean and the standard deviation of the number X of students who accept?

Notice that we want the mean and standard deviation of the count, the number X of students who accept.

$$\mu_X = 1200(0.75) = 900$$
$$\sigma_X = \sqrt{1200(0.75)(0.25)} = 15$$

(b) The college does not want more than 950 students. What is the probability that more than 950 will accept?

P(X ≥ 951) = 1 - binomdist(950, 1200, 0.75, true) = 0.00030194

Normal Approximation. 1200(0.75) = 900 ≥ 10 and 1200(0.25) = 300 ≥ 10.

$$P(X \ge 951) \approx P\left(Z > \frac{951 - 1200(0.75)}{\sqrt{1200(0.75)(0.25)}}\right) = P(Z > 3.4) \approx 0.000337$$

Normal Approximation with continuity correction. 1200(0.75) = 900 ≥ 10 and 1200(0.25) = 300 ≥ 10.

$$P(X \ge 951) \approx P\left(Z > \frac{950.5 - 1200(0.75)}{\sqrt{1200(0.75)(0.25)}}\right) \approx P(Z > 3.37)$$

≈ 1 - normsdist(3.37) ≈ 0.000376

(c) If the college decides to increase the number of admission offers to 1300, what is the probability that more than 950 will accept?

P(X ≥ 951) = 1 - binomdist(950, 1300, 0.75, true) = 0.940834

Normal Approximation. 1300(0.75) = 975 ≥ 10 and 1300(0.25) = 325 ≥ 10.

$$P(X \ge 951) \approx P\left(Z > \frac{951 - 1300(0.75)}{\sqrt{1300(0.75)(0.25)}}\right) = P(Z > -1.5372) \approx 0.9379$$

Normal Approximation with continuity correction.
1300(0.75) = 975 ≥ 10 and 1300(0.25) = 325 ≥ 10.

$$P(X \ge 951) \approx P\left(Z > \frac{950.5 - 1300(0.75)}{\sqrt{1300(0.75)(0.25)}}\right) = P(Z > -1.56926) \approx 0.9417$$
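The three calculations for Exercise 5.28 (exact binomial, Normal approximation, and Normal approximation with continuity correction) can be reproduced with a short Python sketch using only the standard library. The function names `binom_sf` and `tail_probabilities` are hypothetical helpers introduced here for illustration; they are not part of the original answers, which use the spreadsheet functions binomdist and normsdist.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def binom_sf(k, n, p):
    """P(X > k) for X ~ B(n, p), summing the upper tail in log space."""
    total = 0.0
    for i in range(k + 1, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i)
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_pmf)
    return total


def tail_probabilities(n_admitted, p=0.75, target=950):
    """Three estimates of P(X > target), where X is the number who accept."""
    mean = n_admitted * p
    sd = sqrt(n_admitted * p * (1 - p))
    z = NormalDist()
    return {
        # Exact: 1 - P(X <= 950), the quantity the spreadsheet formula computes.
        "exact": binom_sf(target, n_admitted, p),
        # Normal approximation evaluated at 951.
        "normal": 1 - z.cdf((target + 1 - mean) / sd),
        # Normal approximation with continuity correction, evaluated at 950.5.
        "normal_cc": 1 - z.cdf((target + 0.5 - mean) / sd),
    }


for n_admitted in (1200, 1300):
    results = tail_probabilities(n_admitted)
    print(n_admitted, {name: round(value, 6) for name, value in results.items()})
```

Passing a different `n_admitted` shows how sensitive the chance of exceeding 950 acceptances is to the number of offers.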