Chapter 22: “Inference for Two Proportions.” This chapter covers inference for the difference between two proportions. The approach, logic, and interpretations are the same as before. Only the standard deviation changes, and it is not hard to find: for the difference of independent random variables, variances add. That allows us to create a confidence interval for the difference of two proportions. Add one more idea, pooling samples to better estimate the population proportion, and we can test a hypothesis about the two proportions. TI Tips lead students through the calculator procedures for two-proportion inference.

Comments

This is the chapter where you should begin to see the benefits of the approach we have taken to inference. All the pieces are in place. We’ve been working with proportions for three chapters now. Students can now create and interpret a confidence interval and write their conclusions. It’s quite logical to turn from looking at one proportion to comparing two proportions. Toss in adding variances to describe the variability in the difference between independent random variables, and these pieces go together easily. Students see they can handle inference for two proportions quite well. You’ll find this chapter is as much about reviewing and solidifying what students already know about inference as it is about learning two-sample procedures.

Looking Ahead

Our first test of students’ understanding of inference follows this chapter, and then we move on to look at inference for means. The fundamental concepts about confidence intervals and hypothesis tests are the same: if you’ve seen one confidence interval or hypothesis test, you’ve seen ’em all. With a firm grasp of the ideas, logic, and meaning of inference, students should readily embrace inference for one mean in Chapter 23, then compare two means in Chapter 24, and analyze matched pairs in Chapter 25.

Using proper notation, always important, becomes even more vital now that there are two groups.
Insist that students clearly identify what their notation means. We need to use subscripts to distinguish between the two groups, something obvious, like “M” and “F” for male and female.

Hypotheses. We wonder if the true proportions are the same in the two populations from which our samples are drawn. As always, a null hypothesis means nothing unusual is afoot. In other words, the two proportions are really the same, and any difference that showed up in the samples can be explained by sampling error. Students will probably propose that we state the null hypothesis as equal proportions (p1 = p2). There’s nothing wrong with that, and you should write it down. But also show the alternative way of saying “no difference”: p1 − p2 = 0. The fact that we are talking about the difference of two proportions will be important when we need to find the standard deviation of the sampling model for p̂1 − p̂2.

Model. Students should list the conditions. They need to check the usual conditions for inference about proportions for both groups: randomness, samples that are less than 10% of the respective populations, and enough successes and failures. There’s another condition, but don’t write that yet. When the need to add variances arises, we’ll see that the two groups must be independent. You can then return to the conditions and add that assumption to the list. At that point, students will understand why it’s important.

Mechanics, part I. Let the students proceed as they usually would: writing down the observed statistics, drawing the Normal curve, shading the region representing the P-value, and starting to find z. The numerator is easy: the observed difference in sample proportions minus the hypothesized difference of 0. The denominator is the first place they’ll hesitate. The problem is knowing what standard deviation to use. (As an aside, you can point out that standard deviations are always the problem.
Once we know how to find the appropriate standard deviation, inference is usually pretty straightforward.) Let them stumble around for a few minutes right here. One of your students is likely to remember the “variances add” mantra. Then you can derive (or show) the formula you need in the denominator and add the independence requirement to the list of conditions.

Mechanics, part II. The formula for the standard deviation of the difference of two independent proportions requires that we know both population proportions, p1 and p2. We don’t. In the earlier one-proportion case, we had a hypothesized value to use, but not now. Students will suggest that we simply find a standard error by substituting our best sample-based estimates, p̂1 and p̂2. Great idea: this shows they have caught on to our basic tactics. But we can make a slight conceptual improvement here. The null hypothesis is that there is no difference in the two population proportions. In other words, we are currently operating under the assumption that p1 = p2. This means that there is one common population proportion (call it p), and that is the value we should be substituting for both p1 and p2. (It makes some sense to use two different estimates for the same value, substituting different numbers into the formula where we have hypothesized the values are the same, but it would be better to use one common value.) We don’t know this magical value of p, so we need to come up with the best possible estimate. Chances are someone will suggest using the total number of successes and the total number of trials in both samples lumped together. That’s called “pooling,” and it is the right approach when we are testing the hypothesis that two proportions are the same.

Conclusion. Students know what the P-value means and how to link it to their decision. They will be able to write a good conclusion in the proper context.

And now the confidence interval . . . We just decided that the difference was significant. So how big might it be?
We need a confidence interval. We have already checked the conditions. The interval is (as always) an estimate plus or minus a margin of error. Here, that’s the observed difference in proportions plus or minus 1.96 times the standard error of the difference. We already have the formula for that standard error, but now we no longer believe the two proportions are the same, so there’s no justification for pooling. Because we believe proportions p1 and p2 could be different, we should use the two different estimates p̂1 and p̂2 to find the standard error. Students will need to think carefully to write a clear statement that correctly interprets that confidence interval.

Continue to clarify the pooling issue. A short explanation should make sense. When finding a confidence interval for the difference, there is an implied assumption that the two proportions could well be different. So use different estimates; don’t pool. When testing the hypothesis that the two proportions are equal, we pretend they are, so we use the same value, the pooled estimate, for each. This latter point should be seen for what it is: a technical improvement. The difference between the two SDs isn’t great, but when we assume from the null hypothesis that the two proportions are equal, we should use that information.

Test Your Understanding

AP Statistics Quiz, Chapter 22

Great Britain has a great literary tradition that spans centuries. One might assume, then, that Britons read more than citizens of other countries. Some Canadians, however, feel that a higher percentage of Canadians than Britons read. A recent Gallup Poll reported that 86% of 1004 randomly sampled Canadians read at least one book in the past year, compared to 81% of 1009 randomly sampled Britons. Do these results confirm a higher reading rate in Canada?

1. Test an appropriate hypothesis and state your conclusions.
2.
Find a 99% confidence interval for the difference in the proportions of Britons and Canadians who read at least one book in the last year. Interpret your interval.

We are 99% confident that the proportion of Britons who read at least one book in the past year is between 0.8 and 9.3 percentage points lower than the proportion of Canadians who read at least one book in the past year.
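
As a computational check on the two procedures this chapter describes, the following Python sketch runs the pooled two-proportion z-test and the unpooled confidence interval on the quiz data. It is an illustration under stated assumptions, not part of the original guide: the function names are hypothetical, and the success counts 864 and 817 are assumed by rounding 86% of 1004 and 81% of 1009 to whole respondents.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+


def two_prop_z_test(x1, n1, x2, n2):
    """One-sided test of H0: p1 - p2 = 0 against HA: p1 - p2 > 0.

    Under H0 the two proportions are equal, so the standard error
    uses one pooled estimate of the common proportion.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # total successes / total trials
    se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat - 0) / se_pooled  # hypothesized difference is 0
    p_value = 1 - NormalDist().cdf(z)      # upper-tail area
    return z, p_value


def two_prop_interval(x1, n1, x2, n2, level=0.95):
    """Confidence interval for p1 - p2.

    No pooling here: the interval allows the proportions to differ,
    so each group contributes its own estimate (variances add).
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # critical value
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


# ASSUMPTION: counts reconstructed from the reported percentages.
canada_yes, canada_n = 864, 1004    # about 86% of 1004
britain_yes, britain_n = 817, 1009  # about 81% of 1009

z, p_value = two_prop_z_test(canada_yes, canada_n, britain_yes, britain_n)
low, high = two_prop_interval(canada_yes, canada_n,
                              britain_yes, britain_n, level=0.99)

print(f"z = {z:.2f}, one-sided P-value = {p_value:.4f}")
print(f"99% interval for p_Canada - p_Britain: ({low:.3f}, {high:.3f})")
```

The test pools because the null hypothesis says the two proportions are equal; the interval does not, because it allows them to differ. With the assumed counts, the 99% interval runs from roughly 0.008 to 0.093, which agrees with the interpretation given above.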