Chapter 22: Comparing Two Proportions

© 2006 W.H. Freeman and Company

Objectives (Chapter 22)

Comparing two proportions (comparing ABILITY in two different contexts)
• Comparing two independent samples
• Large-sample CI for two proportions
• Test of statistical significance

Point Estimator: $\hat{p}_1 - \hat{p}_2$

• Two random samples are selected from two populations.
• The number of successes in each sample is recorded.
• The sample proportions are computed.

                       Sample 1                   Sample 2
Sample size            $n_1$                      $n_2$
Number of successes    $x_1$                      $x_2$
Sample proportion      $\hat{p}_1 = x_1/n_1$      $\hat{p}_2 = x_2/n_2$

Comparing two independent samples

We often need to estimate the difference $p_1 - p_2$ between two unknown population proportions based on independent samples. We can compute the difference between the two sample proportions and compare it to the corresponding, approximately normal sampling distribution model for $\hat{p}_1 - \hat{p}_2$.

Large-sample CI for two proportions

For two independent samples of sizes $n_1$ and $n_2$ with sample proportions of successes $\hat{p}_1$ and $\hat{p}_2$, respectively, an approximate level C confidence interval for $p_1 - p_2$ is

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

where $z^*$ is the value from the z-table that corresponds to the confidence level C: C is the area under the standard normal curve between $-z^*$ and $z^*$.

Use this method when $n_1\hat{p}_1 \ge 10$, $n_1(1-\hat{p}_1) \ge 10$, $n_2\hat{p}_2 \ge 10$, and $n_2(1-\hat{p}_2) \ge 10$.

Icing the Kicker: 95% Confidence Interval

Football coaches often employ the “icing the kicker” strategy. To ice the kicker, the opposing coach calls a timeout just before the kicker attempts a field goal, hoping that the delay interrupts the kicker’s concentration and causes him to miss the kick.

                     Made FG    n      $\hat{p}$
Timeout (icing)      157        197    79.7%  ($\hat{p}_1$)
No Timeout           377        488    77.3%  ($\hat{p}_2$)

Standard error of the difference $\hat{p}_1 - \hat{p}_2$:

$$SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = \sqrt{\frac{0.797(0.203)}{197} + \frac{0.773(0.227)}{488}} = 0.0344$$

The confidence interval is $(\hat{p}_1 - \hat{p}_2) \pm z^* SE$:

$$(0.797 - 0.773) \pm 1.96(0.0344) = 0.024 \pm 0.0674$$

So the 95% CI is 0.024 ± 0.0674 = (-0.0434, 0.0914).

We are 95% confident that the interval from -4.34% to 9.14% captures the true difference between the ABILITY of kickers to make a field goal when iced and their ABILITY to make a field goal when not iced. Because 0 is in the interval, we do not have convincing evidence of a difference in the ABILITY of kickers to make field goals when iced and when not iced.

Example: 95% confidence interval for $p_1 - p_2$

The age at which a woman gives birth to her first child may be an important factor in the risk of later developing breast cancer. An international study conducted by WHO selected women with at least one birth and recorded whether or not they had breast cancer and whether they had their first child before or after their 30th birthday.

                            Cancer    Sample Size    $\hat{p}$
Age at First Birth > 30     683       3220           21.2%  ($\hat{p}_1$)
Age at First Birth <= 30    1498      10,245         14.6%  ($\hat{p}_2$)

The parameter to be estimated is $p_1 - p_2$.
$p_1$ = cancer rate when age at 1st birth > 30
$p_2$ = cancer rate when age at 1st birth <= 30

$$(\hat{p}_1 - \hat{p}_2) \pm 1.96\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

$$(.212 - .146) \pm 1.96\sqrt{\frac{.212(.788)}{3220} + \frac{.146(.854)}{10{,}245}} = .066 \pm 1.96(.008) = .066 \pm .016, \text{ or } (.05, .082)$$

We estimate that the cancer rate when age at first birth > 30 is between .05 and .082 higher than when age <= 30.

Beware!! Common Mistake!!!

A common mistake is to calculate a one-sample confidence interval for $p_1$ and a one-sample confidence interval for $p_2$, and then to conclude that $p_1$ and $p_2$ are equal if the confidence intervals overlap.

This is WRONG because the variability in the sampling distribution for $\hat{p}_1 - \hat{p}_2$ from two independent samples is more complex and must take into account variability coming from both samples. Hence the more complex formula for the standard error.
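The large-sample interval for $p_1 - p_2$ used in the icing-the-kicker and first-birth examples can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function names `two_proportion_ci` and `large_sample_ok` are hypothetical, and $z^*$ is computed with the standard library's `statistics.NormalDist` instead of being read from a z-table, so results may differ from hand calculations in the last digit.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ok(x1, n1, x2, n2, minimum=10):
    """Check that each sample has at least `minimum` successes and failures."""
    return min(x1, n1 - x1, x2, n2 - x2) >= minimum


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Approximate level-C confidence interval for p1 - p2 (unpooled SE)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # z* leaves area C between -z* and z* under the standard normal curve.
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Standard error of the difference of two independent sample proportions.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


# Icing the kicker: 157 of 197 made with a timeout, 377 of 488 without.
icing = two_proportion_ci(157, 197, 377, 488)
# First-birth example: 683 of 3220 (age > 30), 1498 of 10,245 (age <= 30).
cancer = two_proportion_ci(683, 3220, 1498, 10245)

print("icing CI:", icing, "conditions met:", large_sample_ok(157, 197, 377, 488))
print("cancer CI:", cancer, "conditions met:", large_sample_ok(683, 3220, 1498, 10245))
```

The icing interval contains 0, while the first-birth interval lies entirely above 0, which matches the conclusions drawn in the two examples.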
$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

INCORRECT: Two single-sample 95% confidence intervals

            Hits    AB     $\hat{p}$ (BA)
Rightie     126     394    .320
Leftie      50      222    .225

Rightie interval: (0.274, 0.366)
Leftie interval: (0.170, 0.280)

The confidence interval for the rightie BA and the confidence interval for the leftie BA overlap, suggesting no significant difference between Ryan Howard’s ABILITY to hit right-handed pitchers and his ABILITY to hit left-handed pitchers.

CORRECT: The 2-sample 95% confidence interval

The 2-sample 95% confidence interval of the form

$$(\hat{p}_R - \hat{p}_L) \pm 1.96\sqrt{\frac{\hat{p}_R(1-\hat{p}_R)}{n_R} + \frac{\hat{p}_L(1-\hat{p}_L)}{n_L}}$$

for the difference $p_R - p_L$ between the ABILITIES is (.023, .167). The interval is entirely positive, suggesting a significant difference between Howard's ABILITIES to hit righties and lefties (evidence that $p_R$ is larger than $p_L$).

[Figure: number line marking 0, .023, .095, and .167]

Reason for Contradictory Result

It's always true that $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$ for $a, b \ge 0$. Specifically,

$$\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \le \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}} + \sqrt{\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

$$SE(\hat{p}_1 - \hat{p}_2) \le SE(\hat{p}_1) + SE(\hat{p}_2)$$

The two-sample margin of error is therefore no larger than the sum of the two single-sample margins, so the single-sample intervals can overlap even when the two-sample interval excludes 0.

Hypothesis Tests for $p_1 - p_2$

If the null hypothesis is true, then we can rely on the properties of the sampling distribution of $\hat{p}_1 - \hat{p}_2$ to estimate the probability of selecting 2 samples with proportions $\hat{p}_1$ and $\hat{p}_2$.

$H_0: p_1 - p_2 = 0$ (that is, $p_1 = p_2 = p$)
$H_a: p_1 - p_2 \ne 0$ (or a one-sided alternative, $p_1 - p_2 > 0$ or $p_1 - p_2 < 0$)

[Figure: sampling distribution of $\hat{p}_1 - \hat{p}_2$ when $H_0: p_1 - p_2 = 0$ is true, centered at 0 with standard deviation $\sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$]

Our best estimate of $p$ is $\hat{p}$, the pooled sample proportion:

$$\hat{p} = \frac{\text{total successes}}{\text{total observations}} = \frac{\text{count}_1 + \text{count}_2}{n_1 + n_2}$$

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

This test is appropriate when $n_1\hat{p}_1 \ge 10$, $n_1(1-\hat{p}_1) \ge 10$, $n_2\hat{p}_2 \ge 10$, and $n_2(1-\hat{p}_2) \ge 10$.

Do NFL Teams With Domes Have an Unfair Advantage?

Do NFL teams that play their home games in a dome have an advantage over teams that do not play their home games in a dome?

                          Home Wins    Home Games    $\hat{p}$
Home field is dome        50           79            .633  ($\hat{p}_1$)
Home field is not dome    103          187           .551  ($\hat{p}_2$)

The parameter to be estimated is $p_1 - p_2$.
$p_1$ = home win rate for dome teams
$p_2$ = home win rate for non-dome teams

$H_0: p_1 - p_2 = 0$
$H_A: p_1 - p_2 > 0$

$$\hat{p} = \frac{50 + 103}{79 + 187} = .575$$

$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{.575(1 - .575)}{79} + \frac{.575(1 - .575)}{187}} = .066$$

Test statistic:

$$z = \frac{(.633 - .551) - 0}{.066} = 1.24$$

P-value: $P(z > 1.24) = .1075$

Do not reject $H_0: p_1 - p_2 = 0$. There is no significant difference between the home win rate for dome teams and the home win rate for non-dome teams.

Gastric Freezing

Gastric freezing was once a treatment for ulcers. Patients would swallow a deflated balloon with tubes, and a cold liquid would be pumped for an hour to cool the stomach and reduce acid production, thus relieving ulcer pain. The treatment was shown to be safe, significantly reducing ulcer pain, and was widely used for years.

A randomized comparative experiment later compared the outcome of gastric freezing with that of a placebo: 28 of the 82 patients subjected to gastric freezing improved, while 30 of the 78 in the control group improved.

$p_{gf}$ = proportion that receive relief from gastric freezing
$p_{placebo}$ = proportion that receive relief using a placebo

$H_0: p_{gf} - p_{placebo} = 0$
$H_a: p_{gf} - p_{placebo} > 0$

$$\hat{p}_{gf} = \frac{28}{82} = .341 \qquad \hat{p}_{placebo} = \frac{30}{78} = .385 \qquad \hat{p}_{pooled} = \frac{28 + 30}{82 + 78} = 0.3625$$

$$z = \frac{\hat{p}_{gf} - \hat{p}_{placebo}}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.341 - 0.385}{\sqrt{0.363 \times 0.637\left(\frac{1}{82} + \frac{1}{78}\right)}} = \frac{-0.044}{\sqrt{0.231 \times 0.025}} \approx -0.58$$

P-value = $P(z > -0.58) \approx 0.72$

Conclusion: Gastric freezing was no better than a placebo (P-value ≈ 0.72), and this treatment was abandoned. ALWAYS USE A CONTROL!
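The pooled two-proportion z test applied in the dome and gastric-freezing examples can be sketched in Python as shown below. This is an illustrative sketch under stated assumptions, not the slides' own code: the function name `two_proportion_z_test` and its `alternative` argument are hypothetical, and the normal tail areas come from the standard library's `statistics.NormalDist`. It works with unrounded proportions, so its output can differ slightly from hand calculations that round intermediate values.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alternative="greater"):
    """Pooled z test of H0: p1 - p2 = 0. Returns (z, p_value)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 both samples share one p, estimated by the pooled proportion.
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    lower_tail = NormalDist().cdf(z)
    if alternative == "greater":      # Ha: p1 - p2 > 0
        p_value = 1 - lower_tail
    elif alternative == "less":       # Ha: p1 - p2 < 0
        p_value = lower_tail
    elif alternative == "two-sided":  # Ha: p1 - p2 != 0
        p_value = 2 * min(lower_tail, 1 - lower_tail)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


# Dome teams: 50 wins in 79 home games; non-dome teams: 103 wins in 187.
z_dome, p_dome = two_proportion_z_test(50, 79, 103, 187)
# Gastric freezing: 28 of 82 improved; placebo: 30 of 78 improved.
z_gf, p_gf = two_proportion_z_test(28, 82, 30, 78)

print(f"dome: z = {z_dome:.2f}, P-value = {p_dome:.4f}")
print(f"gastric freezing: z = {z_gf:.2f}, P-value = {p_gf:.4f}")
```

In both cases the P-value is well above conventional significance levels, so the null hypothesis of equal proportions is not rejected.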