Chapter 22
Comparing 2 Proportions

© 2006 W.H. Freeman and Company

Objectives (Chapter 22)

Comparing two proportions
  Comparing two independent samples
  Large-sample CI for two proportions
  Test of statistical significance

Comparing two independent samples

We often need to estimate the difference $p_1 - p_2$ between two unknown population proportions based on independent samples. We can compute the difference between the two sample proportions, $\hat{p}_1 - \hat{p}_2$, and compare it to the corresponding, approximately normal sampling distribution.

Point estimator of $p_1 - p_2$: $\hat{p}_1 - \hat{p}_2$

Two random samples are drawn from two populations. The number of successes in each sample is recorded, and the sample proportions are computed.

  Sample 1: sample size $n_1$, number of successes $x_1$, sample proportion $\hat{p}_1 = x_1 / n_1$
  Sample 2: sample size $n_2$, number of successes $x_2$, sample proportion $\hat{p}_2 = x_2 / n_2$

Large-sample CI for two proportions

For two independent SRSs of sizes $n_1$ and $n_2$ with sample proportions of successes $\hat{p}_1$ and $\hat{p}_2$ respectively, an approximate level C confidence interval for $p_1 - p_2$ is

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

where $z^*$ is the appropriate value from the z-table for the confidence level C: C is the area under the standard normal curve between $-z^*$ and $z^*$.

Use this method when $n_1\hat{p}_1 \ge 10$, $n_1(1 - \hat{p}_1) \ge 10$, $n_2\hat{p}_2 \ge 10$, and $n_2(1 - \hat{p}_2) \ge 10$.

Cholesterol and heart attacks

How much does the cholesterol-lowering drug Gemfibrozil help reduce the risk of heart attack? We compare the incidence of heart attack over a 5-year period for two random samples of middle-aged men taking either the drug or a placebo.
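The large-sample interval formula and its "at least 10 successes and 10 failures" condition can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `two_proportion_ci` and its argument names are assumptions, and the counts in the example call are the heart attack counts reported for this study (84 of 2030 men on placebo, 56 of 2051 men on the drug).

```python
import math


def two_proportion_ci(x1, n1, x2, n2, z_star):
    """Large-sample confidence interval for p1 - p2.

    x1, x2 are success counts, n1, n2 are sample sizes, and z_star is the
    z-table value for the chosen confidence level (e.g. 1.645 for 90%).
    """
    # n * p_hat equals the success count and n * (1 - p_hat) the failure
    # count, so the rule of thumb is checked directly on the counts.
    for x, n in ((x1, n1), (x2, n2)):
        if x < 10 or n - x < 10:
            raise ValueError("need at least 10 successes and 10 failures per sample")

    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Standard error of the difference between the two sample proportions.
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    margin = z_star * se
    return diff - margin, diff + margin


# Placebo group as sample 1, drug group as sample 2, 90% confidence.
low, high = two_proportion_ci(84, 2030, 56, 2051, z_star=1.645)
print(f"90% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```

Passing counts rather than rounded proportions avoids carrying rounding error into the standard error.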
                          H. attack      n      $\hat{p}$
  Drug ($\hat{p}_2$)          56        2051     2.73%
  Placebo ($\hat{p}_1$)       84        2030     4.14%

The standard error of the difference $\hat{p}_1 - \hat{p}_2$ is

$$SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = \sqrt{\frac{0.0414(0.9586)}{2030} + \frac{0.0273(0.9727)}{2051}} = 0.0057$$

The confidence interval is $(\hat{p}_1 - \hat{p}_2) \pm z^* \, SE$, so the 90% CI is (0.0414 − 0.0273) ± 1.645 × 0.0057 = 0.0141 ± 0.0094 = (0.0047, 0.0235).

We are 90% confident that the interval 0.47% to 2.35% captures the true difference, in percentage points, between the heart attack rates of middle-aged men taking a placebo and those taking the cholesterol-lowering drug.

Example: 95% confidence interval for $p_1 - p_2$

The age at which a woman gives birth to her first child may be an important factor in the risk of later developing breast cancer. An international study conducted by WHO selected women with at least one birth and recorded whether they had breast cancer and whether they had their first child before or after their 30th birthday.

                              Cancer    Sample size
  Age at first birth > 30       683         3220
  Age at first birth ≤ 30      1498       10,245

The parameter to be estimated is $p_1 - p_2$, where

  $p_1$ = cancer rate when age at first birth > 30, estimated by $\hat{p}_1$ = 683/3220 = 21.2%
  $p_2$ = cancer rate when age at first birth ≤ 30, estimated by $\hat{p}_2$ = 1498/10,245 = 14.6%

$$(\hat{p}_1 - \hat{p}_2) \pm 1.96 \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = (.212 - .146) \pm 1.96 \sqrt{\frac{.212(.788)}{3220} + \frac{.146(.854)}{10{,}245}}$$

$$= .066 \pm 1.96(.008) = .066 \pm .016 = (.050, .082)$$

We estimate that the cancer rate when age at first birth > 30 is between .05 and .082 higher than when age ≤ 30.

Hypothesis tests for $p_1 - p_2$

If the null hypothesis is true, then we can rely on the properties of the sampling distribution of $\hat{p}_1 - \hat{p}_2$ to estimate the probability of selecting 2 samples with proportions $\hat{p}_1$ and $\hat{p}_2$.

  $H_0: p_1 - p_2 = 0$ (that is, $p_1 = p_2 = p$)
  $H_a: p_1 - p_2 \ne 0$ (two-sided; a one-sided alternative, $> 0$ or $< 0$, is also possible)

[Figure: sampling distribution of $\hat{p}_1 - \hat{p}_2$ when $H_0: p_1 - p_2 = 0$ is true, centered at 0]

When $H_0$ is true, our best estimate of $p$ is $\hat{p}$, the pooled sample proportion.
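The pooled sample proportion combines both samples:

$$\hat{p} = \frac{\text{total successes}}{\text{total observations}} = \frac{\text{count}_1 + \text{count}_2}{n_1 + n_2}$$

When $H_0$ is true, the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normal, centered at 0, with standard deviation

$$\sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

Replacing $p$ with the pooled estimate $\hat{p}$ gives the test statistic

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

This test is appropriate when $n_1\hat{p}_1 \ge 10$, $n_1(1 - \hat{p}_1) \ge 10$, $n_2\hat{p}_2 \ge 10$, and $n_2(1 - \hat{p}_2) \ge 10$.

Gastric Freezing

Gastric freezing was once a treatment for ulcers. Patients would swallow a deflated balloon with tubes, and a cold liquid would be pumped for an hour to cool the stomach and reduce acid production, thus relieving ulcer pain. The treatment was shown to be safe, significantly reducing ulcer pain, and was widely used for years.

A randomized comparative experiment later compared the outcome of gastric freezing with that of a placebo: 28 of the 82 patients subjected to gastric freezing improved, while 30 of the 78 in the control group improved.

  $H_0: p_{gf} - p_{placebo} = 0$
  $H_a: p_{gf} - p_{placebo} > 0$

  $p_{gf}$ = proportion that receive relief from gastric freezing
  $p_{placebo}$ = proportion that receive relief using a placebo

$$\hat{p}_{gf} = \frac{28}{82} = .341 \qquad \hat{p}_{placebo} = \frac{30}{78} = .385 \qquad \hat{p}_{pooled} = \frac{28 + 30}{82 + 78} = 0.3625$$

$$z = \frac{\hat{p}_{gf} - \hat{p}_{placebo}}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.341 - 0.385}{\sqrt{0.3625 \times 0.6375 \left(\frac{1}{82} + \frac{1}{78}\right)}} = \frac{-0.0432}{0.0760} = -0.57$$

(The numerator −0.0432 uses the unrounded sample proportions.)

P-value = $P(Z \ge -0.57) \approx 0.72$

Conclusion: Gastric freezing was no better than a placebo (P-value ≈ 0.72), and this treatment was abandoned. ALWAYS USE A CONTROL!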
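The pooled z test used in the gastric freezing example can be reproduced with a short Python sketch. The function names `pooled_two_proportion_z` and `upper_tail_p_value` are illustrative assumptions, not part of the original material. The one-sided P-value $P(Z \ge z)$ is obtained from the standard normal distribution through the complementary error function in Python's standard library.

```python
import math


def pooled_two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0, using the pooled sample proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled estimate of the common proportion p under H0.
    p_pooled = (x1 + x2) / (n1 + n2)
    # Standard deviation of p1_hat - p2_hat under H0, with p replaced by p_pooled.
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


def upper_tail_p_value(z):
    """P(Z >= z) for a standard normal Z, matching the alternative Ha: p1 - p2 > 0."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Gastric freezing group as sample 1 (28 of 82 improved),
# placebo group as sample 2 (30 of 78 improved).
z = pooled_two_proportion_z(28, 82, 30, 78)
p_value = upper_tail_p_value(z)
print(f"z = {z:.3f}, one-sided P-value = {p_value:.3f}")
```

A large P-value means the data give no evidence that gastric freezing outperforms the placebo.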