Chapter 20
Comparing Two Proportions
BPS - 5th Ed.

Two-Sample Problems
- The goal of inference is to compare the responses to two treatments or to compare the characteristics of two populations.
- We have a separate sample from each treatment or each population. The units are not matched, and the samples can be of differing sizes.

Case Study: Machine Reliability
A study is performed to test the reliability of products produced by two machines. Machine A produced 8 defective parts in a run of 140, while machine B produced 11 defective parts in a run of 200.

             n      Defects
Machine A    140    8
Machine B    200    11

This is an example of when to use the two-proportion z procedures.

Inference about the Difference p1 – p2: Simple Conditions
- The difference in the population proportions is estimated by the difference in the sample proportions: $\hat{p}_1 - \hat{p}_2$
- When both of the samples are large, the sampling distribution of this difference is approximately Normal with mean $p_1 - p_2$ and standard deviation

$$\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

Inference about the Difference p1 – p2: Sampling Distribution
[Figure: sampling distribution of the difference in sample proportions]

Standard Error
Since the population proportions $p_1$ and $p_2$ are unknown, the standard deviation of the difference in sample proportions must be estimated by substituting $\hat{p}_1$ and $\hat{p}_2$ for $p_1$ and $p_2$:

$$SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

Case Study: Reliability
Confidence Interval
Compute a 90% confidence interval for the difference in reliabilities (as measured by proportion of defectives) for the two machines.

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

$$= \left(\frac{8}{140} - \frac{11}{200}\right) \pm 1.645 \sqrt{\frac{\frac{8}{140}\left(1-\frac{8}{140}\right)}{140} + \frac{\frac{11}{200}\left(1-\frac{11}{200}\right)}{200}}$$

$$= 0.0021 \pm 0.0418 = -0.0397 \text{ to } 0.0439$$

We are 90% confident that the difference in proportion of defectives for the two machines is between -3.97% and 4.39%. Since 0 is in this interval, the data give no evidence that the two machines differ in reliability.

Adjustment to Confidence Interval
"Plus Four" Confidence Interval for p1 – p2
- The standard confidence interval approach can yield unstable or erratic inferences.
- By adding four imaginary observations (one success and one failure to each sample), the inferences can be stabilized.
- This leads to more accurate inference about the difference of the population proportions.

Adjustment to Confidence Interval
"Plus Four" Confidence Interval for p1 – p2
- Add 4 imaginary observations, one success and one failure to each sample.
- Compute the "plus four" proportions:

$$\tilde{p}_1 = \frac{\text{number of successes in sample 1} + 1}{n_1 + 2}, \qquad \tilde{p}_2 = \frac{\text{number of successes in sample 2} + 1}{n_2 + 2}$$

- Use the "plus four" proportions in the formula:

$$(\tilde{p}_1 - \tilde{p}_2) \pm z^* \sqrt{\frac{\tilde{p}_1(1-\tilde{p}_1)}{n_1+2} + \frac{\tilde{p}_2(1-\tilde{p}_2)}{n_2+2}}$$

Case Study: Reliability
"Plus Four" 90% Confidence Interval

$$\tilde{p}_1 = \frac{8+1}{140+2} = \frac{9}{142}, \qquad \tilde{p}_2 = \frac{11+1}{200+2} = \frac{12}{202}$$

$$(\tilde{p}_1 - \tilde{p}_2) \pm z^* \sqrt{\frac{\tilde{p}_1(1-\tilde{p}_1)}{n_1+2} + \frac{\tilde{p}_2(1-\tilde{p}_2)}{n_2+2}}$$

$$= \left(\frac{9}{142} - \frac{12}{202}\right) \pm 1.645 \sqrt{\frac{\frac{9}{142}\left(1-\frac{9}{142}\right)}{142} + \frac{\frac{12}{202}\left(1-\frac{12}{202}\right)}{202}}$$

$$= 0.0040 \pm 0.0434 = -0.0394 \text{ to } 0.0474$$

(This is more accurate.)

We are 90% confident that the difference in proportion of defectives for the two machines is between -3.94% and 4.74%. Since 0 is in this interval, the data give no evidence that the two machines differ in reliability.

The Hypotheses for Testing Two Proportions
- Null: H0: p1 = p2
- One-sided alternatives:
  Ha: p1 > p2
  Ha: p1 < p2
- Two-sided alternative:
  Ha: p1 ≠ p2

Pooled Sample Proportion
- If H0 is true (p1 = p2), then the two proportions are equal to some common value p.
- Instead of estimating p1 and p2 separately, we will combine or pool the sample information to estimate p.
- This combined or pooled estimate is called the pooled sample proportion, and we will use it in place of each of the sample proportions in the expression for the standard error SE.

pooled sample proportion:

$$\hat{p} = \frac{\text{total number of successes in both samples}}{\text{total number of observations in both samples}}$$

Test Statistic for Two Proportions
- Use the pooled sample proportion in place of each of the individual sample proportions in the expression for the standard error SE in the test statistic:

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}} \;\Rightarrow\; z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1} + \frac{\hat{p}(1-\hat{p})}{n_2}}} \;\Rightarrow\; z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

P-value for Testing Two Proportions
- Ha: p1 > p2
  P-value is the probability of getting a value as large or larger than the observed test statistic (z) value.
- Ha: p1 < p2
  P-value is the probability of getting a value as small or smaller than the observed test statistic (z) value.
- Ha: p1 ≠ p2
  P-value is two times the probability of getting a value as large or larger than the absolute value of the observed test statistic (z) value.

Case Study: Summer Jobs
- A university financial aid office polled a simple random sample of undergraduate students to study their summer employment.
- Not all students were employed the previous summer.
- Here are the results:

Summer Status    Men    Women
Employed         718    593
Not Employed      79    139
Total            797    732

Is there evidence that the proportion of male students who had summer jobs differs from the proportion of female students who had summer jobs?

Case Study: Summer Jobs
The Hypotheses
- Null: The proportion of male students who had summer jobs is the same as the proportion of female students who had summer jobs. [H0: p1 = p2]
- Alt: The proportion of male students who had summer jobs differs from the proportion of female students who had summer jobs. [Ha: p1 ≠ p2]

Case Study: Summer Jobs
Test Statistic
- n1 = 797 and n2 = 732 (both large, so the test statistic approximately follows a Normal distribution)
- Pooled sample proportion:

$$\hat{p} = \frac{718 + 593}{797 + 732} = \frac{1311}{1529}$$

- Standardized score (test statistic):

$$z = \frac{\frac{718}{797} - \frac{593}{732}}{\sqrt{\frac{1311}{1529}\left(1 - \frac{1311}{1529}\right)\left(\frac{1}{797} + \frac{1}{732}\right)}} = 5.07$$

Case Study: Summer Jobs
1. Hypotheses: H0: p1 = p2, Ha: p1 ≠ p2
2. Test Statistic:

$$z = \frac{\frac{718}{797} - \frac{593}{732}}{\sqrt{\frac{1311}{1529}\left(1 - \frac{1311}{1529}\right)\left(\frac{1}{797} + \frac{1}{732}\right)}} = 5.07$$

3. P-value:
   P-value = 2P(Z > 5.07) = 0.000000396 (using a computer)
   P-value = 2P(Z > 5.07) < 2(1 – 0.9998) = 0.0004 (Table A) [since 5.07 > 3.49 (the largest z-value in the table)]
4. Conclusion: Since the P-value is smaller than α = 0.001, there is very strong evidence that the proportion of male students who had summer jobs differs from that of female students.
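
The summer jobs test can be reproduced with a short Python sketch using only the standard library. The function name `two_prop_z_test` and its `alternative` argument are hypothetical conveniences, not part of the slides; the calculation assumes both samples are large enough for the Normal approximation, as stated above.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled two-proportion z test of H0: p1 = p2 (hypothetical helper).

    alternative: "two-sided" (p1 != p2), "greater" (p1 > p2) or "less" (p1 < p2).
    Returns the z statistic and its P-value.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pooled sample proportion: all successes over all observations.
    pooled = (x1 + x2) / (n1 + n2)
    # Standard error under H0, using the pooled proportion for both samples.
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se

    std_normal = NormalDist()
    if alternative == "greater":
        p_value = std_normal.cdf(-z)            # upper tail, P(Z >= z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)             # lower tail, P(Z <= z)
    elif alternative == "two-sided":
        p_value = 2 * std_normal.cdf(-abs(z))   # both tails
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p_value


# Men: 718 employed of 797; women: 593 employed of 732.
z, p_value = two_prop_z_test(718, 797, 593, 732)
print("z = %.2f, two-sided P-value = %.3g" % (z, p_value))
```

The upper tail is computed as the lower tail of -z, which relies on the symmetry of the standard Normal distribution.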