Chapter 13: Comparing Two Population Parameters

Section 13.1: Comparing Two Means

When we learned about designing scientific studies, we had a basic format for testing two competing claims. For example, if a pharmaceutical company develops a new medication for lowering cholesterol and we want to test it, we would compare it to the current medication for lowering cholesterol using the following basic design:

SRS of volunteers with high cholesterol
    |-- Random allocation --> Group receiving old drug --|
    |                                                    |--> Compare cholesterol levels
    |-- Random allocation --> Group receiving new drug --|

Up to this point we know how to randomly allocate and how to block, if necessary, but we don’t know how to compare. Here is how one would compare two samples using a t-test:

Example: An educational consulting company claims that it has developed a new, more effective method of teaching AP Statistics. To test the claim, the company chooses a random sample of 40 students who are interested in taking AP Statistics. It then randomly assigns half of those students to a class using the new method and the other half to a class using the traditional method. At the end of the course, after the students take the AP Exam in Statistics, the scores are compared.
a) Conduct a hypothesis test (alpha = 0.05 level) to see whether the new method in fact produces better scores than the old method.
b) Construct a 95% confidence interval for the difference in mean scores.

Let’s say that:
$\bar{x}_1$ = the average score of students in the new-method class = 3.7
$\bar{x}_2$ = the average score of students in the traditional-method class = 3.5
$s_1$ = standard deviation of the scores from the first sample = 0.7
$s_2$ = standard deviation of the scores from the second sample = 0.6
Each class has half of the 40 students, so $n_1 = n_2 = 20$.

Part a: Hypothesis testing

Step 1: State the hypotheses
\[ H_0: \mu_1 - \mu_2 = 0 \]
\[ H_a: \mu_1 - \mu_2 > 0 \]
where
$\mu_1$ = average score of students using the new method
$\mu_2$ = average score of students using the traditional method

Step 2: Assumptions
1) Each sample is an SRS.
(Given.)
2) Since the sum of our sample sizes is 40, we don’t have to worry about how normal the population distributions are. (If the sum of the sample sizes were less than 40, we would want to check that each sample came from an approximately normal population with no outliers.)
3) The samples need to be independent (see below).

Step 3: Calculate the test statistic and p-value
Keep in mind that our test statistic now involves the distribution of the difference $\bar{x}_1 - \bar{x}_2$ of the two individual sampling distributions. This new distribution has mean $\mu_1 - \mu_2$ and variance
\[ \left(\frac{\sigma_1}{\sqrt{n_1}}\right)^2 + \left(\frac{\sigma_2}{\sqrt{n_2}}\right)^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}. \]
Its standard deviation is therefore
\[ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}. \]
So the 2-sample z-statistic will be:
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]
Since we don’t know $\sigma_1$ and $\sigma_2$, we replace them with the sample standard deviations $s_1$ and $s_2$ and use the t-statistic instead of the z-statistic:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
In our case:
\[ t = \frac{(3.7 - 3.5) - 0}{\sqrt{\dfrac{0.7^2}{20} + \dfrac{0.6^2}{20}}} = 0.9701 \]
The degrees of freedom for a two-sample t-test is the smaller of $(n_1 - 1)$ and $(n_2 - 1)$. (Note: calculators and computer software use a more complicated formula for the degrees of freedom. Our way, though less exact, is more conservative.) So in our case we have 19 degrees of freedom.
P(t > 0.9701) with 19 d.f. is between 0.15 and 0.20.

Step 4: Conclusion
Since this p-value is greater than 0.05, the result is not significant at the 0.05 level, and we do not reject the null hypothesis. We do not have enough evidence to say that the new method is any better than the traditional method for raising AP scores.
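The Step 3 arithmetic can be checked with a short Python sketch. The function names `two_sample_t` and `t_upper_tail` are made up for illustration, the "more complicated" degrees-of-freedom formula is assumed here to be the Welch-Satterthwaite approximation, and the tail area is approximated by numerical integration of the t density rather than read from a table.

```python
import math

def two_sample_t(x1, x2, s1, s2, n1, n2, null_diff=0.0):
    """Return (t, conservative df, Welch df) from summary statistics."""
    v1, v2 = s1**2 / n1, s2**2 / n2           # variance of each sample mean
    se = math.sqrt(v1 + v2)                   # standard error of x1 - x2
    t = ((x1 - x2) - null_diff) / se
    df_conservative = min(n1 - 1, n2 - 1)     # the rule used in these notes
    # Assumption: Welch-Satterthwaite df, the usual software formula
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df_conservative, df_welch

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule (steps must be even)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t                  # the density is symmetric about 0

# Summary statistics from the AP Statistics example
t, df, df_welch = two_sample_t(3.7, 3.5, 0.7, 0.6, 20, 20)
p_value = t_upper_tail(t, df)
print(round(t, 4), df, round(df_welch, 1), round(p_value, 3))
```

With the example's numbers, this gives t of about 0.9701, 19 conservative degrees of freedom, roughly 37 Welch degrees of freedom, and an upper-tail area between 0.15 and 0.20, in line with the table lookup.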
Part b: 95% confidence interval
Our confidence interval takes the form:
\[ (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
The $t^*$ value for a 95% confidence interval with 19 degrees of freedom is 2.093. Therefore the confidence interval for our situation is:
\[ (3.7 - 3.5) \pm 2.093 \sqrt{\frac{0.7^2}{20} + \frac{0.6^2}{20}} = 0.2 \pm 0.4315 \]
which gives (-0.2315, 0.6315).

Section 13.2: Comparing Two Proportions

Just as we can use hypothesis testing to compare two population means and confidence intervals to estimate the difference between two population means, we can do the same with the difference of two population proportions.

Example: Thinking about the upcoming prom, Andrew Negri is pondering whether he should expand his date opportunities and ask out some girls from Greenwich HS, hoping that his chances would be better there. He conjectures that a higher proportion of GHS seniors attended the prom last year than DHS seniors. He takes an SRS from each school and gets the following data:

Population     Sample size    # of seniors who attended the prom last year
DHS seniors    25             15
GHS seniors    45             40

I) Construct a 95% confidence interval for the difference between the proportion of GHS seniors who went to the prom last year and the proportion of DHS seniors who went to the prom last year.

Assumptions:
1) The samples are SRSs from the designated populations (given).
2) Each population is at least 10 times as large as its sample. For GHS a sample size of 45 is fine, but for DHS a sample size of 25 is somewhat problematic because there aren’t at least 250 seniors.
So we proceed with caution.
3) $n_1\hat{p}_1 \ge 5$, $n_1(1 - \hat{p}_1) \ge 5$, $n_2\hat{p}_2 \ge 5$, and $n_2(1 - \hat{p}_2) \ge 5$. Check:
25(0.6) = 15 ≥ 5
25(1 - 0.6) = 10 ≥ 5
45(0.89) ≈ 40 ≥ 5
45(1 - 0.89) ≈ 5 ≥ 5

Interval construction:
\[ (\hat{p}_2 - \hat{p}_1) \pm z^* \cdot SE, \qquad SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \]
where:
$p_1$ = proportion of DHS seniors who attended the prom last year
$p_2$ = proportion of GHS seniors who attended the prom last year
and $\hat{p}_1 = 15/25 = 0.6$ and $\hat{p}_2 = 40/45 \approx 0.89$ are the corresponding sample proportions.
\[ (0.89 - 0.6) \pm 1.96 \sqrt{\frac{0.6(1 - 0.6)}{25} + \frac{0.89(1 - 0.89)}{45}} = 0.29 \pm 0.213 \]
which gives (0.077, 0.503).

II) Do a hypothesis test (alpha = 0.05) to see if Andrew is correct.

Step 1: State the hypotheses
\[ H_0: p_1 = p_2 \qquad H_a: p_1 < p_2 \]
or, equivalently,
\[ H_0: p_1 - p_2 = 0 \qquad H_a: p_1 - p_2 < 0 \]
where:
$p_1$ = proportion of DHS seniors who attended the prom last year
$p_2$ = proportion of GHS seniors who attended the prom last year

Step 2: Assumptions
1) The samples are SRSs from the designated populations (given).
2) Each population is at least 10 times as large as its sample. For GHS a sample size of 45 is fine, but for DHS a sample size of 25 is somewhat problematic because there aren’t at least 250 seniors. So we proceed with caution.
3) $n_1\hat{p}_1 \ge 5$, $n_1(1 - \hat{p}_1) \ge 5$, $n_2\hat{p}_2 \ge 5$, and $n_2(1 - \hat{p}_2) \ge 5$. Check:
25(0.6) = 15 ≥ 5
25(1 - 0.6) = 10 ≥ 5
45(0.89) ≈ 40 ≥ 5
45(1 - 0.89) ≈ 5 ≥ 5

Step 3: Calculate the test statistic and p-value
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \]
where:
$\hat{p}_1$ = proportion of DHS seniors from our sample who attended the prom last year
$\hat{p}_2$ = proportion of GHS seniors from our sample who attended the prom last year
$\hat{p}$ = pooled proportion of seniors who attended the prom, from both samples combined:
\[ \hat{p} = \frac{\text{count of successes in both samples combined}}{\text{count of observations in both samples combined}} = \frac{55}{70} \approx 0.79 \]
We use a pooled proportion for the standard deviation of the difference of proportions because the null hypothesis assumes that the two population proportions are equal. This essentially means that there is one population (of DHS and GHS seniors) from which a certain proportion attended the prom the year before. That proportion is estimated by $\hat{p}$.
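Substituting the sample values, and carrying one more decimal place ($\hat{p}_2 = 40/45 \approx 0.889$, $\hat{p} = 55/70 \approx 0.786$) to limit rounding error:
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.6 - 0.889}{\sqrt{(0.786)(1 - 0.786)\left(\dfrac{1}{25} + \dfrac{1}{45}\right)}} = \frac{-0.289}{0.1023} \approx -2.82 \]
P(Z < -2.82) = 0.0024

Step 4: Conclusion
Since our p-value (0.0024) is less than our alpha level (0.05), we reject the null hypothesis and conclude that there is enough evidence to suggest that the proportion of seniors from GHS who went to the prom last year is greater than the proportion of seniors from DHS who went to the prom last year.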
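Both calculations in this section, the unpooled confidence interval and the pooled z-test, can be reproduced with a short Python sketch. The function names are hypothetical, chosen only for illustration. The code carries the proportions unrounded, so its results can differ from the hand calculation in the third decimal place.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z < z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_prop_interval(x1, n1, x2, n2, z_star=1.96):
    """Confidence interval for p2 - p1 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p2 - p1
    return diff - z_star * se, diff + z_star * se

def two_prop_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 and lower-tail p-value for Ha: p1 < p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)            # one combined proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, normal_cdf(z)

# Sample 1 = DHS (15 of 25 attended), sample 2 = GHS (40 of 45 attended)
low, high = two_prop_interval(15, 25, 40, 45)
z, p_value = two_prop_z(15, 25, 40, 45)
print(round(low, 3), round(high, 3), round(z, 2), round(p_value, 4))
```

The interval uses the two separate sample proportions because it makes no assumption that the populations are alike. The test pools them because the null hypothesis treats the two schools as a single population.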