252meanx2 11/02/05 (Open this document in 'Outline' view!)

D. COMPARISON OF TWO SAMPLES (CTD.)

5. Rank Tests.

Especially when samples are small and the underlying distributions are not normal, it is not appropriate to compare means.

a. The Wilcoxon-Mann-Whitney Test for Two Independent Samples.

If the samples are independent, this test is appropriate for testing whether the two samples come from the same distribution. If the distributions are similar, it is often called a test of equality of medians.

Example: Let us assume that we have two very small samples from New York ($n_2 = 6$) and Pennsylvania ($n_1 = 4$) and we wish to compare their medians. Let us call the smaller sample (Pennsylvania) 'sample 1' and the larger sample 'sample 2', so that $n_1 \le n_2$. If we use $\eta$ for the median, our hypotheses are $H_0: \eta_1 \ge \eta_2$ and $H_1: \eta_1 < \eta_2$, with $\alpha = .05$. This is a one-sided test. Assume that our data are as below:

Pennsylvania   11000  16000  80000  85000
New York       17000  30000  50000  70000  80000  90000

Our first step is to rank the numbers from 1 to $n = n_1 + n_2 = 4 + 6 = 10$. Note that the 7th and 8th numbers are tied, so that both are numbered 7.5. The numbers can be ordered from the largest to the smallest or from the smallest to the largest. To decide which to do, look at the smaller sample. If the smallest number is in the smaller sample, order from smallest to largest; if the largest number is in the smaller sample, order from largest to smallest. Since 11000 is the smallest number and is in the smaller sample, let that be 1.

Pennsylvania          New York
  x1       r1           x2       r2
11000      1          17000      3
16000      2          30000      4
80000      7.5        50000      5
85000      9          70000      6
                      80000      7.5
                      90000     10
Sum       19.5        Sum       35.5

Now compute the sums of the ranks: $SR_1 = 19.5$, $SR_2 = 35.5$. As a check, note that these two rank sums must add to the sum of the first $n$ numbers, which is $\frac{n(n+1)}{2} = \frac{10(11)}{2} = 55$, and that $SR_1 + SR_2 = 19.5 + 35.5 = 55$.

The smaller of $SR_1$ and $SR_2$ is called $W$ and is compared with Table 5 or 6. To use Table 5, first find the part for $n_2 = 6$, and then the column for $n_1 = 4$. Then try to locate $W = 19.5$ in that column.
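The ranking and rank-sum check in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `average_ranks` is hypothetical, the data are the ten values listed for the example, and the code always ranks from smallest to largest, which is the direction chosen here because the smallest number is in the smaller sample.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = ((i + 1) + (j + 1)) / 2  # average of the positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


# Data from the example (sample 1 is the smaller sample).
pennsylvania = [11000, 16000, 80000, 85000]              # n1 = 4
new_york = [17000, 30000, 50000, 70000, 80000, 90000]    # n2 = 6

n1, n2 = len(pennsylvania), len(new_york)
n = n1 + n2

ranks = average_ranks(pennsylvania + new_york)
sr1 = sum(ranks[:n1])   # rank sum for Pennsylvania
sr2 = sum(ranks[n1:])   # rank sum for New York

# Check: the two rank sums must add to n(n+1)/2.
assert sr1 + sr2 == n * (n + 1) / 2

w = min(sr1, sr2)       # the statistic taken to the table
print(sr1, sr2, w)      # 19.5 35.5 19.5
```

Ranking in one fixed direction keeps the sketch short; ranking from largest to smallest would only require negating the values before calling the helper.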
$W = 19.5$ falls between two tabulated values: for $W = 19$ the p-value is .3048, and for $W = 20$ the p-value is .3810, so we can say that $.3048 < \text{p-value} < .3810$. Since both are above the significance level, we cannot reject the null hypothesis. $W$ can also be compared against the critical values $T_L$ and $T_U$ in Table 6b; these are 13 and 31. Since $W = 19.5$ is between these values, we cannot reject the null hypothesis.

For values of $n_1$ and $n_2$ that are too large for the tables, $W$ has approximately the normal distribution with mean $\mu_W = \frac{1}{2} n_1 (n_1 + n_2 + 1)$ and variance $\sigma_W^2 = \frac{1}{6} n_2 \mu_W$. Though the example above is too small for this treatment, for continuity its data will be used here. If the significance level is 5% and the test is one-sided, we reject our null hypothesis if $z = \frac{W - \mu_W}{\sigma_W}$ lies below $-z_{.05} = -1.645$. In this case

$$\mu_W = \tfrac{1}{2} n_1 (n_1 + n_2 + 1) = \tfrac{1}{2}(4)(4 + 6 + 1) = 22$$

and

$$\sigma_W^2 = \tfrac{1}{6} n_2 \mu_W = \tfrac{1}{6}(6)(22) = 22,$$

so that

$$z = \frac{W - \mu_W}{\sigma_W} = \frac{19.5 - 22}{\sqrt{22}} = -0.53.$$

Since this is not below -1.645, we cannot reject $H_0$.

b. Wilcoxon Signed Rank Test for Paired Samples.

This is a test for equality of medians when the data are paired. It can also be used for the median of a single sample. The Sign Test for paired data is a simpler test to use in this situation, but it is less powerful. As in many tests for measures of central tendency with paired data, the original numbers are discarded, and the differences between the pairs are used. If there are $n$ pairs, the differences are ranked according to absolute value from 1 to $n$, either top to bottom or bottom to top. After replacing tied absolute values with their average rank, each rank is marked with a + or - sign and two rank sums are taken, $T^+$ and $T^-$. The smaller of these is compared with Table 7.

Example: We wish to compare sales of a product before and after an advertisement appeared in a nationally televised football game. Sales in a sample of eight stores before the game are $x_1$ and sales after are $x_2$. Define $d = x_2 - x_1$ as the improvement in sales.
Though the appropriate test here would be one-sided, a two-sided test is demonstrated here instead. The hypotheses are $H_0: \eta_1 = \eta_2$ and $H_1: \eta_1 \ne \eta_2$, with $n = 8$ and $\alpha = .05$. The data are below. The column $|d|$ is the absolute value of $d$, the column $r$ ranks the absolute values, and the column $r^*$ is the ranks corrected for ties and marked with the signs of the differences.

  x1      x2     d = x2 - x1    |d|     r     r*
 7600    8600      +1000       1000     8     8+
 8700    8900       +200        200     2     2.5+
 9600    9400       -200        200     3     2.5-
 8400    8700       +300        300     4     4+
 7600    8100       +500        500     6     6+
 6900    7500       +600        600     7     7+
 7300    7700       +400        400     5     5+
 8200    8100       -100        100     1     1-

If we add together the numbers in $r^*$ with a + sign we get $T^+ = 32.5$. If we do the same for numbers with a - sign, we get $T^- = 3.5$. To check this, note that these two numbers must sum to the sum of the first $n$ numbers, which is $\frac{n(n+1)}{2} = \frac{8(9)}{2} = 36$, and that $T^+ + T^- = 32.5 + 3.5 = 36$.

We check 3.5, the smaller of the two rank sums, against the numbers in Table 7. For a two-sided 5% test, we use the .025 column. For $n = 8$, the critical value is 4, and we reject the null hypothesis only if our test statistic is below this critical value. Since our test statistic is 3.5, we reject the null hypothesis.

For values of $n$ that are too large for the table, $T_L$, the smaller of $T^+$ and $T^-$, has the normal distribution with mean $\mu_T = \frac{1}{4} n (n + 1)$ and variance $\sigma_T^2 = \frac{1}{6}(2n + 1)\mu_T$. Though the example above is too small for this treatment, for continuity its data will be used here. If the significance level is 5% and the test is two-sided, we reject our null hypothesis if $z = \frac{T_L - \mu_T}{\sigma_T}$ does not lie between $\pm z_{.025} = \pm 1.960$. In this case

$$\mu_T = \tfrac{1}{4} n (n + 1) = \tfrac{1}{4}(8)(8 + 1) = 18$$

and

$$\sigma_T^2 = \tfrac{1}{6}(2n + 1)\mu_T = \tfrac{1}{6}(17)(18) = 51,$$

so that

$$z = \frac{T_L - \mu_T}{\sigma_T} = \frac{3.5 - 18}{\sqrt{51}} = -2.03.$$

Since this is not between $\pm 1.960$, we reject $H_0$.

6. Proportions.

6a. Independent Samples:

If $p_1$ is the proportion of successes in the first population, and $p_2$ is the proportion of successes in the second population, we define $\Delta p = p_1 - p_2$.
Then our hypotheses will be $H_0: p_1 = p_2$, $H_1: p_1 \ne p_2$, or more generally $H_0: \Delta p = \Delta p_0$, $H_1: \Delta p \ne \Delta p_0$. Let $\bar{p}_1 = \frac{x_1}{n_1}$, $\bar{p}_2 = \frac{x_2}{n_2}$ and $\Delta\bar{p} = \bar{p}_1 - \bar{p}_2$, where $x_1$ is the number of successes in the first sample, $x_2$ is the number of successes in the second sample, and $n_1$ and $n_2$ are the sample sizes. The usual three approaches to testing the hypotheses can be used.

a. Confidence Interval: $\Delta p = \Delta\bar{p} \pm z_{\alpha/2}\, s_{\Delta p}$ or $p_1 - p_2 = (\bar{p}_1 - \bar{p}_2) \pm z_{\alpha/2}\, s_{\Delta p}$, where $s_{\Delta p} = \sqrt{\frac{\bar{p}_1 \bar{q}_1}{n_1} + \frac{\bar{p}_2 \bar{q}_2}{n_2}}$. Compare this interval with $\Delta p_0$.

b. Test Ratio: $z = \frac{\Delta\bar{p} - \Delta p_0}{\sigma_{\Delta p}} = \frac{(\bar{p}_1 - \bar{p}_2) - (p_{10} - p_{20})}{\sigma_{\Delta p}}$, where $\sigma_{\Delta p} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$, although $s_{\Delta p}$ may have to be used if $p_1$ and $p_2$ are unknown. Also note that if the null hypothesis is $p_1 = p_2$ or $\Delta p_0 = 0$, we use $\sigma_{\Delta p} = \sqrt{\bar{p}_0 \bar{q}_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, where $\bar{p}_0 = \frac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2} = \frac{x_1 + x_2}{n_1 + n_2}$, $\bar{q}_0 = 1 - \bar{p}_0$, and $x_1$ and $x_2$ are the number of successes in sample 1 and sample 2, respectively.

c. Critical Value: $\Delta\bar{p}_{CV} = \Delta p_0 \pm z_{\alpha/2}\, \sigma_{\Delta p}$ or $(\bar{p}_1 - \bar{p}_2)_{CV} = (p_{10} - p_{20}) \pm z_{\alpha/2}\, \sigma_{\Delta p}$. Test this against $\Delta\bar{p} = \bar{p}_1 - \bar{p}_2$. For calculation of $\sigma_{\Delta p}$, see Test Ratio above.

Example: An insurance company operating in its home state (region 1) has 18 claims on 1000 policies, a ratio of .018. In another state (region 2) it has 12 claims on 400 policies, a ratio of .030. Are these two ratios significantly different at the .01 significance level?

Our facts are

$n_1 = 1000$, $x_1 = 18$, $\bar{p}_1 = .018$, $\bar{q}_1 = 1 - .018 = .982$
$n_2 = 400$, $x_2 = 12$, $\bar{p}_2 = .030$, $\bar{q}_2 = 1 - .030 = .970$

We are testing $H_0: p_1 = p_2$, $H_1: p_1 \ne p_2$, or equivalently $H_0: \Delta p = 0$, $H_1: \Delta p \ne 0$. For the Critical Value or Test Ratio method, we need

$$\bar{p}_0 = \frac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2} = \frac{1000(.018) + 400(.030)}{1000 + 400} = .0214$$

or, more easily, $\bar{p}_0 = \frac{x_1 + x_2}{n_1 + n_2} = \frac{18 + 12}{1000 + 400} = .0214$. This implies that $\bar{q}_0 = 1 - \bar{p}_0 = 1 - .0214 = .9786$ and that

$$\sigma_{\Delta p} = \sqrt{\bar{p}_0 \bar{q}_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{.0214(.9786)\left(\frac{1}{1000} + \frac{1}{400}\right)} = \sqrt{.000073297} = .00856.$$

Also $\Delta\bar{p} = \bar{p}_1 - \bar{p}_2 = .018 - .030 = -.012$. For a two-sided test, we will need $z_{\alpha/2} = z_{.005} = 2.576$.

Critical Value: $\Delta\bar{p}_{CV} = \Delta p_0 \pm z_{\alpha/2}\, \sigma_{\Delta p} = 0 \pm 2.576(.00856) = \pm .0221$. Since $\Delta\bar{p} = -.012$ falls between -.0221 and .0221, do not reject the null hypothesis.

Test Ratio: $z = \frac{\Delta\bar{p} - \Delta p_0}{\sigma_{\Delta p}} = \frac{-.012 - 0}{.00856} = -1.402$. Since this falls between -2.576 and 2.576, do not reject the null hypothesis.

Confidence Interval: $s_{\Delta p} = \sqrt{\frac{\bar{p}_1 \bar{q}_1}{n_1} + \frac{\bar{p}_2 \bar{q}_2}{n_2}} = \sqrt{\frac{.018(.982)}{1000} + \frac{.030(.970)}{400}} = \sqrt{.00009043} = .00951$, so $\Delta p = \Delta\bar{p} \pm z_{\alpha/2}\, s_{\Delta p} = -.012 \pm 2.576(.00951) = -.012 \pm .0245$, or -.0365 to .0125. Since $\Delta p_0 = 0$ falls between -.0365 and .0125, do not reject the null hypothesis.

6b. Paired Samples:

In Method D6a, we assume that we are comparing proportions from two independent samples. In the McNemar Test we compare two proportions taken from the same sample, which is equivalent to paired samples. Assume that two different questions are asked of the same group with the following responses.

                question 2
question 1     yes     no
   yes         x11     x12
   no          x21     x22

So, for example, $x_{21}$ is the number of people who answered no to question 1 and yes to question 2. Here $x_{11} + x_{12} + x_{21} + x_{22} = n$, $\bar{p}_1 = \frac{x_{11} + x_{12}}{n}$ and $\bar{p}_2 = \frac{x_{11} + x_{21}}{n}$. If we wish to test $H_0: p_1 = p_2$, $H_1: p_1 \ne p_2$, where $p_1$ is the proportion saying 'yes' to the first question and $p_2$ is the proportion saying 'yes' to the second question, let

$$z = \frac{x_{12} - x_{21}}{\sqrt{x_{12} + x_{21}}}.$$

(The test is valid only if $x_{12} + x_{21} \ge 10$.)

Example: A famous example of this concerns a debate between candidates. Question 1 is whether the respondent supports candidate 1 before the debate and question 2 is whether the respondent supports candidate 1 after the debate. The data are

                question 2
question 1     yes     no
   yes          27      7
   no           13     28

and the question is whether the debate has changed the fraction supporting candidate 1. Write this out as a hypothesis test and do the test.

Solution: $H_0: p_1 = p_2$, $H_1: p_1 \ne p_2$, or $H_0: p_1 - p_2 = 0$, $H_1: p_1 - p_2 \ne 0$. This is a two-sided test, so if we use a 5% significance level, our rejection regions are below $-z_{.025} = -1.96$ and above $z_{.025} = 1.96$.

$$z = \frac{x_{12} - x_{21}}{\sqrt{x_{12} + x_{21}}} = \frac{7 - 13}{\sqrt{7 + 13}} = \frac{-6}{\sqrt{20}} = -\sqrt{\frac{36}{20}} = -\sqrt{1.8} = -1.34,$$

and we cannot reject the null hypothesis. If we use a p-value, $2P(z \le -1.34) = 2(.5 - .4099) = 2(.0901) = .1802$, so we could not reject the null hypothesis at a 5% significance level, or even at a 10% level.
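As a rough check on the McNemar arithmetic for the debate example, the test ratio can be computed with Python's standard library. This is a minimal sketch, not part of the original notes: the function name `mcnemar_z` is hypothetical, and the normal critical value and p-value come from `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def mcnemar_z(x12, x21):
    """Return z = (x12 - x21) / sqrt(x12 + x21) for the off-diagonal counts."""
    if x12 + x21 < 10:
        # The notes state the test is valid only if x12 + x21 >= 10.
        raise ValueError("x12 + x21 must be at least 10 for this test")
    return (x12 - x21) / sqrt(x12 + x21)


# Off-diagonal counts from the debate example:
# x12 = yes before / no after, x21 = no before / yes after.
x12, x21 = 7, 13
alpha = 0.05

z = mcnemar_z(x12, x21)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
p_value = 2 * NormalDist().cdf(-abs(z))        # two-sided p-value
reject = abs(z) > z_crit

print(round(z, 2), round(z_crit, 2), reject)   # -1.34 1.96 False
```

Only the two cells where respondents changed their answer enter the statistic, which is why the function takes just `x12` and `x21`.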
If you (wrongly, but understandably) thought that the hypotheses were $H_0: p_1 \ge p_2$, $H_1: p_1 < p_2$, or $H_0: p_1 - p_2 \ge 0$, $H_1: p_1 - p_2 < 0$, the 5% rejection region would be below $-z_{.05} = -1.645$ and we still could not reject the null hypothesis.

Note: This is a version of the Chi-Square Test. Recall that $\chi^2 = \sum \frac{(O - E)^2}{E}$. If we take $x_{11}$ and $x_{22}$ as given, and assume that the null hypothesis is correct, then the table already given,

                question 2
question 1     yes     no
   yes         x11     x12
   no          x21     x22

is our $O$, and the numbers in the $x_{12}$ and the $x_{21}$ slots must be equal for there to be no change in preferences, so that our $E$ is

                question 2
question 1     yes             no
   yes         x11             (x12 + x21)/2
   no          (x12 + x21)/2   x22

This means that two of the four terms in $\chi^2 = \sum \frac{(O - E)^2}{E}$ are zero and the remaining terms are

$$\chi^2 = \frac{\left(x_{12} - \frac{x_{12} + x_{21}}{2}\right)^2}{\frac{x_{12} + x_{21}}{2}} + \frac{\left(x_{21} - \frac{x_{12} + x_{21}}{2}\right)^2}{\frac{x_{12} + x_{21}}{2}} = \frac{2\left(\frac{x_{12} - x_{21}}{2}\right)^2}{\frac{x_{12} + x_{21}}{2}} = \frac{(x_{12} - x_{21})^2}{x_{12} + x_{21}}.$$

But $\chi^2$ has only one degree of freedom, and, since $\chi^2$ is defined as a sum of $z^2$, we can take a square root and say $z = \frac{x_{12} - x_{21}}{\sqrt{x_{12} + x_{21}}}$.

Switch to document 252meanx4 here for examples for point 7.

7. Variances.

Test the ratios $\frac{s_1^2}{s_2^2}$ and $\frac{s_2^2}{s_1^2}$ against values of $F$.

If we want to test $H_0: \sigma_1^2 \le \sigma_2^2$ against $H_1: \sigma_1^2 > \sigma_2^2$, where $DF_1 = n_1 - 1$ and $DF_2 = n_2 - 1$, we effectively test $H_0: \frac{\sigma_1^2}{\sigma_2^2} \le 1$ against $H_1: \frac{\sigma_1^2}{\sigma_2^2} > 1$ by comparing $\frac{s_1^2}{s_2^2}$ against $F_{\alpha}^{(DF_1, DF_2)}$. If the ratio of sample variances is larger than $F_{\alpha}^{(DF_1, DF_2)}$, reject $H_0$.

If we want to do the opposite test, $H_0: \sigma_1^2 \ge \sigma_2^2$ against $H_1: \sigma_1^2 < \sigma_2^2$, we test $H_0: \frac{\sigma_2^2}{\sigma_1^2} \le 1$ against $H_1: \frac{\sigma_2^2}{\sigma_1^2} > 1$ by comparing $\frac{s_2^2}{s_1^2}$ against $F_{\alpha}^{(DF_2, DF_1)}$.

For the 2-sided test, $H_0: \sigma_1^2 = \sigma_2^2$ against $H_1: \sigma_1^2 \ne \sigma_2^2$, do both tests above, but use $F_{\alpha/2}$. A 2-sided confidence interval is

$$\frac{s_2^2}{s_1^2} \cdot \frac{1}{F_{\alpha/2}^{(DF_2, DF_1)}} \le \frac{\sigma_2^2}{\sigma_1^2} \le \frac{s_2^2}{s_1^2}\, F_{\alpha/2}^{(DF_1, DF_2)}.$$

For examples, see the syllabus supplement article, "Confidence Intervals and Hypothesis Testing for Variances."

8. Summary

It may help to use the following table.

                                                        Paired Samples   Independent Samples
Location - Normal distribution. Compare means.          Method D4        Methods D1-D3
Location - Distribution not Normal. Compare medians.    Method D5b       Method D5a
Proportions                                             Method D6b       Method D6a
Variability - Normal distribution. Compare variances.                    Method D7

© 2005 Roger Even Bove