252soln2 (3/17/00)

PROBLEM: Downing and Clark, Chapter 17, Computational Problem 15. Twenty people rank two political candidates (A, B) on a scale of 1-10. Test the null hypothesis that people have no preference between the candidates. The data are shown in columns A and B below. $\alpha = .10$.

Solution: Because this is preference data, we cannot assume that it has the normal distribution. Because it is paired, use the Wilcoxon Signed Rank Test. $H_0: \eta_1 = \eta_2$, $H_1: \eta_1 \ne \eta_2$, or the null hypothesis is simply 'similar distributions.'

Person   A (x1)   B (x2)   d = x1 - x2   |d|   Rank r   Corrected Rank r*
  1         2        8         -6          6      20         -20
  2         5        6         -1          1       1          -5
  3         3        5         -2          2      10         -11.5
  4         7        8         -1          1       2          -5
  5         4        5         -1          1       3          -5
  6         8        4          4          4      19         +19
  7         9        8          1          1       4          +5
  8         8        9         -1          1       5          -5
  9         7        8         -1          1       6          -5
 10         5        6         -1          1       7          -5
 11         6        5          1          1       8          +5
 12         7        4          3          3      14         +16
 13         8        5          3          3      15         +16
 14         8        6          2          2      11         +11.5
 15         9        6          3          3      16         +16
 16         9        7          2          2      12         +11.5
 17         8        9         -1          1       9          -5
 18         4        6         -2          2      13         -11.5
 19        10        7          3          3      17         +16
 20         8        5          3          3      18         +16

To explain the calculation of corrected ranks we need the table below. Because several values of $|d|$ have equal magnitude, the number in 'Corrected Rank r*' is the average of the numbers in 'Rank r' for those tied values.

|d|   Rank r           Corrected Rank r*
 1    1 through 9           5
 2    10 through 13        11.5
 3    14 through 18        16
 4    19                   19
 6    20                   20

If we sum the corrected ranks we get $T^+ = 132$ for those with a + sign and $T^- = 78$ for those with a - sign. The smaller of these is designated $T_L = 78$. (Check: $T^+ + T^- = 132 + 78 = 210$. This should be equal to the sum of the first 20 numbers, which is $\frac{20(21)}{2} = 210$.) If we use Table 7, "Critical Values of TL in the Wilcoxon Signed Rank Sum Test …..," we use the .05 column for a 2-sided test with $\alpha = .10$. For $n = 20$ the critical value is 60. Since 78 is above 60, do not reject $H_0$. For values of $n$ above 15, $T_L$, the smaller of $T^+$ and $T^-$, has approximately the normal distribution and may be used here with mean $\mu_T = \frac{1}{4}n(n+1) = \frac{1}{4}(20)(21) = 105$ and variance $\sigma_T^2 = \frac{1}{6}(2n+1)\mu_T = \frac{1}{6}(41)(105) = 717.5$.
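The ranking steps and normal-approximation parameters above can be checked with a short sketch. It uses the Python standard library only, and the helper name `signed_rank_sums` is hypothetical, not something from the Syllabus Supplement. Tied $|d|$ values share the average of the ranks they occupy. Zero differences are dropped, which is the usual convention; this data set has none.

```python
import math

def signed_rank_sums(diffs):
    """Return (T_plus, T_minus, n) for a Wilcoxon signed rank test.

    Zero differences are dropped; tied |d| values share the average of
    the ranks they occupy (the 'corrected ranks' in the text).
    """
    d = [x for x in diffs if x != 0]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied absolute values.
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # average of ranks i+1 .. j+1
        i = j + 1
    t_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    t_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    return t_plus, t_minus, len(d)

a = [2, 5, 3, 7, 4, 8, 9, 8, 7, 5, 6, 7, 8, 8, 9, 9, 8, 4, 10, 8]
b = [8, 6, 5, 8, 5, 4, 8, 9, 8, 6, 5, 4, 5, 6, 6, 7, 9, 6, 7, 5]
t_plus, t_minus, n = signed_rank_sums([x1 - x2 for x1, x2 in zip(a, b)])

t_l = min(t_plus, t_minus)
mu_t = n * (n + 1) / 4               # mean of T under H0
var_t = (2 * n + 1) * mu_t / 6       # variance of T under H0
z = (t_l - mu_t) / math.sqrt(var_t)
print(t_plus, t_minus, mu_t, var_t, round(z, 3))
```

Running it prints `132.0 78.0 105.0 717.5 -1.008`, which agrees with the hand calculation.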
If the significance level is 10% and the test is two-sided, we reject our null hypothesis if $z = \frac{T_L - \mu_T}{\sigma_T}$ does not lie between $\pm z_{\alpha/2} = \pm z_{.05} = \pm 1.645$. Here $z = \frac{78 - 105}{\sqrt{717.5}} = -1.008$. Since this is between $\pm 1.645$, we do not reject $H_0$.

PROBLEM: Downing and Clark, Chapter 17, Computational Problem 9. Defense budgets are given for two countries over two decades. Check to see whether they have the same distribution. The data are given in columns x1 and x2 below. $\alpha = .10$.

Note: It is not totally clear to me that the Wilcoxon-Mann-Whitney test is appropriate in this case. The data in each row may come from a single year, so a paired-data method may be more appropriate. It is also not clear in general why a nonparametric procedure is appropriate. For this reason, Computational Problem 11 may be a better example of when to use this method.

Solution: Because the data are assumed to be independent random samples and thus not paired, use the Wilcoxon-Mann-Whitney Rank Sum Test. $H_0: \eta_1 = \eta_2$, $H_1: \eta_1 \ne \eta_2$, or the null hypothesis is simply 'similar distributions.' $n_1 = 10$ and $n_2 = 10$. In the table below, two pairs of ranking columns are given, first $r_1$ and $r_2$, then $r_1^*$ and $r_2^*$. $r_1$ and $r_2$ represent bottom-to-top ranking (the smallest value gets rank 1), while $r_1^*$ and $r_2^*$ represent top-to-bottom ranking (the largest value gets rank 1). Both are equally valid because both samples are of equal size, so the 'rank from the extreme of the smaller sample' rule doesn't apply.

 x1    r1    r1*     x2    r2    r2*
 10     3    18       9     2    19
 12     4    17       8     1    20
 15     7    14      17     9    12
 16     8    13      14     6    15
 18    10    11      13     5    16
 22    12     9      19    11    10
 26    15     6      24    13     8
 28    17     4      25    14     7
 30    19     2      27    16     5
 29    18     3      31    20     1
Sum   113    97            97   113

So the sums of the ranks are $SR_1 = 113$, $SR_2 = 97$ or $SR_1^* = 97$, $SR_2^* = 113$. Check: these two rank sums must add to the sum of the first $n = n_1 + n_2 = 10 + 10 = 20$ numbers, which is $\frac{n(n+1)}{2} = \frac{20(21)}{2} = 210$, and indeed $SR_1 + SR_2 = 113 + 97 = 210$. The smaller of $SR_1$ and $SR_2$ is called $W$. This can be compared against the critical values for $T_L$ and $T_U$ in Table 6b.
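Both ranking directions can be reproduced with a short sketch (Python standard library only; `rank_ascending` is a hypothetical helper name). The sketch ranks the pooled 20 values from the bottom and then obtains the top-to-bottom ranks as $n + 1 - r$. Averaging ties is included for generality, although this data set has no ties.

```python
def rank_ascending(values):
    """Rank values from smallest (rank 1) to largest, averaging ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks

x1 = [10, 12, 15, 16, 18, 22, 26, 28, 30, 29]
x2 = [9, 8, 17, 14, 13, 19, 24, 25, 27, 31]

combined = x1 + x2
n = len(combined)
r = rank_ascending(combined)             # bottom-to-top ranks (r1, r2)
r_star = [n + 1 - rk for rk in r]        # top-to-bottom ranks (r1*, r2*)

sr1, sr2 = sum(r[:len(x1)]), sum(r[len(x1):])
sr1_star, sr2_star = sum(r_star[:len(x1)]), sum(r_star[len(x1):])
w = min(sr1, sr2)                        # same W whichever direction is used
print(sr1, sr2, sr1_star, sr2_star, w)
```

This prints `113.0 97.0 97.0 113.0 97.0`. The rank sums simply swap between the two directions, so $W = 97$ either way.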
For $n_1 = 10$, $n_2 = 10$ and a 2-tailed test with $\alpha = .10$, $T_L = 83$ and $T_U = 127$. Since $W = 97$ (no matter whether we ranked from the top or from the bottom), it is between these values and we cannot reject the null hypothesis. For values of $n_1$ and $n_2$ that are too large for the tables, $W$ has approximately the normal distribution with mean $\mu_W = \frac{1}{2}n_1(n_1 + n_2 + 1) = \frac{1}{2}(10)(10 + 10 + 1) = 105$ and variance $\sigma_W^2 = \frac{1}{6}n_2\mu_W = \frac{1}{6}(10)(105) = 175$. Then $z = \frac{W - \mu_W}{\sigma_W} = \frac{97 - 105}{\sqrt{175}} = -0.605$. If we wish to take a p-value approach, $\text{p-value} = 2P(z \le -0.605) = 2(.5 - .2274) = 2(.2726) = .5452$. Since our p-value is larger than the significance level, we cannot reject the null hypothesis. The text uses this method in this problem, but strictly speaking it should be limited to cases where $n_2 > 20$, so here we are better off using the table.

PROBLEM D.4a
a) If our sample consists of the numbers 9, 14, 16, 16, 18, 19, 22, 23, 25, 26, test the hypotheses $H_0: \eta = 15$, $H_1: \eta \ne 15$. Compute $x - \eta_0$ for each value of $x$, use the magnitude and sign of the results to rank them, and perform a Wilcoxon signed rank test. $\alpha = .05$.
b) For the following paired samples, test the hypotheses $H_0: \eta_1 = \eta_2$, $H_1: \eta_1 \ne \eta_2$ using a Wilcoxon signed rank test. $\alpha = .05$.

x1:   9   14   16   16   18   19   22   23   25   78
x2:  14   10    8   14   13   16   12   40   13   24

Solution: a) $H_0: \eta = 15$, $H_1: \eta \ne 15$, so $d = x - \eta_0 = x - 15$.

x                    9   14   16   16   18   19   22   23   25   26
d = x - 15          -6   -1    1    1    3    4    7    8   10   11
|d|                  6    1    1    1    3    4    7    8   10   11
Rank r               6    1    2    3    4    5    7    8    9   10
Corrected Rank r*   -6   -2   +2   +2   +4   +5   +7   +8   +9  +10

If we sum the corrected ranks we get $T^+ = 47$ for those with a + sign and $T^- = 8$ for those with a - sign. The smaller of these is designated $T_L = 8$. (Check: $T^+ + T^- = 47 + 8 = 55$. This should be equal to the sum of the first 10 numbers, which is $\frac{10(11)}{2} = 55$.) If we use Table 7, "Critical Values of TL in the Wilcoxon Signed Rank Sum Test …..," we use the .025 column for a 2-sided test with $\alpha = .05$. For $n = 10$ the critical value is 8. Since this is equal to our value of $T_L$, reject $H_0$.
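Both parts of Problem D.4 use the same signed-ranking procedure, applied to $d = x - \eta_0$ in part a and to $d = x_1 - x_2$ in part b. The sketch below uses the Python standard library only. `signed_rank_sums` is a hypothetical helper, and the critical value 8 is simply the Table 7 entry quoted in the text for $n = 10$, two-sided $\alpha = .05$.

```python
def signed_rank_sums(diffs):
    """(T_plus, T_minus) with average ranks for tied |d|; zeros dropped."""
    d = [x for x in diffs if x != 0]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # average of tied ranks
        i = j + 1
    t_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    t_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    return t_plus, t_minus

# Part a: one-sample test of H0: eta = 15, using d = x - eta0.
sample = [9, 14, 16, 16, 18, 19, 22, 23, 25, 26]
a_plus, a_minus = signed_rank_sums([x - 15 for x in sample])

# Part b: paired test, using d = x1 - x2.
x1 = [9, 14, 16, 16, 18, 19, 22, 23, 25, 78]
x2 = [14, 10, 8, 14, 13, 16, 12, 40, 13, 24]
b_plus, b_minus = signed_rank_sums([p - q for p, q in zip(x1, x2)])

CRITICAL_N10 = 8  # Table 7 value quoted in the text (n = 10, two-sided .05)
for label, tp, tm in [("a", a_plus, a_minus), ("b", b_plus, b_minus)]:
    t_l = min(tp, tm)
    print(label, tp, tm, "reject H0" if t_l <= CRITICAL_N10 else "do not reject H0")
```

This prints `a 47.0 8.0 reject H0` and `b 41.5 13.5 do not reject H0`. The decision rule rejects when $T_L$ is at or below the tabled value, as in the text.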
b)
Observation          1     2     3     4     5     6     7     8     9    10
x1                   9    14    16    16    18    19    22    23    25    78
x2                  14    10     8    14    13    16    12    40    13    24
d = x1 - x2         -5     4     8     2     5     3    10   -17    12    54
|d|                  5     4     8     2     5     3    10    17    12    54
Rank r               4     3     6     1     5     2     7     9     8    10
Corrected Rank r*  -4.5   +3    +6    +1  +4.5    +2    +7    -9    +8   +10

$H_0: \eta_1 = \eta_2$, $H_1: \eta_1 \ne \eta_2$. If we sum the corrected ranks we get $T^+ = 41.5$ for those with a + sign and $T^- = 13.5$ for those with a - sign. The smaller of these is designated $T_L = 13.5$. (Check: $T^+ + T^- = 41.5 + 13.5 = 55$. This should be equal to the sum of the first 10 numbers, which is $\frac{10(11)}{2} = 55$.) If we use Table 7, "Critical Values of TL in the Wilcoxon Signed Rank Sum Test …..," we use the .025 column for a 2-sided test with $\alpha = .05$. For $n = 10$ the critical value is 8. Since our value of $T_L$ is above the critical value, do not reject $H_0$.

PROBLEM D.5 We have the following data for returns on two stocks:

Stock A:  7, 8, -5, 9, 11        nA = 5
Stock B:  6, 7, 0, 4, 9, 15      nB = 6

a. Find a 95% confidence interval for $\frac{\sigma_A^2}{\sigma_B^2}$.
b. Test the following at a 95% level: $H_0: \sigma_A^2 \le \sigma_B^2$, $H_1: \sigma_A^2 > \sigma_B^2$.

Solution: The formulas for this problem are given in "Confidence limits and Hypothesis Testing for Variances" in the Syllabus Supplement and summarized on the formula pages.

a) From the data above we can compute $s_A^2 = 40.00$ and $s_B^2 = 25.36$. The formula given is
$\frac{s_2^2}{s_1^2 F_{\alpha/2}^{(n_2-1,\,n_1-1)}} \le \frac{\sigma_2^2}{\sigma_1^2} \le \frac{s_2^2}{s_1^2} F_{\alpha/2}^{(n_1-1,\,n_2-1)}$.
If we let Stock A be $x_2$ and Stock B be $x_1$, then we can state that $DF_1 = n_B - 1 = 5$, $DF_2 = n_A - 1 = 4$ and $\frac{s_2^2}{s_1^2} = \frac{s_A^2}{s_B^2} = \frac{40.00}{25.36} = 1.577$. Substitute these values in the confidence interval formula to get $\frac{1.577}{F_{.025}^{(4,5)}} \le \frac{\sigma_A^2}{\sigma_B^2} \le 1.577\,F_{.025}^{(5,4)}$. From the F table, $F_{.025}^{(5,4)} = 9.36$ and $F_{.025}^{(4,5)} = 7.39$, so the interval is $\frac{1.577}{7.39} \le \frac{\sigma_A^2}{\sigma_B^2} \le 1.577(9.36)$, which becomes $0.213 \le \frac{\sigma_A^2}{\sigma_B^2} \le 14.76$. If we wish an interval for the standard deviations, we can take the square roots to get $0.462 \le \frac{\sigma_A}{\sigma_B} \le 3.842$.

b) We are testing $H_0: \sigma_A^2 \le \sigma_B^2$ against $H_1: \sigma_A^2 > \sigma_B^2$. According to the Syllabus Supplement, test $F^{(DF_1,\,DF_2)} = \frac{s_1^2}{s_2^2}$, where $DF_1 = n_1 - 1$, $DF_2 = n_2 - 1$ and $s_1^2$ is the larger of the two variances according to the alternate hypothesis. Here $s_1^2 = s_A^2$, so $DF_1 = 4$ and $DF_2 = 5$. Since this is a 1-sided test with $\alpha = .05$, $\frac{s_A^2}{s_B^2} = \frac{40.00}{25.36} = 1.577$ is compared to $F_{.05}^{(4,5)} = 5.19$. Since the ratio is smaller than the F value, we accept $H_0$.

PROBLEM D.6 In a study of sleep gotten with a sleeping pill and with a placebo the results were (Keller, Warren, Bartel, 2nd ed. p. 354):

Pill x1   Placebo x2   Difference d
  7.3        6.8            .5
  8.5        7.9            .6
  6.4        6.0            .4
  9.0        8.4            .6
  6.9        6.5            .4

$\bar x_1 = 7.620$, $\bar x_2 = 7.120$, $\bar d = 0.500$, $s_1^2 = 1.197$, $s_2^2 = 0.997$, $s_d^2 = 0.010$.

a. Assume that these are independent samples from a normal distribution and that $\sigma_1^2 = \sigma_2^2$. (Test if $\sigma_1^2 = \sigma_2^2$.)
b. Assume that these are independent samples and that $\sigma_1^2 \ne \sigma_2^2$.
c. (Optional) Assume these are paired samples.
In each case do (i) a 99% confidence interval for $\mu_1 - \mu_2$, (ii) test if $\mu_1 = \mu_2$, (iii) in case a) test if $\sigma_1^2 = \sigma_2^2$.
d. Redo part a assuming that the parent population is not normal.
e. Redo part c(ii) assuming that the parent distribution is not normal.

Solution: Assume $\alpha = .01$.

a) Assume that these are independent samples from a normal distribution and that $\sigma_1^2 = \sigma_2^2$ (test if $\sigma_1^2 = \sigma_2^2$). From the Syllabus Supplement, for the difference between two means ($\sigma$ unknown, variances assumed equal), with $D = \mu_1 - \mu_2$ and $d = \bar x_1 - \bar x_2$:
Confidence interval: $D = d \pm t_{\alpha/2} s_d$
Hypotheses: $H_0: D = D_0$, $H_1: D \ne D_0$
Test ratio: $t = \frac{d - D_0}{s_d}$
Critical value: $d_{cv} = D_0 \pm t_{\alpha/2} s_d$
where $s_d = \hat s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, $\hat s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and $DF = n_1 + n_2 - 2$.

(i) Confidence interval: In the case of equal variances we use a pooled variance, $\hat s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{4(1.197) + 4(0.997)}{8} = 1.097$. This is used to compute $s_d = \hat s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{1.097\left(\frac{1}{5} + \frac{1}{5}\right)} = \sqrt{0.439} = 0.662$. Since $t_{\alpha/2}^{(n_1 + n_2 - 2)} = t_{.005}^{(8)} = 3.355$, and we can say that $d = \bar x_1 - \bar x_2$ and $D = \mu_1 - \mu_2$, the equation $D = d \pm t_{\alpha/2}s_d$ becomes $\mu_1 - \mu_2 = (\bar x_1 - \bar x_2) \pm t_{\alpha/2}s_d$, or $\mu_1 - \mu_2 = 0.500 \pm 3.355(0.662) = 0.500 \pm 2.221$.

(ii) $H_0: D = 0$, $H_1: D \ne 0$, or $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$. If we use a test ratio, $t = \frac{d - D_0}{s_d} = \frac{(\bar x_1 - \bar x_2) - (\mu_1 - \mu_2)}{s_d} = \frac{0.500 - 0}{0.662} = 0.755$. Since this is between $\pm t_{.005}^{(8)} = \pm 3.355$, we accept $H_0$. If we use a critical value instead, $d_{cv} = D_0 \pm t_{\alpha/2}s_d = 0 \pm 3.355(0.662) = \pm 2.221$. Since $d = 0.500$ is between these critical values, we accept $H_0$.

(iii) We are testing $H_0: \sigma_1^2 = \sigma_2^2$ against $H_1: \sigma_1^2 \ne \sigma_2^2$. According to the Syllabus Supplement, test $F^{(DF_1,\,DF_2)} = \frac{s_1^2}{s_2^2}$ and $F^{(DF_2,\,DF_1)} = \frac{s_2^2}{s_1^2}$, where $DF_1 = n_1 - 1$ and $DF_2 = n_2 - 1$. Here $\frac{s_1^2}{s_2^2} = \frac{1.197}{0.997} = 1.201$ and $\frac{s_2^2}{s_1^2} = \frac{0.997}{1.197} = 0.833$, with 4 and 4 degrees of freedom. Since 1.201 is less than $F_{\alpha/2}^{(4,4)}$, we accept $H_0$. (Actually we should be checking against $F_{.005}^{(4,4)}$, but it is not available on the table. A check of the table shows that $F_{.01}^{(4,4)}$ is larger than 1.201; it exceeds even $F_{.025}^{(4,4)} = 9.60$. Since $F_{.005}^{(4,4)}$ must be larger than $F_{.01}^{(4,4)}$, if 1.201 is less than $F_{.01}^{(4,4)}$, it also must be less than $F_{.005}^{(4,4)}$.)

b) Assume that these are independent samples and that $\sigma_1^2 \ne \sigma_2^2$. From the Syllabus Supplement, for the difference between two means ($\sigma$ unknown, variances assumed unequal), the confidence interval $D = d \pm t_{\alpha/2}s_d$, the hypotheses $H_0: D = D_0$, $H_1: D \ne D_0$, the test ratio $t = \frac{d - D_0}{s_d}$ and the critical value $d_{cv} = D_0 \pm t_{\alpha/2}s_d$ have the same form as in a), but with
$s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and $DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$.

(i) (Optional) Confidence interval: In the case of unequal variances we use the Satterthwaite method. $\frac{s_1^2}{n_1} = \frac{1.197}{5} = 0.2394$ and $\frac{s_2^2}{n_2} = \frac{0.997}{5} = 0.1994$, so $\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} = 0.2394 + 0.1994 = 0.4388$. If we use this in the degrees of freedom formula, we find $DF = \frac{0.4388^2}{\frac{0.2394^2}{4} + \frac{0.1994^2}{4}} = 7.9341$. We round this down to get 7 degrees of freedom. This is used with $s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{0.4388} = 0.6624$. Since $t_{.005}^{(7)} = 3.499$, and we can say that $d = \bar x_1 - \bar x_2$ and $D = \mu_1 - \mu_2$, the equation $D = d \pm t_{\alpha/2}s_d$ becomes $\mu_1 - \mu_2 = (\bar x_1 - \bar x_2) \pm t_{\alpha/2}s_d$, or $\mu_1 - \mu_2 = 0.500 \pm 3.499(0.6624) = 0.500 \pm 2.318$.

(ii) $H_0: D = 0$, $H_1: D \ne 0$, or $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$. If we use a test ratio, $t = \frac{d - D_0}{s_d} = \frac{(\bar x_1 - \bar x_2) - (\mu_1 - \mu_2)}{s_d} = \frac{0.500 - 0}{0.6624} = 0.755$. Since this is between $\pm t_{.005}^{(7)} = \pm 3.499$, we accept $H_0$. If we use a critical value instead, $d_{cv} = D_0 \pm t_{\alpha/2}s_d = 0 \pm 3.499(0.6624) = \pm 2.318$. Since $d = 0.500$ is between these critical values, we accept $H_0$.

c) Assume these are paired samples.
(i) Confidence interval: In the case of paired data, we act as if we have only $n = n_1 = n_2 = 5$ pairs, so $DF = n - 1 = 4$ and $t_{.005}^{(4)} = 4.604$. The standard error of $\bar d$ is $s_{\bar d} = \sqrt{\frac{s_d^2}{n}} = \sqrt{\frac{0.010}{5}} = \sqrt{.002} = 0.0447$. Since $\bar d = \bar x_1 - \bar x_2$ and $D = \mu_1 - \mu_2$, the equation $D = \bar d \pm t_{\alpha/2}s_{\bar d}$ becomes $\mu_1 - \mu_2 = (\bar x_1 - \bar x_2) \pm t_{\alpha/2}s_{\bar d}$, or $\mu_1 - \mu_2 = 0.500 \pm 4.604(0.0447) = 0.500 \pm 0.206$.

(ii) $H_0: D = 0$, $H_1: D \ne 0$, or $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$. If we use a test ratio, $t = \frac{\bar d - D_0}{s_{\bar d}} = \frac{(\bar x_1 - \bar x_2) - (\mu_1 - \mu_2)}{s_{\bar d}} = \frac{0.500 - 0}{0.0447} = 11.18$. Since this is not between $\pm t_{.005}^{(4)} = \pm 4.604$, we reject $H_0$. If we use a critical value instead, $d_{cv} = D_0 \pm t_{\alpha/2}s_{\bar d} = 0 \pm 4.604(0.0447) = \pm 0.206$. Since $\bar d = 0.500$ is not between these critical values, we reject $H_0$.

d) Redo part a(ii) assuming that the parent population is not normal. Since the parent population is not normal and the data represent two independent samples, we do a Wilcoxon rank sum test. To do this we rank the ten numbers from 1 to 10, starting at the extreme end of the smaller sample. Since the samples are of the same size, we arbitrarily pick $x_1$ as the smaller sample. We note that 9.0 is the largest number in both samples, so that is where we start our ranking. Since we are working with non-normal data, our hypotheses are stated as $H_0: \eta_1 = \eta_2$, $H_1: \eta_1 \ne \eta_2$.

 x1    r1     x2    r2     d
 7.3    5     6.8    7    .5
 8.5    2     7.9    4    .6
 6.4    9     6.0   10    .4
 9.0    1     8.4    3    .6
 6.9    6     6.5    8    .4
Sum    23           32

From the above, $n_1 = n_2 = 5$, and the sums of the ranks are $SR_1 = 23$ and $SR_2 = 32$. $W$ is the smaller of the two rank sums and is 23. To check our rank sums, note that $n = n_1 + n_2 = 10$ and that if the rank sums are correct, $SR_1 + SR_2 = \frac{n(n+1)}{2} = \frac{10(11)}{2} = 55$. In this case $23 + 32 = 55$, so the ranking seems correct. If we go to Table 5 in the Syllabus Supplement, we find that the p-value for $W = 23$ is .210. Since this is a 2-sided test it should be doubled to .420. In any case, it is above $\alpha = .01$, so accept $H_0$. For a 5% test, Table 6 could be used.

e) Redo part c(ii) assuming that the parent distribution is not normal.
Since the parent population is not normal and the data represent paired samples, we would prefer to do a Wilcoxon signed rank test of the hypotheses $H_0: \eta_1 = \eta_2$, $H_1: \eta_1 \ne \eta_2$. To do this we take the values of $d = x_1 - x_2$ and replace them with their absolute values $|d|$. We rank the $n$ values from 1 to $n$. To compute corrected ranks we add + or - according to the sign of $d$ and replace all ties with average ranks.

 x1    x2     d    |d|   rank   corrected rank
 7.3   6.8   .5    .5     3        +3.0
 8.5   7.9   .6    .6     4        +4.5
 6.4   6.0   .4    .4     2        +1.5
 9.0   8.4   .6    .6     5        +4.5
 6.9   6.5   .4    .4     1        +1.5

For example, ranks 4 and 5 are both replaced with 4.5, their average, because they correspond to identical values (.6) of $|d|$; likewise ranks 1 and 2 become 1.5. We next compute $T^+$ and $T^-$, the sums of the positive and negative ranks. In this case $T^+ = 3.0 + 4.5 + 1.5 + 4.5 + 1.5 = 15$, while $T^- = 0$. Our check on the ranking is that the sum of the numbers 1 to $n$ is $\frac{n(n+1)}{2}$. In this case, since $n = 5$, $\frac{n(n+1)}{2} = \frac{5(6)}{2} = 15$, which, as it should be, is the sum of $T^+$ and $T^-$. We call the smaller of $T^+$ and $T^-$, in this case 0, $T_L$, and look it up in Table 7 in the Syllabus Supplement. Unfortunately, for $n = 5$ there is no appropriate value, so we cannot reject $H_0$.

A second-choice test here would be a sign test. We use a binomial table to find the probability of getting 5 (or more) positive differences in 5 tries, assuming that the probability is .5. From the binomial table this probability is .0313, but to make this into a p-value for a 2-sided test, we must double it to .0625. Since $\alpha = .01$ is less than the p-value, we must accept $H_0$, though if we were working with a higher significance level (for example $\alpha = .10$) we could reject it.
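The arithmetic for Problem D.6 can be rechecked with one compact sketch (Python standard library only; variable names are illustrative). It covers the pooled variance and t ratio in a, the Satterthwaite degrees of freedom in b, the paired t ratio in c, the rank sums in d and the two-sided sign-test p-value in e. Critical values still come from the tables cited in the text, so the sketch only verifies the hand calculations.

```python
import math
from statistics import mean, variance

pill = [7.3, 8.5, 6.4, 9.0, 6.9]       # x1
placebo = [6.8, 7.9, 6.0, 8.4, 6.5]    # x2
n1, n2 = len(pill), len(placebo)
d_bar = mean(pill) - mean(placebo)
s1_sq, s2_sq = variance(pill), variance(placebo)  # sample variances (n - 1 divisor)

# a) pooled variance, equal variances assumed
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
s_d_pooled = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
t_pooled = d_bar / s_d_pooled

# b) Satterthwaite degrees of freedom, unequal variances
v1, v2 = s1_sq / n1, s2_sq / n2
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
t_welch = d_bar / math.sqrt(v1 + v2)

# c) paired samples: work with the differences only
diffs = [p - q for p, q in zip(pill, placebo)]
s_dbar = math.sqrt(variance(diffs) / len(diffs))
t_paired = mean(diffs) / s_dbar

# d) rank sums, ranking from the largest value (index() is valid only
#    because this data set has no tied values)
combined_desc = sorted(pill + placebo, reverse=True)
sr1 = sum(combined_desc.index(x) + 1 for x in pill)
sr2 = sum(combined_desc.index(x) + 1 for x in placebo)

# e) sign test: P(at least n_pos positive signs | p = .5), doubled for two sides
n_pos = sum(1 for x in diffs if x > 0)
k = len(diffs)
p_one_side = sum(math.comb(k, j) for j in range(n_pos, k + 1)) / 2 ** k
p_sign = 2 * p_one_side

print(round(sp_sq, 3), round(s_d_pooled, 3), round(t_pooled, 3))
print(round(df_welch, 2), round(t_welch, 3))
print(round(t_paired, 2), sr1, sr2, p_sign)
```

With equal sample sizes the pooled and Satterthwaite t ratios coincide (both about 0.755). Only the degrees of freedom, and hence the critical t, differ between a and b.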