252y0821 3/31/08

ECO252 QBA2 SECOND EXAM
March 28, 2008
Solution - 33 pages

Name: KEY
Class Hour: ___________________

Show your work! Make diagrams! Exam is normed on 50 points. Answers without reasons are not usually acceptable.

I. (8 points) Do all the following. Make diagrams! x ~ N(11, 13). If you are not using the supplement table, make sure that I know it.

1. P(0 ≤ x ≤ 54) = P((0 − 11)/13 ≤ z ≤ (54 − 11)/13) = P(−0.85 ≤ z ≤ 3.31)
   = P(−0.85 ≤ z ≤ 0) + P(0 ≤ z ≤ 3.31) = .3023 + .4995 = .8018. Values are underlined on the next page.
For z make a diagram. Draw a Normal curve with a mean at 0. Indicate zero by a vertical line! Shade the area between −0.85 and 3.31. Because this is on both sides of zero, we must add together the area between −0.85 and zero and the area between zero and 3.31.
If you wish, make a completely separate diagram for x. Draw a Normal curve with a mean at 11. Indicate the mean by a vertical line! Shade the area between zero and 54. This area includes the mean (11) and areas to either side of it, so we add together these two areas.

2. P(x ≤ −16) = P(z ≤ (−16 − 11)/13) = P(z ≤ −2.08) = P(z ≤ 0) − P(−2.08 ≤ z ≤ 0) = .5 − .4812 = .0188
For z make a diagram. Draw a Normal curve with a mean at 0. Indicate zero by a vertical line! Shade the area below −2.08. Because this is on one side of zero, we must subtract the area between −2.08 and zero from the entire (larger) area below zero.
If you wish, make a completely separate diagram for x. Draw a Normal curve with a mean at 11. Indicate the mean by a vertical line! Shade the area below −16. This area does not include the mean (11), so we subtract the area between −16 and the mean from the larger area below the mean.

3. P(12 ≤ x ≤ 41) = P((12 − 11)/13 ≤ z ≤ (41 − 11)/13) = P(0.08 ≤ z ≤ 2.31)
   = P(0 ≤ z ≤ 2.31) − P(0 ≤ z ≤ 0.08) = .4896 − .0319 = .4577
For z make a diagram. Draw a Normal curve with a mean at 0. Indicate zero by a vertical line! Shade the area between 0.08 and 2.31. Because this is on one side of zero, we subtract the area between zero and 0.08 from the larger area between zero and 2.31.
If you wish, make a completely separate diagram for x. Draw a Normal curve with a mean at 11. Indicate the mean by a vertical line! Shade the area between 12 and 41. This area does not include the mean (11), so we subtract the area between the mean and 12 from the larger area between the mean and 41.

4. x.055 (Do not try to use the t table to get this.)
For z make a diagram. Draw a Normal curve with a mean at 0. Indicate zero by a vertical line! z.055 is the value of z with 5.5% of the distribution above it. Since 100 − 5.5 = 94.5, it is also the 94.5th percentile. Since 50% of the standardized Normal distribution is below zero, your diagram should show that the probability between zero and z.055 is 94.5% − 50% = 44.5%, or P(0 ≤ z ≤ z.055) = .4450. The closest we can come to this on the standardized Normal table is P(0 ≤ z ≤ 1.60) = .4452. So z.055 ≈ 1.60. To get from z.055 to x.055, use the formula x = μ + zσ, which gives x.055 = 11 + 1.60(13) = 31.80.
If you wish, make a completely separate diagram for x. Draw a Normal curve with a mean at 11. Show that 50% of the distribution is below the mean (11). If 5.5% of the distribution is above x.055, it must be above the mean and have 44.5% of the distribution between it and the mean.
Check: P(x > 31.8) = P(z > (31.8 − 11)/13) = P(z > 1.60) = P(z > 0) − P(0 ≤ z ≤ 1.60) = .5 − .4452 = .0548

1

TABLE 4: The Standard Normal Distribution.
Example: P(0 ≤ z ≤ 1.21) = 0.3869

 z    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0  0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1  0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2  0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3  0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4  0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5  0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6  0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7  0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8  0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9  0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0  0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1  0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2  0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3  0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4  0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5  0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6  0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7  0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8  0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9  0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0  0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1  0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2  0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3  0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4  0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5  0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6  0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7  0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8  0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9  0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0  0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990

For values above 3.09, see below.

If z0 is between     P(0 ≤ z ≤ z0) is
3.08 and 3.10            .4990
3.11 and 3.13            .4991
3.14 and 3.17            .4992
3.18 and 3.21            .4993
3.22 and 3.26            .4994
3.27 and 3.32            .4995
3.33 and 3.38            .4996
3.39 and 3.48            .4997
3.49 and 3.61            .4998
3.62 and 3.89            .4999
3.90 and up              .5000

2

II. (5+ points) Do all the following. Look them over first – there is a section III in the in-class exam and the computer problem is at the end. Show your work where appropriate. There is a penalty for not doing Problem 1. Page 11 is left blank if you need more space for calculations. Note the following:
1. This test is normed on 50 points, but there are more points possible including the take-home. You are unlikely to finish the exam and might want to skip some questions.
2. A table identifying methods for comparing 2 samples is at the end of the exam.
3. If you answer 'None of the above' in any question, you should provide an alternative answer and explain why. You may receive credit for this even if you are wrong.
4. Use a 5% significance level unless the question says otherwise.
5. Read problems carefully. A problem that looks like a problem on another exam may be quite different.
6. Make sure that you state your null and alternative hypotheses, that I know what method you are using, and what the conclusion is when you do a statistical test. Use a significance level of .05 unless you are told otherwise.

1. You wish to assess the stability of the price of a stock and you find closing prices for the last year.
Rather than computing a variance of the entire population, you take a sample of seven randomly picked closing prices and compute a sample standard deviation. The sample is below – compute the sample standard deviation. Show your work! (3)

Row   x
1    89
2   124
3    56
4    94
5    75
6    82
7    63

For your convenience, the sum of the first six numbers is Σx = 520 and the sum of the squares of the first six numbers is Σx² = 47618.

Solution: If you took advantage of the numbers that were given, Σx = 520 + 63 = 583 and Σx² = 47618 + 63² = 47618 + 3969 = 51587. If you wasted time by not using this freebie, the results are as below. Of course you should not bother with the last two columns.

Row     x      x²     x − x̄    (x − x̄)²
1      89    7921    5.7143      32.65
2     124   15376   40.7143    1657.65
3      56    3136  −27.2857     744.51
4      94    8836   10.7143     114.80
5      75    5625   −8.2857      68.65
6      82    6724   −1.2857       1.65
7      63    3969  −20.2857     411.51
Total 583   51587    0.0000    3031.43

We thus have Σx = 583, Σx² = 51587, Σ(x − x̄) = 0 (a check) and Σ(x − x̄)² = 3031.43. So the mean of x is x̄ = Σx/n = 583/7 = 83.2857.
If you used the computational formula, you got
s² = (Σx² − n x̄²)/(n − 1) = (51587 − 7(83.2857)²)/6 = 3031.4452/6 = 505.2409 and s = √505.2409 = 22.4776.
If you wasted more time by using the definitional formula, you got
s² = Σ(x − x̄)²/(n − 1) = 3031.43/6 = 505.2383 and s = √505.2383 = 22.4775.
Minitab gets s² = 505.238 and s = 22.48.

3

2) You wish to compare this stock against a second stock that your friend recommends. Your friend has taken a random sample of 10 closing prices and assures us that the sample mean price of this stock is 117.699 and the sample standard deviation is 55.2764. You don't like your friend's stock because 1) it has a larger variance, indicating that it is riskier, and 2) it costs more per share. The values you get are in the y column below, with z_y being the y values with 117.699 subtracted and the result divided by 55.2764. Compare the variances using a statistical test of the equality of variances.
(2)

Row     y      z_y
1     78.48   −0.71
2    130.93    0.24
3     93.17   −0.44
4    105.37   −0.22
5     69.50   −0.87
6     85.43   −0.58
7    102.84   −0.27
8    259.27    2.56
9    151.17    0.61
10   100.83   −0.31

Solution: We have s_x² = 505.238 with df_x = 6, and s_y² = 55.2764² = 3055.480 with df_y = 9. From Table 3 we have the following for comparison of two variances on the assumption of Normality. Our hypotheses are H0: σ1² = σ2² and H1: σ1² ≠ σ2².

Ratio of variances (DF1 = n1 − 1 and DF2 = n2 − 1):
  Confidence interval: (s1²/s2²)(1/F(α/2)[DF1,DF2]) ≤ σ1²/σ2² ≤ (s1²/s2²) F(α/2)[DF2,DF1]
  Hypotheses: H0: σ1² = σ2², H1: σ1² ≠ σ2²
  Test ratio: F[DF1,DF2] = s1²/s2² or F[DF2,DF1] = s2²/s1², compared against F(α/2)

As explained in class, we should compare the larger of the two ratios, F[DFx,DFy] = s_x²/s_y² or F[DFy,DFx] = s_y²/s_x², against an appropriate value of F(α/2). The larger of the two ratios is F[9,6] = s_y²/s_x² = 3055.480/505.238 = 6.0476. The part of the F table with df2 = 6 is below. So F.025[9,6] = 5.52, and since the computed F is larger than the table F, we reject the null hypothesis. Though we really should divide the standard deviation by the mean to get per-dollar risk, the second stock looks riskier.

df2 = 6   df1:   1      2      3      4      5      6      7      8      9     10     11     12
 .100          3.78   3.46   3.29   3.18   3.11   3.05   3.01   2.98   2.96   2.94   2.92   2.90
 .050          5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06   4.03   4.00
 .025          8.81   7.26   6.60   6.23   5.99   5.82   5.70   5.60   5.52   5.46   5.41   5.37
 .010         13.75  10.92   9.78   9.15   8.75   8.47   8.26   8.10   7.98   7.87   7.79   7.72

4

The parts of Table 3 to be used in questions 3) and 5) follow.

Difference between two means (σ1², σ2² known), D = μ1 − μ2, d̄ = x̄1 − x̄2:
  Confidence interval: D = d̄ ± z(α/2) σ_d, where σ_d = √(σ1²/n1 + σ2²/n2)
  Hypotheses: H0: D = D0, H1: D ≠ D0 (same as H0: μ1 = μ2, H1: μ1 ≠ μ2 if D0 = 0)
  Test ratio: z = (d̄ − D0)/σ_d
  Critical value: d̄_cv = D0 ± z(α/2) σ_d
Difference between two means (σ unknown, variances assumed equal), D = μ1 − μ2, d̄ = x̄1 − x̄2:
  Confidence interval: D = d̄ ± t(α/2) s_d, where s_d = ŝ_p √(1/n1 + 1/n2),
  ŝ_p² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) and DF = n1 + n2 − 2
  Hypotheses: H0: D = D0, H1: D ≠ D0
  Test ratio: t = (d̄ − D0)/s_d
  Critical value: d̄_cv = D0 ± t(α/2) s_d

Difference between two means (σ unknown, variances assumed unequal):
  Confidence interval: D = d̄ ± t(α/2) s_d, where s_d = √(s1²/n1 + s2²/n2) and
  DF = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]
  Hypotheses: H0: D = D0, H1: D ≠ D0
  Test ratio: t = (d̄ − D0)/s_d
  Critical value: d̄_cv = D0 ± t(α/2) s_d

3) Are you sure that stock y has a higher average price than stock x? Using the results of 2), compare the mean prices. If you do not assume equality of variances, assume that you can use 14 degrees of freedom for the test. (3)

Solution: We have x̄ = 83.2857, s_x² = 505.238, n_x = 7, ȳ = 117.699, s_y² = 3055.480 and n_y = 10.
H0: μx ≥ μy and H1: μx < μy or, if D = μx − μy, H0: D ≥ 0 and H1: D < 0.
As a result of our test in 2), we cannot assume equal variances. So we can compute
s_d = √(s1²/n1 + s2²/n2) = √(505.238/7 + 3055.480/10) = √(72.1769 + 305.5480) = √377.7249 = 19.43514,
d̄ = 83.286 − 117.699 = −34.413 and t.05(14) = 1.761. We will use only one of the following three methods.

Test ratio: t = (d̄ − 0)/s_d = −34.413/19.43514 = −1.771. Since this is a left-sided test, our 'reject' region is all points below −t.05(14) = −1.761. Since t = −1.771 is below −1.761, we (barely) reject H0.

Critical value for the difference between sample means: For a two-sided test d̄_cv = 0 ± t(α/2)(14) s_d, but this is a left-sided test and, because the alternative hypothesis is H1: D < 0, we need one critical value below zero: d̄_cv = 0 − t.05(14) s_d = −1.761(19.43514) = −34.225. Our 'reject' region is all points below −34.225. Since d̄ = −34.413 is below our critical value, we reject H0.

5

One-sided confidence interval: For a two-sided test D = d̄ ± t(α/2)(14) s_d, but this is a left-sided test and, because the alternative hypothesis is H1: D < 0, we need a one-sided confidence interval or upper limit D ≤ d̄ + t.05(14) s_d = −34.413 + 34.225 = −0.188. The interval D ≤ −0.188 does not include zero, so it contradicts H0: D ≥ 0, and we reject H0.
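The arithmetic in 3) is easy to mangle, so here is a quick sanity check (not part of the original exam solution) in Python. It uses only the sample summaries from 2) and 3); the mean 117.699 is the figure stated in 2), and small differences from the hand calculation are rounding.

```python
import math

# Sample summaries from problems 2) and 3)
n_x, mean_x, var_x = 7, 83.2857, 505.238
n_y, mean_y, var_y = 10, 117.699, 3055.480

# Standard error of the difference when the variances are not pooled
s_d = math.sqrt(var_x / n_x + var_y / n_y)   # about 19.4351

# Test ratio for H0: mu_x - mu_y >= 0 against H1: mu_x - mu_y < 0;
# compare it with -t.05(14) = -1.761
d_bar = mean_x - mean_y
t = d_bar / s_d

print(round(s_d, 4), round(t, 3))
```

Since the computed ratio lands below −1.761, the left-sided test rejects H0, matching the conclusion above.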
4) Test stock y to see if it has a Normal distribution. How do your results from the test of Normality affect your assessment of the results in 2) and 3)? (4)

Solution: We can set this up as a Lilliefors test if we put the numbers in order. All probabilities come from the standardized Normal table. Since each number is considered a group, the O column is all ones. 'cum O' is a running sum of the O column. Fo is the cum O column divided by n = 10.

Row    y      z_y    z_y in order   Fe = P(z ≤ z_y)        O  cum O    Fo    D = |Fo − Fe|
1    78.48   −0.71      −0.87      .5 − .3078 = .1922      1    1    .1000      .0922
2   130.93    0.24      −0.71      .5 − .2611 = .2389      1    2    .2000      .0389
3    93.17   −0.44      −0.58      .5 − .2190 = .2810      1    3    .3000      .0190
4   105.37   −0.22      −0.44      .5 − .1700 = .3300      1    4    .4000      .0700
5    69.50   −0.87      −0.31      .5 − .1217 = .3783      1    5    .5000      .1217
6    85.43   −0.58      −0.27      .5 − .1064 = .3936      1    6    .6000      .2064
7   102.84   −0.27      −0.22      .5 − .0871 = .4129      1    7    .7000      .2871
8   259.27    2.56       0.24      .5 + .0948 = .5948      1    8    .8000      .2052
9   151.17    0.61       0.61      .5 + .2291 = .7291      1    9    .9000      .1709
10  100.83   −0.31       2.56      .5 + .4948 = .9948      1   10   1.0000      .0052

The relevant part of the new Lilliefors table appears below. We reject our null hypothesis of Normality if the largest number in the D = |Fo − Fe| column exceeds the number from the Lilliefors table.

TABLE 11: Critical Values for the Lilliefors Test
 n    α=.20   α=.15   α=.10   α=.05   α=.01
 4    .3027   .3216   .3456   .3754   .4129
 5    .2893   .3027   .3188   .3427   .3959
 6    .2694   .2816   .2982   .3245   .3728
 7    .2521   .2641   .2802   .3041   .3504
 8    .2387   .2502   .2649   .2825   .3331
 9    .2273   .2382   .2522   .2744   .3162
10    .2171   .2273   .2410   .2616   .3037

The maximum discrepancy between the observed and expected cumulative distributions is .2871. This exceeds the table value of .2616 for n = 10 and α = .05, so we reject the null hypothesis of Normality. Since both the variance test in 2) and the t test in 3) depend on an assumption that the underlying populations are Normal, their conclusions are now suspect.

5) Using the sample means and standard deviations you found in 2) and 3) but assuming that both samples are of size 100 and come from a Normal distribution, do an 11% confidence interval for the difference between the means. (2) [14]

Solution: We have x̄ = 83.2857, s_x² = 505.238, n_x = 100, ȳ = 117.699, s_y² = 3055.480 and n_y = 100.
H0: μx ≥ μy and H1: μx < μy or, if D = μx − μy, H0: D ≥ 0 and H1: D < 0.
As a result of our test in 2), we cannot assume equal variances. So we can compute
s_d = √(s1²/n1 + s2²/n2) = √(505.238/100 + 3055.480/100) = √(5.05238 + 30.55480) = √35.60718 = 5.96718,
d̄ = 83.286 − 117.699 = −34.413, and this is the large-sample case, so use z.05 = 1.645. We will use only one of the following three methods.

Test ratio: z = (d̄ − 0)/s_d = −34.413/5.96718 = −5.767. Since this is a left-sided test, our 'reject' region is all points below −z.05 = −1.645. Since z = −5.767 is below −1.645, we (definitely) reject H0.

6

Critical value for the difference between sample means: For a two-sided test d̄_cv = 0 ± z(α/2) s_d, but this is a left-sided test and, because the alternative hypothesis is H1: D < 0, we need one critical value below zero: d̄_cv = 0 − z.05 s_d = −1.645(5.96718) = −9.816. Our 'reject' region is all points below −9.816. Since d̄ = −34.413 is below our critical value, we reject H0.

One-sided confidence interval: For a two-sided test D = d̄ ± z(α/2) s_d, but this is a left-sided test and, because the alternative hypothesis is H1: D < 0, we need a one-sided confidence interval or upper limit D ≤ d̄ + z.05 s_d = −34.413 + 9.816 = −24.597. The interval D ≤ −24.597 does not include zero, so it contradicts H0: D ≥ 0, and we reject H0.

7

III. (18+ points) Do as many of the following as you can. (2 points each unless noted otherwise.) Look them over first – the computer problem is at the beginning. Show your work where appropriate.

1. Computer question.
a) Turn in your first computer output. Only do b, c and d if you did. (3)
b) (Meyer and Krueger) A corporation rents apartments within the city of Phoenix and in the surrounding suburbs. It wishes to verify that the mean rent in the city is lower than in the suburbs. Two independent random samples are taken. These appear below.
City 401.84 666.95 804.01 611.09
Suburb 458.98 994.09 810.44 764.69 815.86 755.37 715.30 314.14 584.52 650.46 904.77 587.72
870.44 970.26 639.96 657.92 403.64 617.37 506.58 695.45 752.60 735.26 444.47 567.60
574.94 538.26 313.08 752.66 398.33 732.83 667.61 762.35 670.29 458.07 396.20 656.04
676.23 364.37 953.06 728.25 187.23 878.82 720.20 745.79 793.68 764.80 879.91 737.99
566.75 279.74 918.40 654.05 841.70 648.31 1106.17 919.93

(i) What are your null and alternative hypotheses?
Solution: The alternative hypothesis is H1: μ1 < μ2, so the null hypothesis is H0: μ1 ≥ μ2 or, if D = μ1 − μ2, H0: D ≥ 0 and H1: D < 0. So we have a left-sided test.

(ii) Three tests appear below – which is correct for your null hypotheses? (3) (α = .01)

Test 1.
MTB > TwoSample c1 c2;
SUBC>   Confidence 99.0.

Two-Sample T-Test and CI: City, Suburb
Two-sample T for City vs Suburb
         N  Mean  StDev  SE Mean
City    30   590    169       31
Suburb  30   743    189       34
Difference = mu (City) - mu (Suburb)
Estimate for difference: -153.0
99% CI for difference: (-276.2, -29.7)
T-Test of difference = 0 (vs not =): T-Value = -3.31  P-Value = 0.002  DF = 57

Test 2.
MTB > TwoSample c1 c2;
SUBC>   Confidence 99.0;
SUBC>   Alternative -1.

Two-Sample T-Test and CI: City, Suburb
Two-sample T for City vs Suburb
         N  Mean  StDev  SE Mean
City    30   590    169       31
Suburb  30   743    189       34
Difference = mu (City) - mu (Suburb)
Estimate for difference: -153.0
99% upper bound for difference: -42.3
T-Test of difference = 0 (vs <): T-Value = -3.31  P-Value = 0.001  DF = 57

Solution: Test 2 says that the alternative hypothesis is H1: μ1 < μ2, so that's what we want. In our language the null hypothesis is H0: μ1 ≥ μ2.

8

Test 3.
MTB > TwoSample c1 c2;
SUBC>   Confidence 99.0;
SUBC>   Alternative 1.
Two-Sample T-Test and CI: City, Suburb
Two-sample T for City vs Suburb
         N  Mean  StDev  SE Mean
City    30   590    169       31
Suburb  30   743    189       34
Difference = mu (City) - mu (Suburb)
Estimate for difference: -153.0
99% lower bound for difference: -263.7
T-Test of difference = 0 (vs >): T-Value = -3.31  P-Value = 0.999  DF = 57

c) From the output, but using the correct format for a confidence interval, what is an appropriate confidence interval to test your hypotheses? (2)
Solution: The alternative hypothesis is H1: μ1 < μ2, so the appropriate confidence interval (or upper limit) given in Test 2 is '99% upper bound for difference: -42.3.' In our format that would be D ≤ −42.3 or μ1 − μ2 ≤ −42.3.

d) What is your conclusion? Why? (α = .01) (2)
Solution: Test 2 says 'P-Value = 0.001.' So we reject the null hypothesis H0: μ1 ≥ μ2 because the p-value is below the significance level.

e) What method was used by the computer? D1, D2, D3, D4, D5a, D5b, D6a, D6b, D7? (1)
Explanation: As explained in class, this is always the default method for comparing two means. (Since 'Pooled' was not specified, Minitab's default is the unequal-variances t test with Satterthwaite degrees of freedom; note DF = 57, not n1 + n2 − 2 = 58.)

f) The following tests were run after the original hypotheses tests. What do they tell us about the appropriateness of the method? Why? (1) [12]

MTB > NormTest c1;
SUBC>   KSTest.
Probability Plot of City
MTB > NormTest c2;
SUBC>   KSTest.
Probability Plot of Suburb

Solution: On both the plots generated on the next page, the result of the Lilliefors (Kolmogorov-Smirnov) test for normality is 'p-value > 0.150.' So, if we are still using a 1% significance level (or any other common significance level), we cannot reject the null hypothesis of Normality. The t-tests we have used are dependent on an assumption of Normality. (Actually, we could use a so-called 'Fat-Pencil Test' on the data and notice that the points do appear close to a straight line, which also supports Normality.)

9

2.
A 'robust' test procedure is one that
a) Can only be done with a computer
b) Requires an underlying Normal distribution
c) Is sensitive to slight violations of its assumptions.
d) *Is insensitive to slight violations of its assumptions.

3. (Ng -219, 18) Assume that you have the following information: s1² = 4, s2² = 6, n1 = 16 and n2 = 25, and you wish to do a pooled-variance t test. Your ŝ_p and degrees of freedom are (3) [17]
a) 2.45, 41   b) 2.24, 41   c) 2.29, 41   d) 2.00, 41
e) 2.45, 39   f) 2.24, 39   g)* 2.29, 39  h) 2.00, 39
i) 2.45, 16   j) 2.24, 16   k) 2.29, 16   l) 2.00, 16
m) 2.45, 25   n) 2.24, 25   o) 2.29, 25   p) 2.00, 25
q) It's more appropriate to add standard errors and use z

Explanation: The formula for a pooled variance is
ŝ_p² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) = [15(4) + 24(6)]/(16 + 25 − 2) = 204/39 = 5.231.
s_p = √5.231 = 2.29 with 3 significant figures, and DF = n1 + n2 − 2 = 16 + 25 − 2 = 39.

10

4. If I want to test to see if the mean of x1 is smaller than the mean of x2, my null hypothesis is: (Note: D = μ1 − μ2.) Only check one answer! (2)
a) μ1 = μ2 and D = 0       b) μ1 ≠ μ2 and D ≠ 0
c) μ1 < μ2 and D < 0       d) μ1 > μ2 and D > 0
e) * μ1 ≥ μ2 and D ≥ 0     f) μ1 ≤ μ2 and D ≤ 0
g) μ1 ≥ μ2 and D > 0       h) μ1 ≤ μ2 and D < 0
Explanation: The alternative hypothesis is H1: μ1 < μ2, which is the same as saying H1: D < 0. The null hypothesis must contain an equality, so it will read H0: μ1 ≥ μ2 or H0: D ≥ 0.

5. Consumers are asked to take the Pepsi Challenge. They were asked which cola they preferred, and the number that preferred Pepsi was recorded. Sample 1 was males and sample 2 was females. The following was run on Minitab. [19]

MTB > PTwo 109 46 52 13;
SUBC>   Pooled.

Test and CI for Two Proportions
Sample   X    N  Sample p
1       46  109  0.422018
2       13   52  0.250000
Difference = p (1) - p (2)
Estimate for difference: 0.172018
95% CI for difference: (0.0221925, 0.321844)
Test for difference = 0 (vs not = 0): Z = 2.12  P-Value = 0.034

On the basis of the printout above we can say one of the following.
a) At a 99% confidence level we can say that we have enough evidence to state that the proportion of men that prefer Pepsi differs from the proportion of women that prefer Pepsi.
b) *At a 95% confidence level we can say that we have enough evidence to state that the proportion of men that prefer Pepsi differs from the proportion of women that prefer Pepsi.
c) At a 99% confidence level we can say that we have enough evidence to state that the proportion of men that prefer Pepsi equals the proportion of women that prefer Pepsi.
d) At a 96% confidence level there is insufficient evidence to indicate that the proportion of men that prefer Pepsi differs from the proportion of women that prefer Pepsi.

Explanation: We have a choice of a 99% confidence level with a 1% significance level or a 95% confidence level with a 5% significance level. Since the p-value is .034, we reject the null hypothesis at the 95% confidence level but not at the 99% confidence level. The null hypothesis is equal proportions.

11

6. (Lenzi) A group of runners run a 100 meter dash before and after running a marathon. Their times are shown below. Pat – how long did they wait before running the second dash?

Row  Before  After     d
1     12.4   12.6   -0.2
2     11.8   12.2   -0.4
3     12.5   12.4    0.1
4     12.0   12.7   -0.7
5     11.5   12.0   -0.5
6     11.2   11.8   -0.6
7     12.9   12.7    0.2

Minitab printed out the following statistics.

Variable  N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3  Maximum
Before    7   0  12.043    0.226  0.597   11.200  11.500  12.000  12.500   12.900
After     7   0  12.343    0.134  0.355   11.800  12.000  12.400  12.700   12.700
d         7   0  -0.300    0.131  0.346   -0.700  -0.600  -0.400   0.100    0.200

Can we show that they were slower after the marathon?
a) How many degrees of freedom do we have in this problem? (1)
Solution: This is a paired data problem (D4) and we have 7 before-and-after pairs, so DF = 6. Table 3 says the following.
Difference between two means (paired data), D = μ1 − μ2, d̄ = x̄1 − x̄2:
  Confidence interval: D = d̄ ± t(α/2) s_d̄, where s_d̄ = s_d/√n, s_d is the standard deviation of the differences, n = n1 = n2, and df = n − 1
  Hypotheses: H0: D = D0, H1: D ≠ D0 (same as H0: μ1 = μ2, H1: μ1 ≠ μ2 if D0 = 0)
  Test ratio: t = (d̄ − D0)/s_d̄
  Critical value: d̄_cv = D0 ± t(α/2) s_d̄

b) What are our null and alternative hypotheses? (1)
Solution: The alternative hypothesis is H1: μ1 < μ2 or H1: D < 0. So the null hypothesis, which must contain an equality, is H0: μ1 ≥ μ2 or H0: D ≥ 0.

c) What is the approximate p-value for our result? (3) Show your work! (2 points if you do not do a p-value)
Solution: The formula for a test ratio is t = (d̄ − 0)/s_d̄, and the printout says d̄ = −0.300 and s_d̄ = 0.346/√7 = 0.131. So t = (d̄ − 0)/s_d̄ = −0.300/0.131 = −2.2900. The part of the t table for 6 degrees of freedom is below. Since this is a left-sided test, p-value = P(t ≤ −2.29).

Significance level (df = 6):
 .45    .40    .35    .30    .25    .20    .15    .10    .05    .025   .01    .005   .001
0.131  0.265  0.404  0.553  0.718  0.906  1.134  1.440  1.943  2.447  3.143  3.707  5.208

Since t.05(6) = 1.943 < 2.2900 < 2.447 = t.025(6), we can conclude that .025 < P(t > 2.290) < .05 and, because this is a left-sided test, .025 < p-value = P(t ≤ −2.290) < .05.

d) On the basis of your p-value, what is our conclusion if the confidence level is 95%? Why? (1)
Solution: If the confidence level is 95%, the significance level is 5% and we reject the null hypothesis because the p-value is below the significance level.

12

e) What if the confidence level is 99%? Why? (You do not need a p-value to answer this part of the question, though it would help.) (1) [26]
Solution: If the confidence level is 99%, the significance level is 1% and we cannot reject the null hypothesis because the p-value is not below the significance level.

7. A researcher takes independent random samples of the salaries of 18 women (Sample 1) and 18 men (Sample 2) who are fairly recent business graduates, with the following results.
Row  Women   Men    Difference
1    64709  40824     23885
2    47105  54465     -7360
3    28972  68433    -39461
4    31449  54941    -23492
5    42574  54050    -11476
6    59051  53043      6008
7    26838  45680    -18842
8    56651  40399     16252
9    64929  57584      7345
10   57497  78224    -20727
11   38290  53722    -15432
12   67106  34915     32191
13   67280  59636      7644
14   40826  50499     -9673
15   60826  53502      7324
16   46207  77186    -30979
17   58976  60208     -1232
18   45809  48381     -2572

Minitab gives the following statistics.

Descriptive Statistics: Women, Men, d
Variable   N  N*   Mean  SE Mean  StDev
Women     18   0  50283     3156  13392
Men       18   0  54761     2708  11488
d         18   0  -4478     4462  18929

Test the statement that women have a significantly lower salary than men. The relevant part of Table 3 appears on page 5.

a) What are your null and alternative hypotheses? (1)
Solution: Here we go again with the same old stuff. The alternative hypothesis is H1: μ1 < μ2 or H1: D < 0. So the null hypothesis, which must contain an equality, is H0: μ1 ≥ μ2 or H0: D ≥ 0.

b) (Extra credit) If you do not assume that the variances of the two samples are equal:
(i) How many degrees of freedom do you have? (4)
Solution: So now we are stuck with the Satterthwaite approximation. Let's start with
s1²/n1 = 13392²/18 = 9963648 or (SE Mean)² = 3156² = 9960336, and
s2²/n2 = 11488²/18 = 7331896.889 or 2708² = 7333264,
so s1²/n1 + s2²/n2 = 17295544.889 or 17293600.

df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]
   = (17295544.889)² / [ (9963648)²/17 + (7331896.889)²/17 ]
   = 2.9914587×10^14 / (5.8396636×10^12 + 3.1621595×10^12) = 33.23059
or
   = (17293600)² / [ (9960336)²/17 + (7333264)²/17 ]
   = 2.9906860×10^14 / (5.8357820×10^12 + 3.1633339×10^12) = 33.23309

13

It's a good thing that I had two versions of the standard errors from Minitab, because I got something like 55 on the first try. Both of these should be rounded down to 33. Incidentally, this calculation was done quite rapidly by saving the three items in the last ratio in my 25-year-old calculator's storage.

(ii) If you use the formula t = (d̄ − D0)/s_d, what is the value of s_d? (2)
Solution: s_d = √(s1²/n1 + s2²/n2).
We had s1²/n1 + s2²/n2 = 17295544.889 or 17293600, so the square root is 4158.79 or 4158.56.

(iii) Compute the t ratio and test the hypothesis, clearly stating your conclusions (α = .05). (2)
Solution: t = (d̄ − D0)/s_d = (−4478 − 0)/4158.7 = −1.077. We were testing H0: μ1 ≥ μ2 or H0: D ≥ 0 against H1: μ1 < μ2 or H1: D < 0, so if df = 33, we will reject the null hypothesis if the computed t ratio is below −t.05(33) = −1.692. Since −1.077 is not below −1.692, we cannot reject the null hypothesis.

c) If you assume that the variances of the two samples are equal:
(i) How many degrees of freedom do you have? (1)
Solution: Let's repeat our previous data.

Descriptive Statistics: Women, Men, d
Variable   N  N*   Mean  SE Mean  StDev
Women     18   0  50283     3156  13392
Men       18   0  54761     2708  11488
d         18   0  -4478     4462  18929

Degrees of freedom are n1 + n2 − 2 = 18 + 18 − 2 = 34. Notice how close this is to the previous calculation.

(ii) If you use the formula t = (d̄ − D0)/s_d, what is the value of s_d? (3)
Solution: The formula for a pooled variance is
ŝ_p² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) = [17(13392²) + 17(11488²)]/(18 + 18 − 2) = (13392² + 11488²)/2 = 155659904.
s_p = √155659904 = 12476.374.
s_d = √(ŝ_p²(1/n1 + 1/n2)) = √(155659904(1/18 + 1/18)) = √17295544.89 = 4158.7913.

(iii) Compute the t ratio and test the hypothesis, clearly stating your conclusions (α = .05). (2) [32]
Solution: t = (d̄ − D0)/s_d = (−4478 − 0)/4158.7 = −1.077. We were testing H0: μ1 ≥ μ2 or H0: D ≥ 0 against H1: μ1 < μ2 or H1: D < 0, so if df = 34, we will reject the null hypothesis if the computed t ratio is below −t.05(34) = −1.691. Since −1.077 is not below −1.691, we cannot reject the null hypothesis.

14

For the curious, here is the computer output for the two versions of the test.
Two-Sample T-Test and CI: Women, Men
Two-sample T for Women vs Men
        N   Mean  StDev  SE Mean
Women  18  50283  13392     3156
Men    18  54761  11488     2708
Difference = mu (Women) - mu (Men)
Estimate for difference: -4478
95% upper bound for difference: 2561
T-Test of difference = 0 (vs <): T-Value = -1.08  DF = 33  P-Value = 0.145

MTB > TwoSample c1 c2;
SUBC>   Pooled;
SUBC>   Alternative -1.

Two-Sample T-Test and CI: Women, Men
Two-sample T for Women vs Men
        N   Mean  StDev  SE Mean
Women  18  50283  13392     3156
Men    18  54761  11488     2708
Difference = mu (Women) - mu (Men)
Estimate for difference: -4478
95% upper bound for difference: 2555
T-Test of difference = 0 (vs <): T-Value = -1.08  DF = 34  P-Value = 0.145
Both use Pooled StDev = 12476.2829

15

8. (Meyer and Krueger again) Back to the Phoenix problem. The people in the problem on page 3 are still obsessing over the relationship of rents to whether an apartment is urban or suburban. The computer output from a chi-squared test is below.

Results for: 251x0821-06.MTW
MTB > WSave "C:\Documents and Settings\RBOVE\My Documents\Minitab\251x0821-06.MTW";
SUBC>   Replace.
Saving file as: 'C:\Documents and Settings\RBOVE\My Documents\Minitab\251x0821-06.MTW'
MTB > print c1-c3

Data Display
Row  Rent     City  Suburb
1    <500       48       2
2    500-599    51      11
3    600-699    30      17
4    700 up     22      19

MTB > ChiSquare c2 c3.

Chi-Square Test: City, Suburb
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

         City   Suburb   Total
 1         48        2      50
        37.75    12.25
        2.783    8.577
 2         51       11      62
        46.81    15.19
        0.375    1.156
 3         30       17      47
        35.48    11.52
        0.848    2.613
 4         22       19      41
        30.95    10.05
        2.591    7.983
Total     151       49     200

Chi-Sq = 26.925, DF = 3, P-Value = 0.000

a) The above is a chi-squared test of (1)
i) *independence   ii) homogeneity   iii) goodness-of-fit   iv) none of the above

b) What is the null hypothesis of this test and, assuming a 95% confidence level, what is the conclusion? (2) [35]
Solution: The null hypothesis is that rents and location are independent.
Because the p-value is extremely low (below 5%), we reject the null hypothesis.

16

9. Back to Phoenix again. The people in the previous Phoenix problems are now sure that the distribution of rents is not Normal but skewed to the right. They select a random sample of 10 rents in the city and another random sample of 10 rents in the suburbs (1 = City, 2 = Suburb). The researchers now believe that rentals in the city are lower than rentals in the suburbs. The researchers will do the following test.
a) T-test of paired data
b) Wilcoxon signed rank test
c) T-test of means of independent samples
d) *Wilcoxon-Mann-Whitney test
e) None of the above

10. Assuming that a rank test of some sort is done in Problem 9, what will be our null hypothesis, and, assuming that the smaller of the two sums of ranks is 44 and that we are working with a 95% confidence level, what will be our conclusion and why? (3) [40]
Solution: This is a one-sided test (H0: μ1 ≥ μ2 and H1: μ1 < μ2, stated for medians), so if we work with the appropriate part of Table 6 below, with n1 = n2 = 10 the lower critical value is 82. We will reject the hypothesis of equal medians if T_L = 44 is below 82. It is. We do.

TABLE 6: Critical values of the Rank Sum for the Mann-Whitney-Wilcoxon Rank Sum Test for Independent Samples.
Table 6b: α = .05 for a 1-tailed test or α = .10 for a 2-tailed test.
n2\n1    3      4      5      6      7      8      9      10      11      12
 3       -    6,18   7,20   8,22   8,25   9,27   9,30   10,32   11,34   11,37
 4     6,18  11,25  12,28  13,31  14,34  15,37  16,40   17,43   18,46   19,49
 5     7,20  12,28  19,36  20,40  21,44  23,47  24,51   26,54   27,58   28,62
 6     8,22  13,31  20,40  28,50  29,55  31,59  33,63   35,67   37,71   38,76
 7     8,25  14,34  21,44  29,55  39,66  41,71  43,76   45,81   47,86   49,91
 8     9,27  15,37  23,47  31,59  41,71  51,85  54,90   56,96   59,101  62,106
 9     9,30  16,40  24,51  33,63  43,76  54,90  66,105  69,111  72,117  75,123
10    10,32  17,43  26,54  35,67  45,81  56,96  69,111  82,128  86,134  89,141
11    11,34  18,46  27,58  37,71  47,86  59,101 72,117  86,134 100,153 104,160
12    11,37  19,49  28,62  38,76  49,91  62,106 75,123  89,141 104,160 120,180
(Each cell gives the lower and upper critical values TL, TU.)

ECO252 QBA2 SECOND EXAM
March 28, 2008
TAKE HOME SECTION
Name: _________________________ Student Number: _________________________
Class hours registered and attended (if different): _________________________

IV. Neatness Counts! Show your work! Always state your hypotheses and conclusions clearly. (19+ points). In each section state clearly what number you are using to personalize the data. There is a penalty for failing to include your student number on this page, for not clarifying the version number in each section and for not including your class hour somewhere. Please write on only one side of the paper. Be prepared to turn in your Minitab output for the first computer problem and to answer the questions on the problem sheet about it or a similar problem.

1. (Moore, McCabe et al.) A large public university took a survey of 865 students to find out if there was a relationship between the chosen major and whether the students had student loans. The students' majors were categorized as Agriculture, Child Development, Engineering, Liberal Arts, Business, Science and Technology. Before you start, personalize the data as follows. Let a be the second-to-last digit of your student number.
Change the number of Science majors with loans to 31 + a and the number of business majors who have loans to 24 + a for every part of this problem. The total number of students in the survey will not change. Put your version of the table below on top of the first page of your solution. Use a 99% confidence level in this problem.

       Ag   Ch  Engg  Lib  Bus  Sci  Tech
Loan   32   37    98   89   24   31    57
None   35   50   137  124   51   29    71

a) Compute the proportion of non-science majors that have loans in order to test the hypothesis that science majors are more likely to have loans than other majors. Tell which group you consider sample 1. State H0 and H1 in terms of the proportions involved and also in terms of the difference between the proportions, explaining whether this difference is a statistic from sample 1 minus a statistic from sample 2 or the reverse. (1)
b) Use a test ratio to test your hypotheses from a) (2)
c) Use a critical value for the difference between proportions to test your hypotheses from a) (2)
d) Use an appropriate confidence interval to test your hypotheses from a) (2)
e) Treat each major separately and test the hypothesis that the proportion of students that have loans is independent of major (4)
f) If you did section 1e, follow your analysis with a Marascuilo procedure to compare the proportion of business students that have loans with the proportions for the other 6 majors. Tell which differences are significant. (3) [14]
g) (Extra credit) Check your results using Minitab. (i) To do a chi-squared test on an O table that is in Columns c22-c28, simply put the row labels in Column c21 and print out your data. Then type in ChiSquare c22 - c28. The computer will print back the columns with their names, but below each number from the O table you will find the corresponding values of E and (O - E)^2/E, the contribution of the value of O to the chi-square total. Use the p-value to find out if we reject the hypothesis of equal proportions at the 1% significance level.
(ii) To do a test of the alternative hypothesis H 1 : p1 p 2 , where p1 x1 x2 and p 2 , use the command n1 n2 x1 , n1 , x 2 and n 2 . MTB > PTwo x1 n1 x 2 n 2 ; below, substituting your numbers for SUBC> SUBC> SUBC> The computer will print back Confidence 99.0; Alternative 1; Pooled. x1 , n1 , p1 x1 x2 , x 2 , n 2 and p 2 a p-value for a z-test and Fisher’s exact test (results n1 n2 should be somewhat similar to the z-test) and a 1-sided 99% confidence interval. 2. (Moore, McCabe et. al) An absolutely tactless psychology professor has divided faculty members into categories the professor labels ‘Fat’ and ‘Fit’. A random sample of scores on a test of ‘ego strength’ of the ‘Fat’ faculty is labeled x1 . A sample of ‘ego strength’ of the ‘Fit’ faculty is labeled x 2 . d x1 x 2 . Use a 95% confidence level in this problem. 18 252y0821 3/31/08 The professor has computed scores = 64.96, x 1 Sum of Fat x12 Sum of squares of Fat scores = 307.607, x x 2 Sum 2 2 Sum of squares of Fit scores = of scores of Fit = 90.02, 581.239, d Sum of diff = -25.06 and d Sum of squares of diff = 51.8198. 2 Row 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Fat Fit x1 x2 4.99 4.24 4.74 4.93 4.16 5.53 4.12 5.10 4.47 5.30 3.12 3.77 5.09 5.40 6.68 6.42 7.32 6.38 6.16 5.93 7.08 6.37 6.53 6.68 5.71 6.20 6.04 6.52 Diff d x1 x 2 -1.69 -2.18 -2.58 -1.45 -2.00 -0.40 -2.96 -1.27 -2.06 -1.38 -2.59 -2.43 -0.95 -1.12 b , where b is the last digit of your student number. Please state clearly what row you removed. n1 n 2 13 rows of data. You will need the mean and variance of all three columns of data if you do To personalize the data remove row At this point you will have all sections of this problem. You can save yourself considerable effort by using the computational formula for the variance with the sums and sums of squares that the professor computed with the value or value squared of the numbers you removed subtracted. The professor got the following results. 
Variable   n    Mean  SE Mean  StDev  Median
Fat       14   4.640    0.184  0.690   4.835
Fit       14   6.430    0.115  0.431   6.400
diff      14  -1.790    0.196  0.732  -1.845

Your results should be relatively similar. Credit for computing the sample statistics needed is included in the relevant parts of this problem. State hypotheses and conclusions clearly in each segment of the problem.
a) Assume that x1 and x2 are independent random samples and test the hypothesis that the population mean of the ego strength of the 'fit' faculty is above the population mean of the 'fat' faculty. Assume that the data come from the Normal distribution and that the variances for the 'fit' and 'fat' populations are similar. (3)
b) (Extra credit) Assume that x1 and x2 are independent random samples and test the hypothesis that the population mean of the ego strength of the 'fit' faculty is above the population mean of the 'fat' faculty. Assume that the data come from the Normal distribution and that the variances for the 'fit' and 'fat' populations are not similar. (3)
c) Assume that x1 and x2 are independent random samples. How would we decide whether the method in a) or b) is correct? Do the appropriate test. Assume that the data come from the Normal distribution. Should we have used a) or b)? (2) [22]
d) Compute the mean and variance of the column of differences and test the column to see if the Normal distribution works for these data. (4)
e) Assume that we had rejected the hypothesis that the distributions of the populations that the columns come from are Normal, and do a one-sided test to see whether the ego strength of the 'Fat' and 'Fit' people differs. (2)
f) In the remainder of this problem assume that the x1 and x2 columns are not independent random samples but instead represent the ego strength of the same 14 or 13 faculty members before and after a fitness program. Assuming that the Normal distribution applies, can we say that the ego strength of the faculty has increased?
(2) g) Repeat f) under the assumption that the Normal distribution does not apply. (1) h) Use the Wilcoxon signed rank test, to test to see if the median of the d column is -2. (2) [35] i) Extra credit. Use Minitab to check your work. The commands that you might need are as follows – remember that the subcommand ’Alternative -1’ gives a left-sided test and ’Alternative +1’ gives a right sided test. If this subcommand is not used a 2-sided test will appear. The basic command to compare two means for data in c2 and c3 is MTB > TwoSample c2 c3. This will produce a 2-sided test using Method D3. A semicolon followed by the Alterative subcommand will produce a 1-sided test. Adding the subcommand ’Pooled’ switches the method to D2. Remember that a semicolon tells Minitab that a subcommand is coming and a period tells Minitab that the command is complete. To use Method C4 on the same two columns use the command MTB > Paired c2 c3. This also can be modified with the Alternative command. To test C2 for Normality using a Lilliefors test use MTB > NormTest c4; SUBC> KSTest. There are two other tests for Normality baked into Minitab. These are the Anderson-Darling test and the Ryan-Joiner test. The graph produced by any of these can be analyzed by the Fat Pencil Test. To get a basic explanation of these tests use the Stat pull-down menu hit basic statistics and then Normality Test. Finally hit ‘help’ and investigate the topics available. There will be a small bonus for those of you who mention Minitab’s problems with English grammar. To use the Anderson-Darling test, use the NormTest command without a subcommand. To use the Ryan-Joiner test use 19 252y0821 3/31/08 MTB > NormTest c4; SUBC> RJTest. A really impressive paper might compare the results of the 3 tests and then show the results of an internet search on the differences between them. The other two tests that are relevant here can be accessed by using the Stat pull-down menu and the Nonparametrics option. 
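The comparison of the three normality tests suggested above can also be started outside Minitab. A rough sketch using scipy (my addition; the assignment itself asks for Minitab), in which Shapiro-Wilk stands in for Ryan-Joiner, a test Minitab documents as similar:

```python
# Rough scipy analogues of two of Minitab's normality tests, applied to
# the 14 differences d from the data table above. (Minitab's KSTest
# subcommand -- the Lilliefors test -- has a Python counterpart in
# statsmodels.stats.diagnostic.lilliefors.)
from scipy.stats import anderson, shapiro

d = [-1.69, -2.18, -2.58, -1.45, -2.00, -0.40, -2.96,
     -1.27, -2.06, -1.38, -2.59, -2.43, -0.95, -1.12]

ad = anderson(d, dist='norm')   # Anderson-Darling
sw_stat, sw_p = shapiro(d)      # Shapiro-Wilk, a close kin of Ryan-Joiner

# Neither test rejects Normality for these differences at the 5% level;
# ad.critical_values[2] is the 5% critical value.
print(ad.statistic < ad.critical_values[2], sw_p > 0.05)
```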
The instruction for a left-sided (Wilcoxon)-Mann-Whitney test would be MTB > Mann-Whitney 95.0 c2 c3; SUBC> Alternative -1. Minitab’s instructions for a 2-sided Wilcoxon signed rank test of a median of -2 from one sample in C4 would be MTB > WTest -2 c4. To do a one-sided test comparing samples in two columns take d x1 x 2 and do a test that the median of d is zero. Again Alternative can be used to get a 1-sided test. Also there is some advice from last term’s Take-home. To fake computation of a sample variance or standard deviation of the data in column c1 using column c2 for the squares, MTB > let C2 = C1*C1 * performs multiplication MTB > name k1 'sum' ** would do a power, but multiplication MTB > name k2 'sumsq' is more accurate. MTB > let k1 = sum(c1) MTB > let k2 = sum(c2) This is equivalent to let k2 = ssq(c1) MTB > print k1 k2 Data Display This is a progress report for my data sum 3047.24 set. sumsq 468657 MTB > name k1 'meanx' MTB > let k1 = k1/count(c1) /means division. Count gives n. MTB > let k2 = k2 - (count(c1))*k1*k1 MTB > print k1 k2 Data Display meanx 152.362 sumsq 4372.53 MTB > name k2 'varx' MTB > let k2 = k2/((count(c1))-1) MTB > print k1 k2 Data Display meanx 152.362 varx 230.133 MTB > name k2 'stdevx' MTB > let k2 = sqrt(k2) MTB > print k1 k2 Data Display meanx 152.362 stdevx 15.1701 Sqrt gives a square root. Print C1, C2 To check for equal variances for data in C1 and C2, use MTB > VarTest c1 c2; SUBC> Unstacked. Both an F test and a Levine test will be run. The Levine test is for non-Normal data so you want the F test results. To check your mean and standard deviation, use ` MTB > describe C1 To put a items in column C1 in order in column C2, use MTB > Sort c1 c2; SUBC> By c1. 3. Sorry. This is all I’ve got. 20 252y0821 3/31/08 1. (Moore, McCabe et. al.) A large public university took a survey of 865 students to find out if there was a relationship between the chosen major and whether the students had student loans. 
The students’ majors were categorized as Agriculture, Child Development, Engineering, Liberal Arts, Business, Science and Technology. Before you start, personalize the data as follows. Let a be the second-to-last digit of your student number. Change the number of Science majors with loans to 31 a and the number of business majors who have loans to 24 a for every part of this problem. The total number of students in the survey will not change. Put your version of the table below on top of the first page of your solution. Use a 99% confidence level in this problem. Ag 32 35 Loan None Ch 37 50 Engg 98 137 Lib 89 124 Bus 24 51 Sci 31 29 Tech 57 71 a) Compute the proportion of non-science majors that have loans in order to test the hypothesis that science majors are more likely to have loans than other majors. Tell which group you consider sample 1. State H 0 and H 1 in terms of the proportions involved and also in terms of the difference between the proportions, explaining whether this difference is a statistic from sample 1 minus a statistic from sample 2 or the reverse. (1) Solution: Let’s call science group 1. Our alternative hypothesis is now H 1 : p1 p 2 or, if p p1 p 2 H 1 : p 0 . Accordingly, our null hypothesis is H 0 : p1 p 2 or H 0 : p 0 . It’s time to quote Table 3. Interval for Confidence Hypotheses Test Ratio Critical Value Interval pcv p0 z 2 p Difference p p 0 p p z 2 s p H 0 : p p 0 z between If p 0 p H 1 : p p 0 p p1 p 2 proportions 1 1 If p 0 p p 0 q 0 p 0 p 01 p 02 p1 q1 p 2 q 2 q 1 p s p n1 n 2 p 01q 01 p 02 q 02 p n1 n2 or p 0 0 n1 n2 n p n2 p 2 p0 1 1 Or use s p n1 n 2 Version 0: If a 0 , our table is as below. Row 1 2 3 C1 Loan None Col Total Ag 32 35 67 Ch 37 50 87 Engg 98 137 235 Lib 89 124 213 Bus 24 51 75 Sci 31 29 60 Tech 57 71 128 Total 368 497 865 Out of 865-60 = 805 non-science majors, 368 – 31 = 337 have loans. This is p 2 60 science majors p1 Row 1 2 3 4 5 6 7 Labels Ag Ch Engg Lib Bus Sci Tech Loan 32 37 98 89 24 31 57 337 .418634 . 
For the 805 31 .516667 . 60 None 35 50 137 124 51 29 71 Col Total 67 87 235 213 75 60 128 %Loan 0.477612 0.425287 0.417021 0.417840 0.320000 0.516667 0.445313 Version 9a: If a 9 , our table is as below. Row 1 2 3 C1 Loan None Col Total Ag 32 35 67 Ch 37 50 87 Engg 98 137 235 Lib 89 124 213 Bus 15 51 66 Sci 40 29 69 Tech 57 71 128 Total 368 497 865 21 252y0821 3/31/08 Out of 865-69 = 796 non-science majors, 368 – 40 = 328 have loans. This is p 2 69 science majors p 2 Row 1 2 3 4 5 6 7 Labels Ag Ch Engg Lib Bus Sci Tech Loan 32 37 98 89 15 40 57 328 .412060 . For the 796 40 .579710 . 69 None 35 50 137 124 51 40 71 Col Total 67 87 235 213 66 69 128 %Loan 0.477612 0.425287 0.417021 0.417840 0.227273 0.579710 0.445313 Version 9b: If a 9 , and you held the total number in each major constant, you got the table below. Row 1 2 3 C1 Loan None Col Total Ag 32 35 67 Ch 37 50 87 Engg 98 137 235 Lib 89 124 213 Bus 15 60 75 Sci 40 20 60 Tech 57 71 128 Total 368 497 865 Out of 865-60 = 705 non-science majors, 368 – 40 = 328 have loans. This is p 2 69 science majors p 2 Row 1 2 3 4 5 6 7 Labels Ag Ch Engg Lib Bus Sci Tech Loan 32 37 98 89 15 40 57 328 .465248 . For the 705 40 .666667 . 60 None 35 50 137 124 60 20 71 Col Total 67 87 235 213 75 60 128 %Loan 0.477612 0.425287 0.417021 0.417840 0.200000 0.666667 0.445313 b) Use a test ratio to test your hypotheses from a) (2) H 0 : p1 p 2 or H 0 : p 0 and H 1 : p1 p 2 or H 1 : p 0 . Version 0: If a 0 , our table is as below. Row 1 2 3 C1 Loan None Col Total Ag 32 35 67 Ch 37 50 87 Engg 98 137 235 Lib 89 124 213 Bus 24 51 75 Sci 31 29 60 Tech 57 71 128 Total 368 497 865 Out of 865-60 = 605 non-science majors, 368 – 31 = 337 have loans. This is p 2 60 science majors p1 337 .418634 . For the 805 p p 0 31 .516667 . The test ratio is z , z.01 2.327 Since this is a right60 p sided test, we reject H 0 if our calculated z-ratio is above 2.327. 
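The pooled z-test just set up can be scripted as a check on the hand arithmetic. A sketch in Python with scipy (my addition, not part of the original solution), using the Version 0 counts:

```python
# Pooled two-proportion z-test of H1: p1 > p2 (science vs non-science
# loan rates), using the Version 0 counts from the solution above.
from math import sqrt
from scipy.stats import norm

x1, n1 = 31, 60     # science majors with loans
x2, n2 = 337, 805   # non-science majors with loans

p1, p2 = x1 / n1, x2 / n2
p0 = (x1 + x2) / (n1 + n2)                    # pooled proportion 368/865
sigma = sqrt(p0 * (1 - p0) * (1/n1 + 1/n2))   # standard error under H0
z = (p1 - p2) / sigma
p_value = norm.sf(z)                          # right-sided p-value

print(round(z, 2))        # 1.48 -- below z.01 = 2.327, so do not reject H0
print(round(p_value, 3))  # 0.069 -- above alpha = .01, same conclusion
```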
p .516667 .418624 .098043 For a test ratio or a critical value for p , p 0 n1 p1 n 2 p 2 368 .425434 n1 n 2 865 1 1 1 1 .425434 .574566 .425434 .574566 .017909 60 805 n n 2 1 p 0 .098043 .425434 .574566 .017909 .004378 .066164 . z 1.482 so we do not reject p .066164 p p 0 q 0 H 0 . Alternatively, p value Pz 1.48 .5 .4306 .0694 .01 -same conclusion. 22 252y0821 3/31/08 c) Use a critical value for the difference between proportions to test your hypotheses from a) (2) pcv p0 z 2 p . But we need one critical value above zero so pcv 0 z 2 p 2.327 .066163 .1540 and we reject H 0 if p is above this value. p .098043 is clearly below the critical value, so do not reject H 0 . d) Use an appropriate confidence interval to test your hypotheses from a) (2) For a confidence interval for pq p q p , s p 1 1 2 2 n n2 1 .516667 .483333 .418634 .581366 .004162 .000251 60 805 .004413 .066433 The two-sided formula p p z 2 s p . The alternative hypothesis is H 1 : p 0 , so our confidence interval is p p z s p .098043 3.327 .066433 .1230 . This interval includes zero, so H 0 : p 0 is not contradicted. e) Treat each major separately and test the hypothesis that the proportion of students that have loans is independent of major (4) See below. f) If you did section 1e, follow your analysis with a Marascuilo procedure to compare the proportion of business students that have loans with the proportions for the other 6 majors. Tell which differences are significant. (3) [14] We are testing H 0 : p1 p 2 p3 p 4 p5 p 6 p 7 . Our data is repeated below. Row 1 2 3 C1 Loan None Col Total Ag 32 35 67 Ch 37 50 87 Engg 98 137 235 Lib 89 124 213 Bus 24 51 75 Sci 31 29 60 Tech 57 71 128 Total 368 497 865 We can make this into an expanded O table by summing in each direction, computing row proportions, pq finding the proportion with no kids and computing s 2p . n O Major Loan? 
Ag Ch Engg Lib 32 Yes 37 98 89 No 50 137 124 35 Sum 67 87 235 213 Pr oportion .4776 .4253 .4170 .4178 pq .0037 .0028 .0010 .0011 n I will use the row proportions and the column sums to create the E Loan? Yes No Sum Pr oportion pq n Ag Ch 28 .50 37 .01 38 .50 49 .99 67 87 .4776 .4253 .0037 .0028 Bus Sci Tch Total pr 24 31 57 368 .4254 51 29 71 497 .5746 75 60 128 865 1.0000 .3200 .5167 .4453 .0029 .0041 .0019 E table. Major Engg Lib Bus Sci Tch Total pr 99 .98 90 .62 31 .91 25 .53 54 .46 368 .4254 135 .0 122 .38 43 .09 34 .47 73 .54 497 .5746 235 213 75 60 128 571 1.0000 .4170 .4178 .3200 .5167 .4453 .0010 .0011 .0029 .0041 .0019 Actually, all numbers were carried to more places than reported here. On the next page 2 is computed two ways. This is, of course, unnecessary. 23 252y0821 3/31/08 Row E OE 28.504 37.013 99.977 90.617 31.908 25.526 54.455 38.496 49.987 135.023 122.383 43.092 34.474 73.545 865 3.49595 -0.01272 -1.97688 -1.61734 -7.90751 5.47399 2.54451 -3.49595 0.01272 1.97688 1.61734 7.90751 -5.47399 -2.54451 0.000 O 1 32 2 37 3 98 4 89 5 24 6 31 7 57 8 35 9 50 10 137 11 124 12 51 13 29 14 71 Total865 O E 2 E O2 E 0.42877 35.925 0.00000 36.987 0.03909 96.062 0.02887 87.412 1.95969 18.052 1.17388 37.648 0.11890 59.663 0.31748 31.822 0.00000 50.013 0.02894 139.006 0.02137 125.639 1.45104 60.359 0.86919 24.395 0.08804 68.544 6.52526 871.526 O E 2 O2 n . Both of these two E E formulas are shown above. There is no reason to do both. DF r 1c 1 2 17 1 6 . So we have The formula for the chi-squared statistic is 2 2 O E 2 2 6 E 6.5253 or 2 or 2 O2 n 871 .52526 865 6.5253 . If we compare our results E with .01 16.8119 , we notice that our computed value is below the table value Since our computed value of chi-squared is smaller than the table value, we cannot reject our null hypothesis. Note: Marascuilo Procedure. 
The Marascuilo procedure says that, for 2 by c tests, if (i) equality is rejected and (ii) p a p b 2 s p , where a and b represent 2 groups, the chi - squared has c 1 degrees of freedom and the standard deviation is s p p a q a pb qb , you can say that you have a significant na nb difference between p a and p b . This is equivalent to using a confidence interval of c 1 pa qa pa pb pa pb 2 p q b b n nb a Version 0 O Major Loan? Ag Ch Engg Lib Bus Sci Tch Total pr Yes 32 37 98 89 24 31 57 368 .4254 No 50 137 124 51 29 71 497 .5746 35 Sum 67 87 235 213 75 60 128 865 1.0000 Pr oportion .4776 .4253 .4170 .4178 .3200 .5167 .4453 pq .0037 .0028 .0010 .0011 .0029 .0041 .0019 n pq Note that I should have carried more places in . But I wanted a concise table for your convenience. n 24 252y0821 3/31/08 6 The proportion of business students with loans is .3200 and we are using 2 .05 16.8119. The contrast we p q will use is thus p a p 5 p a .3200 16 .8119 a a .0029 na Ag: p1 p 5 .4776 .3200 16 .8119 .0037 .0029 .1576 .3331 Chem: p 2 p 5 .4153 .3200 16 .8119 .0028 .0029 .1053 .3096 Engineering: p 3 p 5 .4170 .3200 16 .8119 .0010 .0029 .0970 .2561 Library: p 4 p 5 .4178 .3200 16 .8119 .0011 .0029 .0978 .2224 Science: p 6 p 5 .5167 .3200 16 .8119 .0041 .0029 .1967 .2593 Tech: p 7 p 5 .4453 .3200 16 .8119 .0019 .0029 .1253 .2841 There is no surprise here. Since the chi-squared test said that there was no significant difference we expect all of our intervals to include zero. They do. g) (Extra credit) Check your results using Minitab. (i) To do a chi-squared test on an O table that is in Columns c1-c7, simply put the row labels in a Column if you want them and print out your data. Then type in ChiSquare c1 – c7. The computer will print back the columns with their names, but below each number from the O table you O E 2 , the contribution of the value of O to the chi-square E total. 
Use the p-value to find out if we reject the hypothesis of equal proportions at the 1% significance level. will find the corresponding values of E and Chi-Square Test: O1, O2, O3, O4, O5, O6, O7 Expected counts are printed below observed counts Chi-Square contributions are printed below expected counts O1 O2 32 37 28.50 37.01 0.429 0.000 O3 98 99.98 0.039 O4 O5 O6 O7 89 24 31 57 90.62 31.91 25.53 54.46 0.029 1.960 1.174 0.119 2 35 38.50 0.317 50 49.99 0.000 137 135.02 0.029 124 122.38 0.021 51 43.09 1.451 29 34.47 0.869 71 73.54 0.088 497 Total 67 87 235 213 75 60 128 865 1 Total 368 Chi-Sq = 6.525, DF = 6, P-Value = 0.367 (ii) To do a test of the alternative hypothesis H 1 : p1 p 2 , where p1 x1 x and p 2 2 , use the n1 n2 command below, substituting your numbers for x1 , n1 , x 2 and n 2 . MTB > PTwo x1 n1 x 2 n 2 ; SUBC> Confidence 99.0; SUBC> Alternative 1; SUBC> Pooled. x1 x , x 2 , n 2 and p 2 2 a p-value for a z-test and Fisher’s n1 n2 exact test (results should be somewhat similar to the z-test) and a 1-sided 99% confidence interval. An example of Minitab output follows. The computer will print back x1 , n1 , p1 25 252y0821 3/31/08 MTB > PTwo 805 337 60 31; SUBC> Alternative -1; SUBC> Pooled. Test and CI for Two Proportions Sample 1 2 X 337 31 N 805 60 Sample p 0.418634 0.516667 Difference = p (1) - p (2) Estimate for difference: -0.0980331 95% upper bound for difference: 0.0118693 Test for difference = 0 (vs < 0): Z = -1.48 Fisher's exact test: P-Value = 0.090 P-Value = 0.069 MTB > PTwo 805 337 60 40; SUBC> Alternative -1; SUBC> Pooled. Test and CI for Two Proportions Sample 1 2 X 337 40 N 805 60 Sample p 0.418634 0.666667 Difference = p (1) - p (2) Estimate for difference: -0.248033 95% upper bound for difference: -0.143925 Test for difference = 0 (vs < 0): Z = -3.74 Fisher's exact test: P-Value = 0.000 P-Value = 0.000 26 252y0821 3/31/08 2. (Moore, McCabe et. 
al) An absolutely tactless psychology professor has divided faculty members into categories the professor labels ‘Fat’ and ‘Fit’. A random sample of scores on a test of ‘ego strength’ of the ‘Fat’ faculty is labeled x1 . A sample of ‘ego strength’ of the ‘Fit’ faculty is labeled x 2 . d x1 x 2 . Use a 95% confidence level in this problem. The professor has computed Fat scores = 64.96, x 2 1 x 1 Row Sum of 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Sum of squares of Fat scores = 307.607, x x 2 Sum of scores of Fit = 90.02, 2 2 Sum of squares of Fit scores = 581.239, d Sum of diff = -25.06 and d Sum of squares of diff = 2 51.8198. Fat Fit Diff x1 x2 d x1 x 2 4.99 4.24 4.74 4.93 4.16 5.53 4.12 5.10 4.47 5.30 3.12 3.77 5.09 5.40 6.68 6.42 7.32 6.38 6.16 5.93 7.08 6.37 6.53 6.68 5.71 6.20 6.04 6.52 -1.69 -2.18 -2.58 -1.45 -2.00 -0.40 -2.96 -1.27 -2.06 -1.38 -2.59 -2.43 -0.95 -1.12 To personalize the data remove row b , where b is the last digit of your student number. Please state clearly what row you removed. At this point you will have n1 n 2 13 rows of data. You will need the mean and variance of all three columns of data if you do all sections of this problem. You can save yourself considerable effort by using the computational formula for the variance with the sums and sums of squares that the professor computed with the value or value squared of the numbers you removed subtracted. The professor got the following results. Individualized solutions are in 252y0821a. Variable Fat Fit diff n 14 14 14 Mean 4.640 6.430 -1.790 SE Mean 0.184 0.115 0.196 StDev 0.690 0.431 0.732 Median 4.835 6.400 -1.845 Your results should be relatively similar. Credit for computing the sample statistics needed is included in the relevant parts of this problem. State hypotheses and conclusions clearly in each segment of the problem. 
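The time-saving shortcut described above — adjusting the professor's sums and sums of squares for the removed row — is the computational formula s² = (Σx² − n·x̄²)/(n − 1). A sketch in Python; removing row 1 (b = 1) is purely an illustration, since your removed row depends on your own student number:

```python
# The variance shortcut suggested above: keep the professor's running
# sums, subtract the removed observation and its square, then apply
# s^2 = (sum_sq - n * mean^2) / (n - 1).
def mean_var(total, total_sq, n):
    mean = total / n
    var = (total_sq - n * mean * mean) / (n - 1)
    return mean, var

sum_x1, sumsq_x1 = 64.96, 307.607   # professor's 14-row 'Fat' sums
removed = 4.99                      # row 1 of the 'Fat' column
m, v = mean_var(sum_x1 - removed, sumsq_x1 - removed**2, 13)
print(round(m, 5), round(v, 4))     # close to the solution's 4.61308 and 0.505
```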
a) Assume that x1 and x 2 are independent random samples and test the hypothesis that the population mean of the ego strength of the ‘fit’ faculty is above the population mean of the ‘fat’ faculty. Assume that the data comes from the Normal distribution and that the variances for the ‘fit’ and ‘fat’ populations are similar. (3) Fat Row 1 2 3 4 5 6 7 8 9 10 11 12 13 x1 Fit x12 4.24 17.9776 4.74 22.4676 4.93 24.3049 4.16 17.3056 5.53 30.5809 4.12 16.9744 5.10 26.0100 4.47 19.9809 5.30 28.0900 3.12 9.7344 3.77 14.2129 5.09 25.9081 5.40 29.1600 59.97 282.707 x2 diff x22 d To summarize d2 6.42 41.2164 -2.18 4.7524 7.32 53.5824 -2.58 6.6564 6.38 40.7044 -1.45 2.1025 6.16 37.9456 -2.00 4.0000 5.93 35.1649 -0.40 0.1600 7.08 50.1264 -2.96 8.7616 6.37 40.5769 -1.27 1.6129 6.53 42.6409 -2.06 4.2436 6.68 44.6224 -1.38 1.9044 5.71 32.6041 -2.59 6.7081 6.20 38.4400 -2.43 5.9049 6.04 36.4816 -0.95 0.9025 6.52 42.5104 -1.12 1.2544 83.34 536.616 -23.37 48.9637 x x x x d d 1 59.97, 2 1 282.707, 2 83.34, 2 2 536.616, -23.37, 2 48.9637 and n1 n 2 n 13 . 27 252y0821 3/31/08 These will have to be used somewhere in the next problems, so let’s get it over with. Of course, you could have saved lots of time using the numbers that I gave you. x 59 .97 x1 4.61308 s12 n1 13 s1 0.71068 1 x2 x 2 83 .34 6.4108 , s 22 13 n2 s 2 0.44182 d x 2 1 n1 x12 n1 1 x 2 2 n 2 x 22 n2 1 d 23.37 1.7977 , s d 2 d 2 282 .707 134.61308 2 6.06078 0.50503 12 12 536 .616 136.4108 2 2.342490 0.19521 12 12 nxd 2 n 13 n 1 s d 0.76112 The parts of Table 3 useful in a) and b) follow. 
Interval for Confidence Hypotheses Interval Difference H 0 : D D0 * D d t 2 s d between Two H 1 : D D0 , 1 1 Means ( sd s p D 1 2 n1 n2 unknown, variances assumed equal) DF n1 n2 2 Difference between Two Means( unknown, variances assumed unequal) H 0 : D D0 * D d t 2 s d s12 s22 n1 n2 sd DF s12 s22 n 1 n2 H 1 : D D0 , 48 .9637 13 1.7977 2 6.95163 0.57930 12 12 Test Ratio t sˆ 2p t D 1 2 d D0 sd Critical Value d cv D0 t 2 s d n1 1s12 n2 1s22 n1 n2 2 d D0 sd d cv D0 t 2 s d 2 s12 2 n1 n1 1 s 22 2 n2 n2 1 The hypothesis that the population mean of the ego strength of the ‘fit’ faculty is above the population mean of the ‘fat’ faculty translates as H 1: 1 2 (or, if D 1 2 , H 1: D 0 ) , which means that the null hypothesis is H 0: 1 2 (or H 1: D 0 ). Solution: If we assume equal variances, we use DF n1 n 2 2 13 13 2 24 and we have n 1s12 n2 1s22 120.50503 120.19521 8.40324 0.35014 d 1.7977 and sˆ2p 1 n1 n2 1 13 13 2 24 s 1 1 1 1 0.35014 0.59173 . So s d s p2 n n 13 13 2 1 This is a left sided test, so we will be using t 24 1.711 p 20.35014 .05387 0.23209. 13 .05 d 0 1.7977 Test Ratio: t 7.745 . Since this is a left-sided test, our ‘reject’ region is all points sd 0.23209 24 24 1.711 . Since t 7.745 is below -1.711, we reject H 0 . Note that t .001 3.467 so that below t .05 the p-value is below .001. Critical value for the difference between sample means: For a two-sided test d cv 0 t 24 s d , but this is a 2 left-sided test and, because the alternative hypothesis is H 1: D 0 , we need one critical value below zero 28 252y0821 3/31/08 d cv 0 t24 s d 1.7110.23209 0.3971 Our ‘reject’ region is all points below -0.3971. Since d 1.7977 is below our critical value, we reject H 0 . One-sided confidence interval: For a two-sided test D d t 24 s d , but this is a left-sided test and, 2 because the alternative hypothesis is H 1: D 0 , we need a one-sided confidence interval or lower limit D d t24 s d 1.7977 0.3971 1.401 . 
The interval D 1.401 does not include zero, so it contradicts H 0: D 0 , and we reject H 0 . b) (Extra credit) Assume that x1 and x 2 are independent random samples and test the hypothesis that the population mean of the ego strength of the ‘fit’ faculty is above the population mean of the ‘fat’ faculty. Assume that the data comes from the Normal distribution and that the variances for the ‘fit’ and ‘fat’ populations are not similar. (3) Solution: Recall x1 4.61308 , s12 0.50503 , s1 0.71068 , x 2 6.4108 , s 22 0.19521, s 2 0.44182 , d 1.7977 and n1 n 2 13 . Our worksheet includes the following. s12 0.50503 0.03885 13 n1 s 22 n2 s12 s 22 n1 n 2 DF s d 0.19521 0.01502 13 s12 s 22 0.05387 0.23209 n1 n 2 0.05387 s12 s 22 n1 n 2 2 2 2 s12 s 22 n1 n2 n1 1 n2 1 Test Ratio: t 0.05387 2 0.03885 2 0.01502 2 12 0.002902 20.072 0.0001258 0.000188 12 d 0 1.7977 7.746 . Since this is a left-sided test, our ‘reject’ region is all points sd 0.23209 20 20 1.725 . Since t 7.746 is below -1.725, we reject H 0 . Note that t .001 3.552 so that below t .05 the p-value is below .001. Critical value for the difference between sample means: For a two-sided test d cv 0 t 14 s d , but this is a 2 left-sided test and, because the alternative hypothesis is H 1: D 0 , we need one critical value below zero d cv 0 t20 s d 1.725 0.23209 0.400 Our ‘reject’ region is all points below -0.400. Since d 1.7977 is below our critical value, we reject H 0 . One-sided confidence interval: For a two-sided test D d t 14 s d , but this is a left-sided test and, 2 because the alternative hypothesis is H 1: D 0 , we need a one-sided confidence interval or lower limit D d t14 s d 1.7977 0.400 1`.398 . The interval D 1.398 does not include zero, so it contradicts H 0: D 0 , and we reject H 0 . 29 252y0821 3/31/08 c) Assume that x1 and x 2 are independent random samples. How would we decide whether the method in a) of b) is correct? Do the appropriate test. Assume that the data comes from the Normal distribution. 
Should we have used a) or b)? (2) [22]
Solution: Recall s1² = 0.50503, s2² = 0.19521 and n1 = n2 = 13. For a test of H0: σ1² = σ2², we need only test the larger of s1²/s2² and s2²/s1² against the appropriate F(α/2). Because the number of degrees of freedom for both variances is 12, we test s1²/s2² = 0.50503/0.19521 = 2.587 against F.025(12,12) = 3.28. The 'reject' zone is above 3.28. Since 2.587 does not exceed 3.28, we do not reject the null hypothesis, so the equal-variance method in a) is appropriate.
d) Compute the mean and variance of the column of differences and test the column to see if the Normal distribution works for these data. (4)
Solution: I have already done this above, and have done one Lilliefors test, so I will use Minitab as my calculator this time around. The d column is put in order. We calculate 'z' = (d − d̄)/sd. We use the Normal table to find the cumulative expected frequency Fe = P(z ≤ 'z'). For example, we find P(z ≤ −1.52710) ≈ P(z ≤ −1.52) = .5 − P(−1.52 ≤ z ≤ 0) = .5 − .4357 = .0643. The computer, which does not round the value of 'z', gets a slightly more accurate value of 0.063368. The O column is added along to get the cum O column, which is divided by n = 13 to get the observed cumulative frequency, Fo. Finally, the D column gives the absolute values of the difference between the cumulative observed and the cumulative expected frequencies. The maximum value in the D column, which is .143266, is compared against the 5% value in the Lilliefors table, which is .2337. Since the computed value does not exceed the table value, we cannot reject the null hypothesis of Normality.
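The Lilliefors D in part d) takes only a few lines to reproduce. A Python sketch (scipy assumed, my addition) that mirrors the tabled computation of comparing cumulative observed frequencies with the fitted Normal CDF:

```python
# Lilliefors D for the 13 differences, computed the same way as in the
# solution: order d, standardize with the estimated mean and standard
# deviation, and take the largest gap between the cumulative observed
# frequency Fo and the fitted Normal CDF Fe.
from statistics import mean, stdev
from scipy.stats import norm

d = [-2.18, -2.58, -1.45, -2.00, -0.40, -2.96, -1.27,
     -2.06, -1.38, -2.59, -2.43, -0.95, -1.12]

d_bar, s_d = mean(d), stdev(d)
n = len(d)
Fe = [norm.cdf((x - d_bar) / s_d) for x in sorted(d)]
Fo = [(i + 1) / n for i in range(n)]
D = max(abs(fo - fe) for fo, fe in zip(Fo, Fe))

# D is about .1433, below the 5% Lilliefors critical value of .2337,
# so Normality is not rejected.
print(round(D, 4))
```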
Row 1 2 3 4 5 6 7 8 9 10 11 12 13 d -2.18 -2.58 -1.45 -2.00 -0.40 -2.96 -1.27 -2.06 -1.38 -2.59 -2.43 -0.95 -1.12 d in order ' z ' -2.96 -2.59 -2.58 -2.43 -2.18 -2.06 -2.00 -1.45 -1.38 -1.27 -1.12 -0.95 -0.40 d d Fe Pz ' z ' O cum O Fo sd -1.52710 -1.04098 -1.02784 -0.83076 -0.50230 -0.34463 -0.26580 0.45682 0.54879 0.69331 0.89039 1.11374 1.83636 0.063368 0.148943 0.152013 0.203055 0.307729 0.365185 0.395196 0.676099 0.708424 0.755943 0.813372 0.867306 0.966848 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 0.07692 0.15385 0.23077 0.30769 0.38462 0.46154 0.53846 0.61538 0.69231 0.76923 0.84615 0.92308 1.00000 D Fo Fe 0.013555 0.004903 0.078756 0.104638 0.076886 0.096354 0.143266 0.060714 0.016116 0.013288 0.032782 0.055771 0.033152 e) Assume that we had rejected the hypothesis that the distributions in the populations that the columns come from is Normal, do a one-sided test to see whether the ego strength of the ‘Fat and ‘Fit’ people differs. (2) Solution: Because the data are assumed to be independent nonnormal random samples and thus not paired, H 0 : 1 2 use the Wilcoxon-Mann-Whitney Rank Sum Test. or the null hypothesis is simply a one H 1 : 1 2 sided version of ‘similar distributions.' n1 n 2 13 . In the table below, r1 and r2 represent bottom to top ranking. 30 252y0821 3/31/08 Row x1 r1 x2 r2 1 2 3 4 5 6 7 8 9 10 11 12 13 4.24 4.74 4.93 4.16 5.53 4.12 5.10 4.47 5.30 3.12 3.77 5.09 5.40 5 7 8 4 13 3 10 6 11 1 2 9 12 91 6.42 7.32 6.38 6.16 5.93 7.08 6.37 6.53 6.68 5.71 6.20 6.04 6.52 21 26 20 17 15 25 19 23 24 14 18 16 22 260 Recall that n1 13, and n 2 13, and that the total number of numbers that we have ranked is nn 1 26 27 351 and that, to verify 2 2 our ranking, we find that the two rank sums add to 91 + 260 = 351. The outline says that the smaller of SR1 and SR2 is called W and is compared with Table 5 or 6. W 91 . Neither table has critical values for problems this large. 
For values of $n_1$ and $n_2$ that are too large for the tables, $W$ has the Normal distribution with mean $\mu_W = \frac{1}{2}n_1(n_1 + n_2 + 1) = \frac{1}{2}(13)(27) = 175.5$ and variance $\sigma_W^2 = \frac{n_2\,\mu_W}{6} = \frac{13(175.5)}{6} = 380.25$. Note that the outline says $n_1 \le n_2$, but this does not create a problem here. If the significance level is 5% and the test is one-sided, we reject our null hypothesis if $z = \frac{W - \mu_W}{\sigma_W}$ lies below $-z_{.05} = -1.645$. In this case $z = \frac{91 - 175.5}{\sqrt{380.25}} = \frac{-84.5}{19.5} = -4.333$. Since this is below $-1.645$, we reject $H_0$. To get a p-value for this result, use $P(z \le -4.33) = .5 - .5000 \approx 0$.

f) In the remainder of this problem assume that the $x_1$ and $x_2$ columns are not independent random samples but instead represent the ego strength of the same 14 or 13 faculty members before and after a fitness program. Assuming that the Normal distribution applies, can we say that the ego strength of the faculty has increased? (2)

Solution: Table 3 has the following for the difference between two means (paired data), where $D = \mu_1 - \mu_2$, $\bar d = \bar x_1 - \bar x_2$, $s_{\bar d} = \frac{s_d}{\sqrt n}$ and $df = n - 1$:

Confidence Interval:  $D = \bar d \pm t_{\alpha/2} s_{\bar d}$
Hypotheses:           $H_0\!: D = D_0$,  $H_1\!: D \ne D_0$
Test Ratio:           $t = \frac{\bar d - D_0}{s_{\bar d}}$
Critical Value:       $\bar d_{cv} = D_0 \pm t_{\alpha/2} s_{\bar d}$

The hypothesis that the population mean ego strength of the 'fit' faculty is above the population mean of the 'fat' faculty translates as $H_1\!: \mu_1 < \mu_2$ (or, if $D = \mu_1 - \mu_2$, $H_1\!: D < 0$), which means that the null hypothesis is $H_0\!: \mu_1 \ge \mu_2$ (or $H_0\!: D \ge 0$). Recall the following: $\bar x_1 = 4.61308$, $\bar x_2 = 6.41077$, $\bar d = -1.7977$, $s_d^2 = 0.57930$, $s_d = 0.76112$ and $n = 13$. This means that $s_{\bar d} = \frac{s_d}{\sqrt n} = \sqrt{\frac{0.57930}{13}} = \sqrt{0.04456} = 0.21110$. This is a left-sided test, so we will be using $t_{.05}^{(12)} = 1.782$.

Test Ratio: $t = \frac{\bar d - D_0}{s_{\bar d}} = \frac{-1.7977 - 0}{0.21110} = -8.516$. Since this is a left-sided test, our 'reject' region is all points below $-t_{.05}^{(12)} = -1.782$. Since $t = -8.516$ is below $-1.782$, we reject $H_0$. Note that $t_{.001}^{(12)} = 3.930$, so the p-value is below .001.
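The paired-t arithmetic can be checked with a short Python sketch (not part of the original solution):

```python
import math

x1 = [4.24, 4.74, 4.93, 4.16, 5.53, 4.12, 5.10, 4.47, 5.30, 3.12, 3.77, 5.09, 5.40]
x2 = [6.42, 7.32, 6.38, 6.16, 5.93, 7.08, 6.37, 6.53, 6.68, 5.71, 6.20, 6.04, 6.52]

d = [a - b for a, b in zip(x1, x2)]            # differences d = x1 - x2
n = len(d)
d_bar = sum(d) / n                             # mean difference, -1.7977
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
s_d_bar = s_d / math.sqrt(n)                   # standard error of the mean difference

t = (d_bar - 0) / s_d_bar                      # test ratio against D0 = 0
t_crit = 1.782                                 # t_.05 with 12 df, from the text
print(round(t, 3), t < -t_crit)                # -8.516 True -> reject H0
```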
Critical value for the difference between sample means: For a two-sided test $\bar d_{cv} = D_0 \pm t_{\alpha/2}^{(12)} s_{\bar d}$, but this is a left-sided test and, because the alternative hypothesis is $H_1\!: D < 0$, we need one critical value below zero: $\bar d_{cv} = D_0 - t_{.05}^{(12)} s_{\bar d} = 0 - 1.782(0.21110) = -0.37618$. Our 'reject' region is all points below $-0.37618$. Since $\bar d = -1.7977$ is below our critical value, we reject $H_0$.

One-sided confidence interval: For a two-sided test $D = \bar d \pm t_{\alpha/2}^{(12)} s_{\bar d}$, but this is a left-sided test and, because the alternative hypothesis is $H_1\!: D < 0$, we need a one-sided confidence interval or upper limit: $D \le \bar d + t_{.05}^{(12)} s_{\bar d} = -1.7977 + 0.37618 = -1.422$. The interval $D \le -1.422$ does not include zero, so it contradicts $H_0\!: D \ge 0$, and we reject $H_0$.

g) Repeat f) under the assumption that the Normal distribution does not apply. (1)

Solution: This is a Wilcoxon signed rank test with $\alpha = .05$, $H_0\!: \eta_1 = \eta_2$ and $H_1\!: \eta_1 < \eta_2$. The data are below: the column $|d|$ is the absolute value of $d = x_1 - x_2$, the column $r$ ranks the absolute values, and the column $r^*$ is the ranks (corrected for ties, though there are none here) marked with the signs of the differences.

Row     x1     x2      d    |d|    r    r*
  1   4.24   6.42  -2.18   2.18    9    9-
  2   4.74   7.32  -2.58   2.58   11   11-
  3   4.93   6.38  -1.45   1.45    6    6-
  4   4.16   6.16  -2.00   2.00    7    7-
  5   5.53   5.93  -0.40   0.40    1    1-
  6   4.12   7.08  -2.96   2.96   13   13-
  7   5.10   6.37  -1.27   1.27    4    4-
  8   4.47   6.53  -2.06   2.06    8    8-
  9   5.30   6.68  -1.38   1.38    5    5-
 10   3.12   5.71  -2.59   2.59   12   12-
 11   3.77   6.20  -2.43   2.43   10   10-
 12   5.09   6.04  -0.95   0.95    2    2-
 13   5.40   6.52  -1.12   1.12    3    3-

If we add together the numbers in $r^*$ with a + sign, we get $T^+ = 0$. If we do the same for numbers with a − sign, we get $T^- = 91$. To check this, note that these two numbers must sum to the sum of the first $n$ numbers, which is $\frac{n(n+1)}{2} = \frac{13(14)}{2} = 91$, and that $T^+ + T^- = 0 + 91 = 91$. We check 0, the smaller of the two rank sums, against the numbers in Table 7 {wsignedr}. For a one-sided 5% test, we use the .05 column. For $n = 13$, the critical value is 21, and we reject the null hypothesis only if our test statistic is below this critical value.
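Before stating the conclusion, the signed-rank bookkeeping can be checked in Python (a sketch, not part of the original; no ties or zeros occur here, so plain ranks suffice):

```python
d = [-2.18, -2.58, -1.45, -2.00, -0.40, -2.96, -1.27,
     -2.06, -1.38, -2.59, -2.43, -0.95, -1.12]

# Rank the absolute differences from smallest to largest, then sum
# the ranks separately for positive and negative differences.
order = sorted(range(len(d)), key=lambda i: abs(d[i]))
T_plus = sum(rank + 1 for rank, i in enumerate(order) if d[i] > 0)
T_minus = sum(rank + 1 for rank, i in enumerate(order) if d[i] < 0)

n = len(d)
print(T_plus, T_minus, T_plus + T_minus == n * (n + 1) // 2)   # 0 91 True
# min(T+, T-) = 0 is below the one-sided 5% critical value of 21 -> reject H0
```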
Since our test statistic is 0, we reject the null hypothesis.

h) Use the Wilcoxon signed rank test to test whether the median of the $d$ column is −2. (2) [35]

Solution: This is a Wilcoxon signed rank test with $\alpha = .05$, $H_0\!: \eta_d = -2$ and $H_1\!: \eta_d \ne -2$. The data are below. We have replaced the $x_1$ column with our old $d$ column and replaced the $x_2$ column with a column of −2's. We compute a new $d'$ column by subtracting the −2's from the original $d$ column. The column $|d'|$ is the absolute value of $d'$, the column $r$ ranks the absolute values, and the column $r^*$ is the ranks marked with the signs of the differences. Because there is a zero in the ranking of $d'$ (Row 4), we have left out the zero and lowered the remaining ranks by 1.

Row      d   -2's   d' = d − (−2)   |d'|    r    r*
  1  -2.18     -2           -0.18   0.18    3    2-
  2  -2.58     -2           -0.58   0.58    6    5-
  3  -1.45     -2            0.55   0.55    5    4+
  4  -2.00     -2            0.00   0.00    1    --
  5  -0.40     -2            1.60   1.60   13   12+
  6  -2.96     -2           -0.96   0.96   11   10-
  7  -1.27     -2            0.73   0.73    9    8+
  8  -2.06     -2           -0.06   0.06    2    1-
  9  -1.38     -2            0.62   0.62    8    7+
 10  -2.59     -2           -0.59   0.59    7    6-
 11  -2.43     -2           -0.43   0.43    4    3-
 12  -0.95     -2            1.05   1.05   12   11+
 13  -1.12     -2            0.88   0.88   10    9+

If we add together the numbers in $r^*$ with a + sign, we get $T^+ = 51$. If we do the same for numbers with a − sign, we get $T^- = 27$. To check this, note that these two numbers must sum to the sum of the first $n = 12$ numbers, which is $\frac{n(n+1)}{2} = \frac{12(13)}{2} = 78$, and that $T^+ + T^- = 51 + 27 = 78$. We check 27, the smaller of the two rank sums, against the numbers in Table 7 {wsignedr}. For a two-sided 5% test, we use the .025 column. For $n = 12$, the critical value is 14, and we can reject the null hypothesis only if our test statistic is below this critical value. Since our test statistic is 27, we cannot reject the null hypothesis. Most of you seemed to be looking for magic here. Almost everyone did the test appropriate to part g). Common sense says that if you want to test for −2, you have to use −2 somewhere in the problem.

i) Extra credit. Use Minitab to check your work.
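Part h) differs from part g) only in shifting each difference by the hypothesized median before ranking; a Python sketch (not part of the original solution):

```python
d = [-2.18, -2.58, -1.45, -2.00, -0.40, -2.96, -1.27,
     -2.06, -1.38, -2.59, -2.43, -0.95, -1.12]
eta0 = -2.0                                  # hypothesized median

d_prime = [x - eta0 for x in d]              # shift by the hypothesized median
d_prime = [x for x in d_prime if x != 0]     # drop the exact zero (Row 4), leaving n = 12

order = sorted(range(len(d_prime)), key=lambda i: abs(d_prime[i]))
T_plus = sum(rank + 1 for rank, i in enumerate(order) if d_prime[i] > 0)
T_minus = sum(rank + 1 for rank, i in enumerate(order) if d_prime[i] < 0)

print(T_plus, T_minus)   # 51 27; min = 27 is above the critical value 14 -> cannot reject H0
```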
All versions of this problem appear in the appendix below in excruciating detail.