12/1/98 252z9861

2. The data below consists of 1000 monthly salaries.
a. Use a chi-square test to see if they have a normal distribution with a mean of 1200 and a standard deviation of 200. Note: You found the probabilities for the first three intervals on page 1. You can find the others by symmetry. (6)
b. Would a Lilliefors or a Kolmogorov-Smirnov test be the correct, more powerful method to test this hypothesis? Use the one of these that is appropriate to do the test in a). (6)

Interval          Probability   Expected   Observed
Less than $800       .0228        22.8        26
$800-$1000           .1359       135.9       146
$1000-$1200          .3413       341.3       361
$1200-$1400          .3413       341.3       311
$1400-$1600          .1359       135.9       143
$1600 or above       .0228        22.8        13
Sum                 1.0000      1000.0      1000

Solution: a) The \(\chi^2\) test is done two ways.

    O       E      E - O   (E - O)^2   (E - O)^2/E    O^2/E
   26     22.8     -3.2     10.240      0.44912      29.649
  146    135.9    -10.1    102.010      0.75063     156.851
  361    341.3    -19.7    388.090      1.13709     381.837
  311    341.3     30.3    918.090      2.68998     283.390
  143    135.9     -7.1     50.410      0.37094     150.471
   13     22.8      9.8     96.040      4.21228       7.412
 1000   1000.0      0.0                 9.61005    1009.610

From the last column, \(\chi^2 = \sum \frac{O^2}{E} - n = 1009.61 - 1000 = 9.61\), which is the same as the result in the fifth column, \(\chi^2 = \sum \frac{(E - O)^2}{E} = 9.61\). \(H_0\) is \(N(1200, 200)\) and, because there are 6 rows and no parameters are estimated from the data, there are 5 degrees of freedom. Since \(\chi^2_{.05}(5) = 11.0705\) is larger than our computed \(\chi^2\), we accept \(H_0\).

b) You must use a Kolmogorov-Smirnov test because the mean and variance are known. Lilliefors is only appropriate if they are unknown and must be estimated from the data. The table for the test appears below. The probabilities and O are repeated in the first and third columns. The cumulative expected distribution \(F_e\) is computed in the second column, the observed proportions \(O/n\) are in the fourth column, and their cumulative distribution \(F_o\) is computed in the fifth column. Finally \(D = |F_e - F_o|\) is computed in the last column.
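The arithmetic in part a) and the cumulative columns just described can be rechecked with a short script. This is a minimal sketch, assuming the rounded interval probabilities from the table (which is what the hand calculation uses); the variable names are illustrative, and the critical values 11.0705 and 1.36/√n are typed in from the tables quoted in the text rather than computed.

```python
from itertools import accumulate
from math import sqrt

# Interval probabilities under H0: N(1200, 200), rounded as in the table
probs = [0.0228, 0.1359, 0.3413, 0.3413, 0.1359, 0.0228]
observed = [26, 146, 361, 311, 143, 13]
n = sum(observed)                        # 1000 salaries

expected = [n * p for p in probs]        # E = n * probability

# Chi-square computed both ways used in part a)
chi2_a = sum((e - o) ** 2 / e for o, e in zip(observed, expected))
chi2_b = sum(o ** 2 / e for o, e in zip(observed, expected)) - n

# Kolmogorov-Smirnov: largest gap between cumulative expected (Fe)
# and cumulative observed (Fo) proportions
fe = list(accumulate(probs))
fo = list(accumulate(o / n for o in observed))
d_max = max(abs(a - b) for a, b in zip(fe, fo))
d_crit = 1.36 / sqrt(n)                  # 5% critical value quoted in the text

print(f"chi2 = {chi2_a:.4f} (or {chi2_b:.4f}); table value 11.0705 for 5 df")
print(f"max D = {d_max:.4f}; critical value = {d_crit:.4f}")
```

Both statistics fall below their critical values, matching the decision to accept \(H_0\) in each part.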
Probability     Fe       O      O/n      Fo       D
   .0228      .0228     26    .0260    .0260    .0032
   .1359      .1587    146    .1460    .1720    .0133
   .3413      .5000    361    .3610    .5330    .0330
   .3413      .8413    311    .3110    .8440    .0027
   .1359      .9772    143    .1430    .9870    .0098
   .0228     1.0000     13    .0130   1.0000    .0000
  1.0000              1000   1.0000

\(H_0\) is again \(N(1200, 200)\). The maximum difference is .0330, and we compare it to the critical value from the K-S table. For a significance level of 5% this is \(\frac{1.36}{\sqrt{n}} = \frac{1.36}{\sqrt{1000}} = .0430\). Since the maximum difference is smaller than the critical value, we accept \(H_0\).

3. The data below represents samples of consumer ratings of three different displays.
a. Assuming that each column represents a random sample from a normal distribution and that variances of the parent populations are similar, compare the means of the three populations. (9) Column sums are now given: \(\sum x_1 = 225\), \(\sum x_1^2 = 10475\), \(\sum x_2 = 175\), \(\sum x_2^2 = 6375\), \(\sum x_3 = 180\), \(\sum x_3^2 = 6850\).
b. Do a confidence interval for the difference between mean ratings of the first and third displays, assuming that the interval is one of three possible contrasts. (3)
c. Assume that each row represents the opinions of a single individual and find if there are significant differences between display means and means for individuals. (8)

Solution: Computed values are added to the data table.

              Display 1  Display 2  Display 3   Row sum  n_i   Row mean   Sum of x^2   Row mean^2
                 50         45         45         140     3    46.6667       6550      2177.7778
                 45         30         35         110     3    36.6667       4150      1344.4444
                 30         25         20          75     3    25.0000       1925       625.0000
                 45         35         40         120     3    40.0000       4850      1600.0000
                 55         40         40         135     3    45.0000       6225      2025.0000
Sum             225        175        180         580    15   (38.6667)     23700      7772.2222
n_j               5          5          5          15
Column mean      45         35         36      (38.6667)
Sum of x^2    10475       6375       6850       23700
Col. mean^2    2025       1225       1296        4546

a) One-way ANOVA. The grand mean is \(\bar{\bar x} = \frac{580}{15} = 38.6667\).
\[ SST = \sum x_{ij}^2 - n\bar{\bar x}^2 = 23700 - 15(38.6667)^2 = 1273.2947 \]
\[ SSB = \sum n_j \bar x_{.j}^2 - n\bar{\bar x}^2 = 5(4546) - 15(38.6667)^2 = 303.2947 \]
\[ SSW = SST - SSB = 1273.2947 - 303.2947 = 970 \]

Source       SS        DF      MS        F
Between    303.2947     2    151.647    1.876
Within     970         12     80.833
Total     1273.2947    14

Since \(F_{.05}(2,12) = 3.89\) is larger than our computed F, we accept \(H_0\), which is 'no difference between display means.'

b) From the outline, the Scheffé interval is
\[ \mu_1 - \mu_3 = (\bar x_1 - \bar x_3) \pm \sqrt{(m-1)F_{.05}(m-1,\, n-m)}\; s\sqrt{\frac{1}{n_1} + \frac{1}{n_3}} \]
where m = 3 is the number of displays and \(s = \sqrt{MSW} = \sqrt{80.8333}\). So
\[ \mu_1 - \mu_3 = (45 - 36) \pm \sqrt{2(3.89)}\,\sqrt{80.8333}\,\sqrt{\tfrac{1}{5} + \tfrac{1}{5}} = 9 \pm 15.86 \]
Because this interval, -6.86 to 24.86, includes zero, the difference between the mean ratings of the first and third displays is not significant.

c) Two-way ANOVA. SST is the same as in the one-way ANOVA, and SSC is the same as SSB.
\[ SST = \sum x_{ij}^2 - n\bar{\bar x}^2 = 23700 - 15(38.6667)^2 = 1273.2947 \]
\[ SSC = \sum r\,\bar x_{.j}^2 - n\bar{\bar x}^2 = 5(4546) - 15(38.6667)^2 = 303.2947 \]
\[ SSR = \sum c\,\bar x_{i.}^2 - n\bar{\bar x}^2 = 3(7772.2222) - 15(38.6667)^2 = 889.9613 \]
\[ SSW = SST - SSR - SSC = 1273.2947 - 889.9613 - 303.2947 = 80.0387 \]

Source       SS        DF      MS         F       F.05
Rows       889.9613     4   222.4903    22.24   F.05(4,8) = 3.84
Columns    303.2947     2   151.6474    15.16   F.05(2,8) = 4.46
Within      80.0387     8    10.0048
Total     1273.2947    14

Since \(F_{.05}(4,8) = 3.84\) is smaller than 22.24, we reject \(H_{01}\), which is 'no difference between row (individual) means.' Since \(F_{.05}(2,8) = 4.46\) is smaller than 15.16, we reject \(H_{02}\), which is 'no difference between display means.'

4. Data is repeated from the previous problem.
a. Drop the assumption of normality and compare the columns assuming that they represent random samples. (5)
b. Do the same assuming that each row represents the opinion of one consumer. (5)

Display 1  Display 2  Display 3    r1    r2    r3
    50         45         45        3    1.5   1.5
    45         30         35        3    1     2
    30         25         20        3    2     1
    45         35         40        3    1     2
    55         40         40        3    1.5   1.5
                           Sum     15    7.0   8.0

Solution: a) The data is arranged in order at left and ranked at right below. As usual, ties are treated by giving the same rank to equal numbers. For example, the 45s are initially ranked as numbers 10, 11, 12, and 13, but these are replaced by their average, which is 11.5.

 x1   x2   x3      r1     r2     r3
           20                    1
      25                 2
 30   30           3.5    3.5
      35   35             5.5    5.5
      40   40             8      8
           40                    8
 45   45   45     11.5   11.5   11.5
 45               11.5
 50               14
 55               15
                  ____   ____   ____
                  55.5   30.5   34.0

Our check on the ranking is that 55.5 + 30.5 + 34 = 120, which is \(\frac{15(16)}{2}\), the sum of the first 15 numbers. We now calculate
\[ H = \left[\frac{12}{n(n+1)}\sum_i \frac{SR_i^2}{n_i}\right] - 3(n+1) = \left[\frac{12}{15(16)}\left(\frac{55.5^2}{5} + \frac{30.5^2}{5} + \frac{34.0^2}{5}\right)\right] - 3(16) = 0.05(1033.3) - 48 = 3.665 \]
We then use the Kruskal-Wallis table for samples of 5, 5 and 5 to find that the p-value for 3.665 is above .102.
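Because problems 3 and 4 use the same ratings, one short script can recheck the ANOVA sums of squares and Scheffé half-width from problem 3 and the rank sums and H from part a) above. This is a minimal sketch, not the course's procedure: the list and function names are illustrative, tied values get their average rank as described above, and 3.89 is the tabled value of \(F_{.05}(2,12)\) typed in by hand.

```python
from math import sqrt

# Ratings from problems 3 and 4: one list per display (column); row i = individual i.
displays = [
    [50, 45, 30, 45, 55],   # Display 1
    [45, 30, 25, 35, 40],   # Display 2
    [45, 35, 20, 40, 40],   # Display 3
]
values = [x for col in displays for x in col]
n, k = len(values), len(displays)              # 15 ratings, 3 displays
grand_mean = sum(values) / n                   # 580/15, left unrounded
correction = n * grand_mean ** 2

# Problem 3: sums of squares for the one-way and two-way ANOVA
sst = sum(x * x for x in values) - correction
ssc = sum(len(c) * (sum(c) / len(c)) ** 2 for c in displays) - correction
rows = list(zip(*displays))                    # each tuple is one individual's ratings
ssr = sum(len(r) * (sum(r) / len(r)) ** 2 for r in rows) - correction
ssw_one_way = sst - ssc
ssw_two_way = sst - ssr - ssc

df_between, df_within = k - 1, n - k           # 2 and 12
mse = ssw_one_way / df_within
f_one_way = (ssc / df_between) / mse

# Scheffe half-width for mu1 - mu3; 3.89 is F.05(2,12) typed in from a table
half_width = sqrt(df_between * 3.89) * sqrt(mse) * sqrt(1 / 5 + 1 / 5)

# Problem 4 a): Kruskal-Wallis H, tied values share their average rank
def average_ranks(data):
    order = sorted(data)
    ranks = {}
    for v in set(order):
        first = order.index(v) + 1             # 1-based position of the first copy
        ranks[v] = first + (order.count(v) - 1) / 2
    return ranks

rank = average_ranks(values)
rank_sums = [sum(rank[x] for x in col) for col in displays]
h = (12 / (n * (n + 1))
     * sum(sr ** 2 / len(col) for sr, col in zip(rank_sums, displays))
     - 3 * (n + 1))

print(f"SST={sst:.4f} SSC={ssc:.4f} SSR={ssr:.4f} SSW(two-way)={ssw_two_way:.4f}")
print(f"one-way F={f_one_way:.3f}; Scheffe: 9 +/- {half_width:.2f}")
print(f"rank sums={rank_sums}; H={h:.3f}")
```

Because the script keeps the grand mean unrounded, it gives SST = 1273.33, SSC = 303.33, SSR = 890.00 and a two-way SSW of 80.00, slightly different from the hand values 1273.2947, 303.2947, 889.9613 and 80.0387, which use the grand mean rounded to 38.6667. The conclusions are unchanged.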
Since this p-value is above 5%, we do not reject \(H_0\), which is 'no difference between the distributions (or medians) of \(x_1\), \(x_2\) and \(x_3\).'

b) Ranks of the numbers within rows are given in the columns r1, r2 and r3 to the right of the original table. We take the column rank sums and check our ranking by noting that the sum of the rank sums should be \(\frac{rc(c+1)}{2} = \frac{5(3)(4)}{2} = 30\), which is the sum of 15, 7 and 8. We now calculate
\[ \chi^2_F = \left[\frac{12}{rc(c+1)}\sum_i SR_i^2\right] - 3r(c+1) = \left[\frac{12}{5(3)(4)}\left(15^2 + 7^2 + 8^2\right)\right] - 3(5)(4) = \frac{338}{5} - 60 = 7.6 \]
From the Friedman table for N = 5 and k = 3, the p-value for 7.6 is .024. Since this is less than 5%, we reject \(H_0\), which is 'no difference between the distributions (or medians) of \(x_1\), \(x_2\) and \(x_3\).'

5. Union member confidence in big business and job satisfaction is reported below.
a. Test the hypothesis that job satisfaction and confidence are independent. (7)
b. Test the hypothesis that the proportion that is very confident is the same for the very satisfied and the moderately satisfied. (4)

O                    Very Satisfied   Moderately Satisfied   Dissatisfied   Sum
Very Confident             26                  15                  3          44
Somewhat Confident         95                  73                 21         189
Not Confident              34                  28                 19          81
Sum                       155                 116                 43         314

E                     pr      V.S.     M.S.      D.      Sum
Very Confident      .1401    21.72    16.25     6.02    43.99
Somewhat Confident  .6019    93.29    69.82    25.88   188.99
Not Confident       .2580    39.99    29.93    11.09    81.01
Sum                1.0000   155.00   116.00    42.99   313.99

Solution: a) The construction of the expected table E is shown above. pr is the proportion in each row. For example, .1401 is 44 divided by 314. To get the upper left-hand value in E, multiply 155 by .1401. The \(\chi^2\) test is done two ways in the table below. The value of \(\chi^2\) is either \(\sum \frac{(E - O)^2}{E} = 10.2145\) or \(\sum \frac{O^2}{E} - n = 324.2245 - 314 = 10.2245\); the two differ slightly because the rounded expected values sum to 313.99 rather than 314.
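The expected table and the statistic above can be rebuilt from the observed counts alone. This is a minimal sketch, assuming the usual rule \(E_{ij} = (\text{row total})(\text{column total})/n\), which is the same as multiplying each column total by the row proportion pr; the expected counts are kept unrounded, and the variable names are illustrative.

```python
# Problem 5 a): chi-square test of independence, expected counts kept unrounded
observed = [
    # Very Satisfied, Moderately Satisfied, Dissatisfied
    [26, 15, 3],    # Very Confident
    [95, 73, 21],   # Somewhat Confident
    [34, 28, 19],   # Not Confident
]
row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]
n = sum(row_tot)                                  # 314 union members

# E_ij = (row total)(column total)/n = column total * row proportion pr
expected = [[rt * ct / n for ct in col_tot] for rt in row_tot]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(col_tot) - 1)     # (r-1)(c-1)
print(f"chi2 = {chi2:.4f} with {df} df; table value 9.4877")
```

With unrounded expected counts the statistic comes out at about 10.21, between the two hand-computed values, so the comparison with \(\chi^2_{.05}(4) = 9.4877\) is unaffected.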
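Since the problem has \((r-1)(c-1) = (2)(2) = 4\) degrees of freedom and \(\chi^2_{.05}(4) = 9.4877\) is smaller than either value, we reject \(H_0\), which is 'independence.'

   O      E      E - O   (E - O)^2   (E - O)^2/E    O^2/E
   26   21.72   -4.28    18.3184      0.84339     31.1234
   95   93.29   -1.71     2.9241      0.03134     96.7413
   34   39.99    5.99    35.8801      0.89723     28.9072
   15   16.25    1.25     1.5625      0.09615     13.8462
   73   69.82   -3.18    10.1124      0.14484     76.3248
   28   29.93    1.93     3.7249      0.12445     26.1945
    3    6.02    3.02     9.1204      1.51502      1.4950
   21   25.88    4.88    23.8144      0.92019     17.0402
   19   11.09   -7.91    62.5681      5.64185     32.5518
  314  313.99   -0.01                10.2145     324.2245

b) From page 10 of the Syllabus supplement, for the difference between proportions, \(\Delta p = p_1 - p_2\), \(q = 1 - p\):
Confidence interval: \(\Delta p = \Delta\bar p \pm z_{\alpha/2}\, s_{\Delta p}\), where \(s_{\Delta p} = \sqrt{\frac{\bar p_1 \bar q_1}{n_1} + \frac{\bar p_2 \bar q_2}{n_2}}\)
Hypotheses: \(H_0: \Delta p = \Delta p_0\), \(H_1: \Delta p \ne \Delta p_0\), or \(H_0: p_1 = p_2\), \(H_1: p_1 \ne p_2\)
Test ratio: \(z = \frac{\Delta\bar p - \Delta p_0}{\sigma_{\Delta p}}\)
Critical value: \(\Delta p_{cv} = \Delta p_0 \pm z_{\alpha/2}\,\sigma_{\Delta p}\)
If \(\Delta p_0 = 0\): \(\sigma_{\Delta p} = \sqrt{\bar p_0 \bar q_0\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\), with \(\bar p_0 = \frac{n_1\bar p_1 + n_2\bar p_2}{n_1 + n_2}\)
If \(\Delta p_0 \ne 0\): \(\sigma_{\Delta p} = \sqrt{\frac{p_{01} q_{01}}{n_1} + \frac{p_{02} q_{02}}{n_2}}\)

Here \(\bar p_1 = \frac{x_1}{n_1} = \frac{26}{155} = .1677\), \(\bar p_2 = \frac{x_2}{n_2} = \frac{15}{116} = .1293\), \(\Delta\bar p = \bar p_1 - \bar p_2 = .1677 - .1293 = .0384\), and
\[ \bar p_0 = \frac{n_1\bar p_1 + n_2\bar p_2}{n_1 + n_2} = \frac{26 + 15}{155 + 116} = \frac{41}{271} = .1513 \]
\[ \sigma_{\Delta p} = \sqrt{\bar p_0 \bar q_0\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{.1513(.8487)\left(\frac{1}{155} + \frac{1}{116}\right)} = \sqrt{.001935} = .04399 \]
If we use a test ratio, \(z = \frac{\Delta\bar p - 0}{\sigma_{\Delta p}} = \frac{.0384}{.04399} = 0.873\). This is inside the interval \(\pm z_{.025} = \pm 1.96\) (and also inside \(\pm z_{.05} = \pm 1.645\), the values used if \(\alpha = .10\)), so do not reject \(H_0\).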
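The pooled two-proportion test in part b) can be sketched in a few lines. This assumes the \(\Delta p_0 = 0\) branch of the syllabus formula, with the pooled proportion \(\bar p_0 = (x_1 + x_2)/(n_1 + n_2)\); the variable names are illustrative and 1.96 is the tabled \(z_{.025}\) typed in by hand.

```python
from math import sqrt

# Problem 5 b): "very confident" among the very satisfied (1) and moderately satisfied (2)
x1, n1 = 26, 155
x2, n2 = 15, 116

p1, p2 = x1 / n1, x2 / n2
dp = p1 - p2                                    # observed difference, about .0384
p0 = (x1 + x2) / (n1 + n2)                      # pooled proportion, 41/271
sigma = sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
z = dp / sigma                                  # test ratio for H0: p1 = p2

z_crit = 1.96                                   # z.025 for a 5% two-sided test
print(f"p0 = {p0:.4f}, sigma = {sigma:.5f}, z = {z:.3f}, reject = {abs(z) > z_crit}")
```

With unrounded inputs, z comes out at about 0.874, well inside ±1.96, which agrees with the decision not to reject \(H_0\).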