5/5/00 252z0043

2. A sales manager wishes to predict unit sales by salespersons ($y$) on the basis of number of calls made ($x_1$) and number of different products the salesperson pushes ($x_2$). The data are below. (Use $\alpha = .01$.)

   Row   units y   calls x1   products x2   x1*y
    1      28        14           2           392
    2      71        35           5          2485
    3      38        22           4           836
    4      70        29           5          2030
    5      22         6           2           132
    6      27        15           3           405
    7      28        17           3           476
    8      47        20           5           940
    9      14        12           1           168
   10      70        30           5          2100
                                             9964

Many people were too lazy or ignorant to compute $\sum x_1 y$ from $x_1$ and $y$, or to get $\sum x_1 x_2$ from $x_1$ and $x_2$. There is no way in the universe to get $\sum x_1 y$ from the other quantities given. You will always be asked to compute a sum of this sort on an exam, so figure out how to do it in advance. The quantities below were given:

$\sum y = 415$, $\sum y^2 = 21471$, $\sum x_1 = 200$, $\sum x_1^2 = 4740$, $\sum x_2 = 35$, $\sum x_2^2 = 143$, $\sum x_1 y = ?$, $\sum x_2 y = 1721$, $\sum x_1 x_2 = 806$ and $n = 10$. You do not need all of these.

a. Compute a simple regression of units against calls. (8)
b. Compute $R^2$. (4)
c. Compute $s_e$. (3)
d. Compute $s_{b_1}$ (the standard deviation of the slope) and do a confidence interval for $\beta_1$. (3)
e. Do a prediction interval for units when the salesperson makes 5 calls. (3) Why is this interval likely to be larger than other prediction intervals we might compute for numbers of calls that we have actually observed? (1)

Solution: $\sum x_1 y = 28(14) + 71(35) + \cdots + 70(30) = 9964$. See the computation above.

a) Spare Parts Computation:
$\bar{x}_1 = \frac{\sum x_1}{n} = \frac{200}{10} = 20.000$ and $\bar{y} = \frac{\sum y}{n} = \frac{415}{10} = 41.500$.
$SS_{x_1} = \sum x_1^2 - n\bar{x}_1^2 = 4740 - 10(20.000)^2 = 740.000$
$S_{x_1 y} = \sum x_1 y - n\bar{x}_1\bar{y} = 9964 - 10(20.000)(41.500) = 1664.000$
$SS_y = \sum y^2 - n\bar{y}^2 = 21471 - 10(41.500)^2 = 4248.500 = TSS$
$b_1 = \frac{S_{x_1 y}}{SS_{x_1}} = \frac{1664.000}{740.000} = 2.24865$
$b_0 = \bar{y} - b_1\bar{x}_1 = 41.500 - 2.24865(20.000) = -3.4729$
$\hat{Y} = b_0 + b_1 x_1$ becomes $\hat{Y} = -3.4729 + 2.24865 x_1$. Lots of people found $b_2$ instead. They hadn't read the question!

b) $R^2 = \frac{(S_{x_1 y})^2}{SS_{x_1}\,SS_y} = \frac{(\sum x_1 y - n\bar{x}_1\bar{y})^2}{(\sum x_1^2 - n\bar{x}_1^2)(\sum y^2 - n\bar{y}^2)} = \frac{(1664.000)^2}{(740.000)(4248.500)} = 0.8807$. ($0 \le R^2 \le 1$ always!) Alternatively, $SSR = b_1 S_{x_1 y} = b_1(\sum x_1 y - n\bar{x}_1\bar{y}) = 2.24865(1664.000) = 3741.75$, so $R^2 = \frac{SSR}{SST} = \frac{3741.75}{4248.50} = 0.8807$ could be used in b), or $SSR = R^2 \cdot SST = .8807(4248.5) = 3741.65$ could be used in c).

c) $SSE = SST - SSR = 4248.5 - 3741.75 = 506.75$, so $s_e^2 = \frac{SSE}{n-2} = \frac{506.75}{8} = 63.3436$ ($s_e^2$ is always positive!) and $s_e = \sqrt{63.3436} = 7.95887$.

d) $s_{b_1}^2 = \frac{s_e^2}{SS_{x_1}} = \frac{s_e^2}{\sum x_1^2 - n\bar{x}_1^2} = \frac{63.3436}{740.000} = 0.08560$, so $s_{b_1} = \sqrt{0.08560} = 0.29257$. With $t^{(n-2)}_{.005} = t^{(8)}_{.005} = 3.355$, $\beta_1 = b_1 \pm t s_{b_1} = 2.24865 \pm 3.355(0.29257) = 2.25 \pm 0.98$. Note: Some versions of this exam asked for $\beta_0 = b_0 \pm t s_{b_0}$, where $s_{b_0}^2 = s_e^2\left(\frac{1}{n} + \frac{\bar{x}_1^2}{SS_{x_1}}\right)$, or for the test ratios $t = \frac{b_1 - \beta_1}{s_{b_1}}$ or $t = \frac{b_0 - \beta_0}{s_{b_0}}$. You have to read the question to find out which one is wanted. Many people didn't.

e) If $\hat{Y} = -3.4729 + 2.24865 x_1$ and $x_{1,0} = 5$, then $\hat{Y}_0 = -3.4729 + 2.24865(5) = 7.770$. From the regression formula outline the Prediction Interval is $Y_0 = \hat{Y}_0 \pm t\, s_Y$, where
$s_Y^2 = s_e^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum X^2 - n\bar{X}^2} + 1\right] = s_e^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x}_1)^2}{SS_{x_1}} + 1\right] = 63.3436\left[\frac{1}{10} + \frac{(5-20)^2}{740} + 1\right] = 63.3436(1.4041) = 88.9400$,
so $s_{Y_0} = \sqrt{88.94} = 9.4308$. Thus $Y_0 = \hat{Y}_0 \pm t\, s_Y = 7.770 \pm 3.355(9.4308) = 7.77 \pm 31.64$. This interval will be smallest when $x_1 = \bar{x}_1 = 20$. Because 5 is below any value of $x_1$ that we actually have, the prediction interval will be relatively gigantic, since the $(x_0 - \bar{x})^2$ term grows as $x_0$ moves far from the mean.
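The spare-parts arithmetic above can be checked mechanically. Here is a minimal sketch in plain Python; it is not part of the original solution, and the variable names are mine:

```python
# Sketch: reproduce the spare-parts computation for problem 2 a-c.
y  = [28, 71, 38, 70, 22, 27, 28, 47, 14, 70]
x1 = [14, 35, 22, 29, 6, 15, 17, 20, 12, 30]
n  = len(y)

sum_x1y = sum(a * b for a, b in zip(x1, y))   # 9964, the sum students had to compute
xbar, ybar = sum(x1) / n, sum(y) / n          # 20.000 and 41.500

SSx1 = sum(a * a for a in x1) - n * xbar ** 2  # 740.000
Sx1y = sum_x1y - n * xbar * ybar               # 1664.000
SSy  = sum(b * b for b in y) - n * ybar ** 2   # 4248.500

b1  = Sx1y / SSx1                              # 2.24865
b0  = ybar - b1 * xbar                         # about -3.473
R2  = Sx1y ** 2 / (SSx1 * SSy)                 # 0.8807
se2 = (SSy - b1 * Sx1y) / (n - 2)              # 63.34 = SSE / 8
print(b1, b0, R2, se2)
```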
3. Data from problem 2 is repeated. (Use $\alpha = .01$.)

   Row   units y   calls x1   products x2
    1      28        14           2
    2      71        35           5
    3      38        22           4
    4      70        29           5
    5      22         6           2
    6      27        15           3
    7      28        17           3
    8      47        20           5
    9      14        12           1
   10      70        30           5

$\sum y = 415$, $\sum y^2 = 21471$, $\sum x_1 = 200$, $\sum x_1^2 = 4740$, $\sum x_2 = 35$, $\sum x_2^2 = 143$, $\sum x_1 y = ?$, $\sum x_2 y = 1721$, $\sum x_1 x_2 = 806$ and $n = 10$.

a. Do a multiple regression of units against calls and products. (12)
b. Compute $R^2$ and $R^2$ adjusted for degrees of freedom for both this and the previous problem. Compare the values of $R^2$ adjusted between this and the previous problem. Use an F test to compare $R^2$ here with the $R^2$ from the previous problem. (6)
c. Compute the regression sum of squares and use it in an F test to test the usefulness of this regression. (5)
d. Use your regression to predict the number of units sold when a salesperson makes 20 calls and pushes 5 products. (2)
e. Use the directions in the outline to make this estimate into a confidence interval and a prediction interval. (4)

Solution: a) First, we compute $\bar{Y} = 41.500$, $\bar{X}_1 = \frac{200}{10} = 20.000$ and $\bar{X}_2 = \frac{35}{10} = 3.500$. Second, we compute or copy $\sum X_1 Y = 9964$, $\sum X_2 Y = 1721$, $\sum Y^2 = 21471$, $\sum X_1^2 = 4740$, $\sum X_2^2 = 143$ and $\sum X_1 X_2 = 806$. Third, we compute or copy our spare parts:

$SS_y = \sum Y^2 - n\bar{Y}^2 = 4248.500$*
$S_{x_1 y} = \sum X_1 Y - n\bar{X}_1\bar{Y} = 9964 - 10(20.000)(41.500) = 1664.00$
$S_{x_2 y} = \sum X_2 Y - n\bar{X}_2\bar{Y} = 1721 - 10(3.500)(41.500) = 268.5$
$SS_{x_1} = \sum X_1^2 - n\bar{X}_1^2 = 4740 - 10(20.000)^2 = 740.00$*
$SS_{x_2} = \sum X_2^2 - n\bar{X}_2^2 = 143 - 10(3.5)^2 = 20.500$*
$S_{x_1 x_2} = \sum X_1 X_2 - n\bar{X}_1\bar{X}_2 = 806 - 10(20.000)(3.500) = 106.000$

* indicates quantities that must be positive. (Note that some of these were computed for the last problem.)

Fourth, we substitute these numbers into the Simplified Normal Equations:
$\sum X_1 Y - n\bar{X}_1\bar{Y} = b_1\left(\sum X_1^2 - n\bar{X}_1^2\right) + b_2\left(\sum X_1 X_2 - n\bar{X}_1\bar{X}_2\right)$
$\sum X_2 Y - n\bar{X}_2\bar{Y} = b_1\left(\sum X_1 X_2 - n\bar{X}_1\bar{X}_2\right) + b_2\left(\sum X_2^2 - n\bar{X}_2^2\right)$
which are
$1664.00 = 740.00\,b_1 + 106.00\,b_2$
$268.500 = 106.00\,b_1 + 20.500\,b_2$
and solve them as two equations in two unknowns for $b_1$ and $b_2$. We do this by multiplying the second equation by 6.9811, which is 740.00 divided by 106.00. The purpose of this is to make the coefficients of $b_1$ equal in both equations. We could do just as well by multiplying the second equation by 20.5 divided by 106 and making the coefficients of $b_2$ equal. So the two equations become
$1664.00 = 740.00\,b_1 + 106.00\,b_2$
$1874.43 = 740.00\,b_1 + 143.11\,b_2$.
We then subtract the first equation from the second to get $210.43 = 37.11\,b_2$, so that $b_2 = \frac{210.43}{37.11} = 5.6704$. The first of the two normal equations can now be rearranged to get $740.00\,b_1 = 1664.00 - 106.00(5.6704) = 1062.94$, which gives us $b_1 = 1.4364$. Finally we get $b_0$ by solving $b_0 = \bar{Y} - b_1\bar{X}_1 - b_2\bar{X}_2 = 41.500 - 1.4364(20.000) - 5.6704(3.500) = -7.0744$. Thus our equation is $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 = -7.0744 + 1.4364 X_1 + 5.6704 X_2$.

Note: An alternate way of solving the Simplified Normal Equations is to multiply the second equation by 5.1707, which is 106 divided by 20.5. The resulting equations are
$1664.00 = 740.00\,b_1 + 106.00\,b_2$
$1388.34 = 548.10\,b_1 + 106.00\,b_2$.
We then subtract the second equation from the first to get $275.66 = 191.90\,b_1$, so that $b_1 = \frac{275.66}{191.90} = 1.436$. If we then solve for $b_2$, we get essentially the same answer.
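The elimination above is just the solution of a two-by-two linear system, so it can be verified directly. A minimal sketch, assuming numpy is available (the variable names are mine):

```python
import numpy as np

# Simplified normal equations from problem 3a:
#   SSx1 * b1  + Sx1x2 * b2 = Sx1y
#   Sx1x2 * b1 + SSx2 * b2  = Sx2y
A   = np.array([[740.00, 106.00],
                [106.00,  20.50]])
rhs = np.array([1664.00, 268.50])

b1, b2 = np.linalg.solve(A, rhs)         # 1.4365 and 5.6701 to four places;
                                         # the hand elimination's 1.4364 and
                                         # 5.6704 differ only through rounding
b0 = 41.500 - b1 * 20.000 - b2 * 3.500   # about -7.07
print(b0, b1, b2)
```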
b) The Regression Sum of Squares is
$SSR = b_1\left(\sum X_1 Y - n\bar{X}_1\bar{Y}\right) + b_2\left(\sum X_2 Y - n\bar{X}_2\bar{Y}\right) = b_1 S_{x_1 y} + b_2 S_{x_2 y} = 1.4364(1664.00) + 5.6704(268.500) = 3912.672$*
and is used in the ANOVA below. The coefficient of determination is
$R^2 = \frac{SSR}{SST} = \frac{b_1 S_{x_1 y} + b_2 S_{x_2 y}}{SS_y} = \frac{3912.672}{4248.50} = .9210$*.

$\bar{R}^2$, which is $R^2$ adjusted for degrees of freedom, has the formula $\bar{R}^2 = \frac{(n-1)R^2 - k}{n-k-1}$, where $k$ is the number of independent variables. Our results can be summarized as:

    n    k    R²       adjusted R²
   10    1   .8807       .8658
   10    2   .9210       .8984

$R^2$ adjusted for degrees of freedom seems to show that our second regression is better.

One way to do the F test is to note that the total sum of squares is $SS_y = \sum Y^2 - n\bar{Y}^2 = 4248.500$. For the regression with one independent variable the regression sum of squares is $SSR = b_1 S_{x_1 y} = b_1\left(\sum x_1 y - n\bar{x}_1\bar{y}\right) = 2.24865(1664.000) = 3741.75$*. For the regression with two independent variables the regression sum of squares was computed in b) as 3912.672. The difference between these is 170.922. The remaining unexplained variation is $SSE = SST - SSR = 4248.500 - 3912.672 = 335.828$*. The ANOVA table is

   Source   SS*        DF*   MS*        F*      F.01
   X1       3741.750    1    3741.750
   X2        170.922    1     170.922   3.563   F(1,7) = 12.25
   Error     335.828    7      47.9755
   Total    4248.500    9

Since our computed F is smaller than the table F, we do not reject our null hypothesis that $X_2$ has no effect. A faster way to do this is to use the $R^2$s directly. The difference between $R^2 = 88.07\%$ and $R^2 = 92.10\%$ is 4.03%.

   Source   SS*      DF*   MS*       F*     F.01
   X1       88.07     1    88.07
   X2        4.03     1     4.03     3.57   F(1,7) = 12.25
   Error     7.90     7     1.12857
   Total   100.00     9

The numbers are a bit different because of rounding, but the conclusion is the same.

c) We computed the regression sum of squares in the previous section.

   Source   SS         DF   MS        F       F.01
   X1, X2   3912.672    2   1956.33   40.78   F(2,7) = 9.55
   Error     335.828    7     47.975
   Total    4248.500    9

Since our computed F is larger than the table F, we reject our null hypothesis that $X_1$ and $X_2$ do not explain $Y$.

d) $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 = -7.0744 + 1.4364 X_1 + 5.6704 X_2$. Since the last few digits don't seem to mean a lot, I used $\hat{Y} = -7.07 + 1.44(20) + 5.67(5) = 50.08$.

e) From the ANOVA above, $SSE = 335.828$.
$s_e^2 = \frac{\sum Y^2 - n\bar{Y}^2 - b_1\left(\sum X_1 Y - n\bar{X}_1\bar{Y}\right) - b_2\left(\sum X_2 Y - n\bar{X}_2\bar{Y}\right)}{n-3} = \frac{\left(\sum Y^2 - n\bar{Y}^2\right)\left(1 - R^2\right)}{n-3} = \frac{SSE}{n-k-1} = \frac{335.828}{7} = 47.975$*.
This can be read from the MS column in the ANOVA above. $s_e = \sqrt{47.975} = 6.926$. According to the outline, "an approximate confidence interval is $Y_0 = \hat{Y}_0 \pm t\frac{s_e}{\sqrt{n}}$ and an approximate prediction interval is $Y_0 = \hat{Y}_0 \pm t\,s_e$." Use $t^{(n-k-1)}_{.005} = t^{(7)}_{.005} = 3.499$. So the Confidence Interval is $Y_0 = \hat{Y}_0 \pm t\frac{s_e}{\sqrt{n}} = 50.08 \pm 3.499\frac{6.926}{\sqrt{10}} = 50.08 \pm 7.66$ and the Prediction Interval is $Y_0 = \hat{Y}_0 \pm t\,s_e = 50.08 \pm 3.499(6.926) = 50.08 \pm 24.2$.
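The whole of problem 3 can be cross-checked with a least-squares fit. A sketch, assuming numpy is available; the names are mine, and small differences from the hand values above are rounding:

```python
import numpy as np

# Sketch: verify the problem 3 regression, its ANOVA F, and the partial F test.
y  = np.array([28, 71, 38, 70, 22, 27, 28, 47, 14, 70], dtype=float)
x1 = np.array([14, 35, 22, 29,  6, 15, 17, 20, 12, 30], dtype=float)
x2 = np.array([ 2,  5,  4,  5,  2,  3,  3,  5,  1,  5], dtype=float)
n  = len(y)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # about (-7.074, 1.4365, 5.6701)

SST = np.sum((y - y.mean()) ** 2)           # 4248.5
SSE = np.sum((y - X @ b) ** 2)              # about 335.8
SSR = SST - SSE                             # about 3912.7
F = (SSR / 2) / (SSE / (n - 3))             # about 40.8, versus F.01(2,7) = 9.55

# Part b's "faster way": the partial F test straight from the two R-squared values.
R2_small, R2_full = 0.8807, 0.9210
F_partial = (R2_full - R2_small) / ((1 - R2_full) / (n - 3))   # about 3.57 < 12.25

print(b, F, F_partial)
print(b @ [1.0, 20.0, 5.0])                 # part d: about 50.0 before any rounding
```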
4. Your country's tourist office reports the following tourist arrivals over a 20-year period.

   year   arrivals (thousands)
     0         11.75
     1         78.93
     2        203.04
     3        268.95
     4        380.49
     5        457.32
     6        525.51
     7        596.56
     8        640.74
     9        710.67
    10        748.02
    11        795.13
    12        845.21
    13        843.08
    14        922.58
    15        945.22
    16        934.72
    17        945.67
    18        952.38
    19        933.86

Your assistant fits the following equations to the data:

   arrivals = 162 + 50.0 year
              (39.5) (3.55)
   R-sq = 91.7%    Durbin-Watson statistic = 0.19

   arrivals = -3.34 + 105 year - 2.90 yearsq
              (8.94)  (2.18)    (0.111)
   R-sq = 99.8%    Durbin-Watson statistic = 2.48

Do the following ($\alpha = .01$):
a. Using only the $R^2$s given above:
(i) Show that $R^2$ adjusted for degrees of freedom rises between the first and second regression. (2)
(ii) Fake an F test to show that the addition of the year squared improves the regression. (4)
(iii) Test the correlation between arrivals and year for significance. (3)
(iv) Test the hypothesis that the correlation between arrivals and year is .99. (4)
b. Compute a rank correlation between arrivals and year. (Note: if you can't get $d \ne 0$ you are wasting both our time.) (3) Then (i) test it for significance (2) and (ii) explain why it is higher than the correlation you computed in part a above. (1)
c. Explain what the values of the Durbin-Watson statistics show. (4)

Solution: a) (i) $\bar{R}^2$, which is $R^2$ adjusted for degrees of freedom, has the formula $\bar{R}^2 = \frac{(n-1)R^2 - k}{n-k-1}$, where $k$ is the number of independent variables. For the first regression, $n = 20$ and $k = 1$, so $\bar{R}^2 = \frac{19(0.917) - 1}{18} = 0.912$; for the second, $n = 20$ and $k = 2$, so $\bar{R}^2 = \frac{19(0.998) - 2}{17} = 0.998$.

(ii) The difference between $R^2 = 91.7\%$ and $R^2 = 99.8\%$ is 8.1%.

   Source   SS      DF   MS          F       F.01
   X1       91.7     1   91.7
   X2        8.1     1    8.1        688.5   F(1,17) = 8.40
   Error     0.2    17    0.011765
   Total   100.0    19

Since our computed F is larger than the table F, we reject our null hypothesis that $X_2$ has no effect.

(iii) The simple sample correlation coefficient is $r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left(\sum X^2 - n\bar{X}^2\right)\left(\sum Y^2 - n\bar{Y}^2\right)}}$, the square root of $R^2 = \frac{\left(\sum XY - n\bar{X}\bar{Y}\right)^2}{\left(\sum X^2 - n\bar{X}^2\right)\left(\sum Y^2 - n\bar{Y}^2\right)} = .917$. Since this was given by the printout, we don't need to compute it, so $r = \sqrt{.917} = .9576$. From the outline, if we want to test $H_0: \rho_{xy} = 0$ against $H_1: \rho_{xy} \ne 0$ and $x$ and $y$ are normally distributed, we use $t^{(n-2)} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{.9576\sqrt{20-2}}{\sqrt{1-.9576^2}} = 14.10$. Since $t^{(18)}_{.005} = 2.878$, we reject $H_0$.

(iv) The outline says, if $\rho_0 \ne 0$ and we want to test $H_0: \rho_{xy} = \rho_0$ against $H_1: \rho_{xy} \ne \rho_0$, "we need to use Fisher's z-transformation. Let $\tilde{z} = \frac{1}{2}\ln\frac{1+r}{1-r}$. This has an approximate mean of $\mu_z = \frac{1}{2}\ln\frac{1+\rho_0}{1-\rho_0}$ and a standard deviation of $s_z = \sqrt{\frac{1}{n-3}}$, so that $t = \frac{\tilde{z} - \mu_z}{s_z}$." So if $r = .9576$, $n = 20$ and $\rho_0 = .99$, then $\tilde{z} = \frac{1}{2}\ln\frac{1.9576}{0.0424} = 1.91616$, $\mu_z = \frac{1}{2}\ln\frac{1.99}{0.01} = 2.64665$, $s_z = \sqrt{\frac{1}{17}} = 0.242536$ and $t = \frac{1.91616 - 2.64665}{0.242536} = -3.011$. Since $t^{(18)}_{.005} = 2.878$ and this is a 2-sided test, we reject $H_0$.

b) (i) The data is repeated below with the calculations for rank correlation, where $r_1$ ranks the years, $r_2$ ranks the arrivals and $d = r_1 - r_2$.

   year   arrivals   r1   r2    d    d²
     0      11.75     1    1    0     0
     1      78.93     2    2    0     0
     2     203.04     3    3    0     0
     3     268.95     4    4    0     0
     4     380.49     5    5    0     0
     5     457.32     6    6    0     0
     6     525.51     7    7    0     0
     7     596.56     8    8    0     0
     8     640.74     9    9    0     0
     9     710.67    10   10    0     0
    10     748.02    11   11    0     0
    11     795.13    12   12    0     0
    12     845.21    13   14   -1     1
    13     843.08    14   13    1     1
    14     922.58    15   15    0     0
    15     945.22    16   18   -2     4
    16     934.72    17   17    0     0
    17     945.67    18   19   -1     1
    18     952.38    19   20   -1     1
    19     933.86    20   16    4    16
                                     24

$r_s = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6(24)}{20(20^2-1)} = 0.9820$. If we want a 2-sided test at the 99% confidence level of $H_0: \rho_s = 0$, compare $r_s$ with the 0.5% value from the rank correlation coefficient table. Since the table value is .4451, reject the null hypothesis. We conclude that the rank correlation is significant.

(ii) The second regression shows that there is a slight curvature in the relation between the two variables. Since ordinary correlation tests for a linear relationship, it is not quite appropriate, but rank correlation will detect a slightly curved but generally positive relationship.

c) A Durbin-Watson test is a test for autocorrelation. For $\alpha = .01$, $k = 1$ and $n = 20$, the text table gives $d_L = .95$ and $d_U = 1.15$. The null hypothesis is 'No Autocorrelation' and our rejection region is $d < d_L = .95$ or $d > 4 - d_L = 3.05$. We really should use the .005 value for $d_L$, but a check of the .05 table leaves us sure that it is somewhat below .95; thus the D-W statistic of 0.19 is probably in the rejection region. For $\alpha = .01$, $k = 2$ and $n = 20$, the text table gives $d_L = .86$ and $d_U = 1.27$. The 'do not reject' region is between $d_U = 1.27$ and $4 - d_U = 2.73$. 2.48 is in this region, but this is really for $2\alpha = .02$. We can't be sure what happens if we actually use $\alpha = .01$.
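Both the rank correlation and the Fisher's z test above can be checked in a few lines. A sketch, assuming scipy is available; the variable names are mine:

```python
import math
from scipy import stats

# Sketch: problem 4's correlation tests.
year = list(range(20))
arrivals = [11.75, 78.93, 203.04, 268.95, 380.49, 457.32, 525.51, 596.56,
            640.74, 710.67, 748.02, 795.13, 845.21, 843.08, 922.58, 945.22,
            934.72, 945.67, 952.38, 933.86]

rs, p = stats.spearmanr(year, arrivals)
print(rs, p)            # rs is about 0.9820; tiny p-value, so significant

# a(iv): Fisher's z test of H0: rho = .99, done by hand.
r, rho0, n = 0.9576, 0.99, 20
z_r  = 0.5 * math.log((1 + r) / (1 - r))         # 1.9162
mu_z = 0.5 * math.log((1 + rho0) / (1 - rho0))   # 2.6466
t = (z_r - mu_z) * math.sqrt(n - 3)              # about -3.01
print(t)
```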
5. An analysis of a sample of 200 prisoners of their adjustment to civil life after release from prison reveals the following ($\alpha = .01$):

                       Adjustment to Civil Life after release
   Residence        Outstanding   Good   Fair   Poor   Total
   Hometown              27        34     34     25     120
   Not Hometown          15        16     24     25      80
   Total                 42        50     58     50     200

Do statistical tests of the following:
a. The proportion in each adjustment category was the same for both 'hometown' and 'not hometown' groups. (8)
b. The proportion in the combined 'outstanding' and 'good' categories was higher in the 'hometown' group than in the 'not hometown' group. (5)
c. The combined proportion of the whole group of 200 that made an 'outstanding' or 'good' adjustment was 50%. (4)

Solution: Note!! A test of multiple proportions is a $\chi^2$ test! Every year I see people trying to compare more than two proportions by a method appropriate for b) below. It doesn't work! $\Delta p$ is defined as a difference between two proportions; when you have more than two, that definition doesn't work. Also, simply computing the proportions and telling me that they are different is just a way of making me suspect that you don't know what a statistical test is.

a) The observed data $O$ is copied below. The $p_r$s are found by dividing the row sums in $O$ by the grand total. The $p_r$s are then used to multiply the column totals to get the expected counts $E$.

   O        O    G    F    P   Total  p_r      E       O     G     F     P    Total  p_r
   H       27   34   34   25    120   .60      H      25.2  30.0  34.8  30.0   120   .60
   NH      15   16   24   25     80   .40      NH     16.8  20.0  23.2  20.0    80   .40
   Total   42   50   58   50    200            Total  42.0  50.0  58.0  50.0   200

This is a chi-squared test of homogeneity. Our null hypothesis is 'Homogeneity'. The calculations are done in two ways below; save time by computing only one of the last two columns.

   Row    O     E      O-E    (O-E)²   (O-E)²/E    O²/E
    1    27    25.2    1.8     3.24     0.12857   28.9286
    2    34    30.0    4.0    16.00     0.53333   38.5333
    3    34    34.8   -0.8     0.64     0.01839   33.2184
    4    25    30.0   -5.0    25.00     0.83333   20.8333
    5    15    16.8   -1.8     3.24     0.19286   13.3929
    6    16    20.0   -4.0    16.00     0.80000   12.8000
    7    24    23.2    0.8     0.64     0.02759   24.8276
    8    25    20.0    5.0    25.00     1.25000   31.2500
   Total 200  200.0                     3.78406  203.7841

$\chi^2 = \sum\frac{(O-E)^2}{E} = 3.7841$, or equivalently $\chi^2 = \sum\frac{O^2}{E} - n = 203.7841 - 200 = 3.7841$. With $df = (r-1)(c-1) = (2-1)(4-1) = 3$, $\chi^2_{.01}(3) = 11.3449$, so do not reject the null hypothesis. We conclude that, except for random variations, the proportion in each category is the same for both groups.

b) From Table 3, the confidence interval for the difference between proportions is $\Delta p = p_1 - p_2 \pm z_{\alpha/2}\,s_{\Delta p}$ with $s_{\Delta p} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$ (where $q = 1 - p$); the test ratio is $z = \frac{\Delta p - \Delta p_0}{\sigma_{\Delta p}}$, where, if $\Delta p_0 = 0$, $\sigma_{\Delta p} = \sqrt{p_0 q_0\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ with $p_0 = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$; or use the critical value $\Delta p_{cv} = \Delta p_0 \pm z_{\alpha/2}\,\sigma_{\Delta p}$.

Our hypotheses are $H_0: \Delta p \le 0$ against $H_1: \Delta p > 0$, where $\Delta p = p_1 - p_2$. If we use the test ratio method, we need $p_1 = \frac{27+34}{120} = \frac{61}{120} = .5083$, $p_2 = \frac{15+16}{80} = \frac{31}{80} = .3875$ and $p_0 = \frac{61+31}{200} = \frac{92}{200} = .46$. So $\Delta p = p_1 - p_2 = .5083 - .3875 = .1208$ and $\sigma_{\Delta p} = \sqrt{.46(.54)\left(\frac{1}{120} + \frac{1}{80}\right)} = \sqrt{.005175} = .07193$, so $z = \frac{\Delta p - 0}{\sigma_{\Delta p}} = \frac{.1208}{.07193} = 1.680$. We reject $H_0$ only if $z > z_{.01} = 2.327$; since 1.680 is below this value, do not reject $H_0$.

c) For a single proportion, Table 3 gives the confidence interval $p = \bar{p} \pm z_{\alpha/2}\,s_{\bar{p}}$ with $s_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}}$, the test ratio $z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}}$ with $\sigma_{\bar{p}} = \sqrt{\frac{p_0 q_0}{n}}$, and the critical value $\bar{p}_{cv} = p_0 \pm z_{\alpha/2}\,\sigma_{\bar{p}}$. Our hypotheses are $H_0: p = .50$ against $H_1: p \ne .50$. In the last part of the problem, we found that the proportion of people in the 'outstanding' or 'good' categories was $\bar{p} = .46$. Thus, if we use the test ratio method, $z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}} = \frac{.46 - .50}{\sqrt{\frac{.50(.50)}{200}}} = \frac{-.04}{.03536} = -1.1314$. We reject $H_0$ if $z$ is not between $\pm z_{.005} = \pm 2.576$. It is between these values, so we do not reject $H_0$.
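Part a) can be reproduced with scipy's contingency-table routine, which returns both the chi-squared statistic and the expected-count table. A sketch, assuming scipy is available; the array name is mine:

```python
import numpy as np
from scipy import stats

# Sketch: problem 5a as a chi-squared test on the 2x4 contingency table.
observed = np.array([[27, 34, 34, 25],    # hometown
                     [15, 16, 24, 25]])   # not hometown

chi2, p, df, expected = stats.chi2_contingency(observed)
print(chi2, df, p)   # 3.784 on 3 df, p is about 0.29: do not reject homogeneity
print(expected)      # matches the E table above
```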
6. In an effort to teach safety principles to a group of your employees, 22 employees were randomly assigned to one of four groups. After the sessions they took a test that was scored from 0 to 10, with the following results:

   Lecture   Programmed    Videotape   Discussion
             Instruction
      7          8             7           8
      6          5             9           5
      5          8             6           6
      6          6             8           6
      6          9             5           5
      8                                   10

Do statistical tests of the following ($\alpha = .01$; assume that the underlying distribution is Normal):
a. Is there a difference between the means? (7)
b. Does column 4 have a Normal distribution with a population mean of 7.2 and a population standard deviation of 1.5? (5)
c. At the same time we gave the managers a test on safety and then a day of training; scores were not reported, but of 15 managers, 11 performed better after the day of training. Use a sign test to show if the day of training was successful. ($\alpha = .05$) (4)

Solution: Note!! A test of multiple means is an Analysis of Variance! Every year I see people trying to compare more than two means by a method appropriate for comparing two means. It doesn't work! $\Delta\mu$ is defined as a difference between two means; when you have more than two, that definition doesn't work. Also, simply computing the means and telling me that they are different is just a way of making me suspect that you don't know what a statistical test is.

a) Because we are comparing means under the assumption that the underlying distribution is normal, this is an ANOVA.

             x1       x2       x3       x4
              7        8        7        8
              6        5        9        5
              5        8        6        6
              6        6        8        6
              6        9        5        5
              8                         10
   Sum       38       36       35       40     149 = sum of all x_ij
   n_j        6        5        5        6      22 = n
   Mean  6.3333   7.2000   7.0000   6.6667          (grand mean 149/22 = 6.7727)
   SS       246      270      255      286    1057 = sum of all x_ij²

$SST = \sum x_{ij}^2 - n\bar{x}^2 = 1057 - 22(6.7727)^2 = 1057 - 1009.12824 = 47.8636$
$SSB = \sum n_j\bar{x}_{.j}^2 - n\bar{x}^2 = 6(6.3333)^2 + 5(7.2000)^2 + 5(7.0000)^2 + 6(6.6667)^2 - 22(6.7727)^2 = 1011.53347 - 1009.12824 = 2.4052$

   Source    SS        DF   MS        F      F.01              H0
   Between    2.4052    3   0.8017    0.32   F(3,18) = 5.09 ns  Column means equal
   Within    45.4584   18   2.5255
   Total     47.8636   21

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ against $H_1$: not all means equal. Explanation: Since the Sum of Squares (SS) column must add up, 45.4584 is found by subtracting 2.4052 from 47.8636. Since $n = 22$, the total degrees of freedom are $n - 1 = 21$. Since there are 4 random samples or columns, the degrees of freedom for Between is 4 - 1 = 3. Since the Degrees of Freedom (DF) column must add up, 18 = 21 - 3. The Mean Square (MS) column is found by dividing the SS column by the DF column. 0.8017 is MSB and 2.5255 is MSW. $F = \frac{MSB}{MSW}$ is compared with $F^{.01}$ from the F table with $df_1 = 3$ and $df_2 = 18$. Because our computed F is less than the table F, do not reject $H_0$.

b) Because the mean and variance are known and the sample is small, the only test that is practical is the Kolmogorov-Smirnov test. $H_0: x_4 \sim N(7.2, 1.5)$. The column $F_e$ is the cumulative distribution computed from the Normal table using $z = \frac{x_4 - \mu}{\sigma} = \frac{x_4 - 7.2}{1.5}$. $F_o$ is the cumulative $O$ divided by $n = 6$, and $D = |F_o - F_e|$.

   x4    O   Cumulative O   Fo        z       Fe      D
    5    1        1         .1667   -1.47   .0708   .0959
    5    1        2         .3333   -1.47   .0708   .2625
    6    1        3         .5000   -0.80   .2119   .2881
    6    1        4         .6667   -0.80   .2119   .4548
    8    1        5         .8333    0.53   .7019   .1314
   10    1        6        1.0000    1.87   .9693   .0307
         6

From the Kolmogorov-Smirnov table, the critical value for a 95% confidence level is .4050. Since the largest number in D is above this value, we reject $H_0$.

c) $H_0: p \le .5$ against $H_1: p > .5$, where $p$ is the proportion of managers who improve. We get the p-value for this result by using the binomial table with $p = .5$ and $n = 15$: p-value $= P(x \ge 11) = 1 - P(x \le 10) = 1 - .94077 = .05923$. Since this is greater than $\alpha = .05$, we do not reject $H_0$, and we thus conclude that the training was not shown to be successful.
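Part a) can be cross-checked with scipy's one-way ANOVA routine. A sketch, assuming scipy is available; the group names follow my reading of the table headers:

```python
from scipy import stats

# Sketch: problem 6a as a one-way ANOVA.
lecture     = [7, 6, 5, 6, 6, 8]
programmed  = [8, 5, 8, 6, 9]
videotape   = [7, 9, 6, 8, 5]
discussion  = [8, 5, 6, 6, 5, 10]

F, p = stats.f_oneway(lecture, programmed, videotape, discussion)
print(F, p)   # F is about 0.32 on (3, 18) df with a large p: do not reject equal means
```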
7. Three groups of executives are given a test on management principles. We will assume that the underlying distribution is not Normal. (M&L p. 627)

   Manufacturing Executives   Finance Executives   Trade Executives
    Score    Rank              Score    Rank        Score    Rank
     51        9                15        2          89       19
     31        7                32        8          20        3.5
     14        1                68       13          60       11
     69       14                87       18          72       15
     86       17                20        3.5        56       10
     62       12                28        6          22        5
     96       20                77       16
                                97       21
              80                         87.5                 63.5

Using rank tests, test the following:
a. The distributions of scores are the same for all three groups. (7)
b. Taken as a single group, nonmanufacturing executives do worse on the test than manufacturing executives. (7)
c. The median score for Finance executives is 60. (Do not use a sign test if you used it in the last problem.) (4 points for a sign test, 5 for a better method)
d. 45 days after you get back from Cancun, your doctor orders a runs test. If + indicates days when you had the runs and - indicates days when you did not, there were 27 + days and 18 - days, and a total of 18 runs of either plusses or minuses. Was the sequence random? (5)

Solution: a) Since this involves comparing three apparently random samples from a non-normal distribution, we use a Kruskal-Wallis test. The null hypothesis is $H_0$: the columns come from the same distribution, or the medians are equal. Sums of ranks were given above. To check the ranking, note that the sum of the three rank sums is 80 + 87.5 + 63.5 = 231, that the total number of items is 7 + 8 + 6 = 21, and that the sum of the first $n$ numbers is $\frac{n(n+1)}{2} = \frac{21(22)}{2} = 231$. Now compute the Kruskal-Wallis statistic
$H = \frac{12}{n(n+1)}\sum_i\frac{SR_i^2}{n_i} - 3(n+1) = \frac{12}{21(22)}\left[\frac{80^2}{7} + \frac{87.5^2}{8} + \frac{63.5^2}{6}\right] - 3(22) = \frac{12}{462}\left[914.2857 + 957.0312 + 672.0417\right] - 66 = 0.025974(2543.3586) - 66 = 0.0613$.
If we try to look up this result in the (7, 8, 6) section of the Kruskal-Wallis table (Table 9), we find that the problem is too large for the table. Thus we must use the chi-squared table with 2 degrees of freedom. Since $0.0613 < \chi^2_{.05}(2) = 5.9915$, do not reject $H_0$.

b) Because we are comparing two random samples from a non-normal distribution, we use the Wilcoxon-Mann-Whitney method. If we designate manufacturing as sample 1 and nonmanufacturing as sample 2, our hypotheses are $H_0: \eta_1 \le \eta_2$ and $H_1: \eta_1 > \eta_2$. The sum of ranks for manufacturing is 80. The sum of ranks for nonmanufacturing is 87.5 + 63.5 = 151. As in part a), their sum is 231, and this checks out as equal to $\frac{n(n+1)}{2}$.

We designate the smaller of the two rank sums, 80, as $W$. We are unable to find critical values or p-values for a 5% test with $n_1 = 7$ and $n_2 = 14$ on either of the Wilcoxon-Mann-Whitney tables, since $n_2 = 14$ is too high. The outline says that for values of $n_1$ and $n_2$ that are too large for the tables, $W$ has the normal distribution with mean $\mu_W = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{7(7 + 14 + 1)}{2} = 77$ and variance $\sigma_W^2 = \frac{n_2\,\mu_W}{6} = \frac{14(77)}{6} = 179.6667$, so $\sigma_W = \sqrt{179.6667} = 13.4040$. Note that our value of $W$ is above the mean. This is because the average rank of sample 1 is higher than the average rank of sample 2, as it would have to be if nonmanufacturing executives do worse on the test. This means that we are doing a right-sided test. $z = \frac{W - \mu_W}{\sigma_W} = \frac{80 - 77}{13.4040} = 0.2238$. Since this is below $z_{.05} = 1.645$, we do not reject $H_0$.

c) The Wilcoxon signed rank test for paired data was used in class as a powerful test of the median. Our hypotheses are $H_0: \eta = 60$ and $H_1: \eta \ne 60$. The difference column is $x - 60$.

     x    difference   rank
    15       -45        8 -
    32       -28        4 -
    68         8        1 +
    87        27        3 +
    20       -40        7 -
    28       -32        5 -
    77        17        2 +
    97        37        6 +

If we total negative and positive ranks separately, we get $T^- = 24$ and $T^+ = 12$. According to the Wilcoxon signed rank test table, the 2.5% value for $n = 8$ is 4. Since the smaller of the two rank sums, 12, is above this critical value, do not reject the null hypothesis.
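All three rank tests above have scipy counterparts. A sketch, assuming scipy is available; note that scipy reports slightly different statistics (U rather than W, and a tie-corrected H), but the conclusions match:

```python
from scipy import stats

# Sketch: problem 7's rank tests; the list names are mine.
manufacturing = [51, 31, 14, 69, 86, 62, 96]
finance       = [15, 32, 68, 87, 20, 28, 77, 97]
trade         = [89, 20, 60, 72, 56, 22]

# a) Kruskal-Wallis: H is about 0.061 (scipy adds a small tie correction), p near 0.97.
print(stats.kruskal(manufacturing, finance, trade))

# b) Wilcoxon-Mann-Whitney, one-sided (manufacturing scores higher): p well above .05.
print(stats.mannwhitneyu(manufacturing, finance + trade, alternative='greater'))

# c) Wilcoxon signed rank test of the Finance median against 60:
diffs = [x - 60 for x in finance]
print(stats.wilcoxon(diffs))   # statistic 12.0 is the smaller rank sum; p well above .05
```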
d) This is, of course, a runs test. $n = 45$ is the total number of items, $n_1 = 27$, $n_2 = 18$ and $r = 18$. To test the null hypothesis of randomness for a small sample, assume that the significance level is 5% and use the table entitled 'Critical Values of r in the Runs Test.' Unfortunately, $n_1 = 27$ is too high for the table. According to the outline, for a larger problem (if $n_1$ and $n_2$ are too large for the table), $r$ follows the normal distribution with $\mu = \frac{2 n_1 n_2}{n} + 1 = \frac{2(27)(18)}{45} + 1 = 21.6 + 1 = 22.6$ and $\sigma^2 = \frac{2 n_1 n_2\left(2 n_1 n_2 - n\right)}{n^2(n-1)} = \frac{972(972 - 45)}{45^2(44)} = 10.1127$. So $z = \frac{r - \mu}{\sigma} = \frac{18 - 22.6}{\sqrt{10.1127}} = -1.45$. Since this value of z is between $\pm z_{.025} = \pm 1.960$, we do not reject $H_0$: Randomness.
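The large-sample runs test is easy to script. A minimal sketch in plain Python (the names are mine):

```python
import math

# Sketch: the normal-approximation runs test of problem 7d.
n1, n2, r = 27, 18, 18
n = n1 + n2

mu  = 2 * n1 * n2 / n + 1                                   # 22.6
var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))  # 10.1127
z = (r - mu) / math.sqrt(var)                               # about -1.45
print(mu, var, z)   # |z| < 1.960, so do not reject randomness
```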