5/5/00 252z0043

2. A sales manager wishes to predict unit sales by salespersons ($y$) on the basis of number of calls made ($x_1$) and number of different products the salesperson pushes ($x_2$). The data are below. (Use $\alpha = .01$.)

   Row   units y   calls x1   products x2   x1*y
    1      28        14           2           392
    2      71        35           5          2485
    3      38        22           4           836
    4      70        29           5          2030
    5      22         6           2           132
    6      27        15           3           405
    7      28        17           3           476
    8      47        20           5           940
    9      14        12           1           168
   10      70        30           5          2100
                                             9964

Many people were too lazy or ignorant to compute $\sum x_1 y$ from $x_1$ and $y$, or to get $\sum x_1 x_2$ from $x_1$ and $x_2$. There is no way in the universe to get $\sum x_1 y$ from the other quantities given. You will always be asked to compute a sum of this sort on an exam, so figure out how to do it in advance. The quantities below were given:

$\sum y = 415$, $\sum y^2 = 21471$, $\sum x_1 = 200$, $\sum x_1^2 = 4740$, $\sum x_2 = 35$, $\sum x_2^2 = 143$, $\sum x_1 y = ?$, $\sum x_2 y = 1721$, $\sum x_1 x_2 = 806$ and $n = 10$. You do not need all of these.

a. Compute a simple regression of units against calls. (8)
b. Compute $R^2$. (4)
c. Compute $s_e$. (3)
d. Compute $s_{b_1}$ (the standard deviation of the slope) and do a confidence interval for $\beta_1$. (3)
e. Do a prediction interval for units when the salesperson makes 5 calls. (3) Why is this interval likely to be larger than other prediction intervals we might compute for numbers of calls that we have actually observed? (1)

Solution: $\sum x_1 y = 28(14) + 71(35) + \cdots + 70(30) = 9964$. See the computation above.

a) Spare Parts Computation:
$\bar{x}_1 = \frac{\sum x_1}{n} = \frac{200}{10} = 20.000$ and $\bar{y} = \frac{\sum y}{n} = \frac{415}{10} = 41.500$.
$SS_{x_1} = \sum x_1^2 - n\bar{x}_1^2 = 4740 - 10(20.000)^2 = 740.000$
$S_{x_1 y} = \sum x_1 y - n\bar{x}_1\bar{y} = 9964 - 10(20.000)(41.500) = 1664.000$
$SS_y = \sum y^2 - n\bar{y}^2 = 21471 - 10(41.500)^2 = 4248.500 = TSS$
$b_1 = \frac{S_{x_1 y}}{SS_{x_1}} = \frac{1664.000}{740.000} = 2.24865$
$b_0 = \bar{y} - b_1\bar{x}_1 = 41.500 - 2.24865(20.000) = -3.4729$
$\hat{Y} = b_0 + b_1 x_1$ becomes $\hat{Y} = -3.4729 + 2.24865 x_1$. Lots of people found $b_2$ instead. They hadn't read the question!

b) $R^2 = \frac{(S_{x_1 y})^2}{SS_{x_1}\,SS_y} = \frac{(\sum x_1 y - n\bar{x}_1\bar{y})^2}{(\sum x_1^2 - n\bar{x}_1^2)(\sum y^2 - n\bar{y}^2)} = \frac{(1664.000)^2}{(740.000)(4248.500)} = 0.8807$. ($0 \le R^2 \le 1$ always!) Alternatively, $SSR = b_1 S_{x_1 y} = b_1(\sum x_1 y - n\bar{x}_1\bar{y}) = 2.24865(1664.000) = 3741.75$, so $R^2 = \frac{SSR}{SST} = \frac{3741.75}{4248.50} = 0.8807$ could be used in b), or $SSR = R^2 \cdot SST = .8807(4248.5) = 3741.65$ could be used in c).

c) $SSE = SST - SSR = 4248.5 - 3741.75 = 506.75$, so $s_e^2 = \frac{SSE}{n-2} = \frac{506.75}{8} = 63.3436$ ($s_e^2$ is always positive!) and $s_e = \sqrt{63.3436} = 7.95887$.

d) $s_{b_1}^2 = \frac{s_e^2}{SS_{x_1}} = \frac{s_e^2}{\sum x_1^2 - n\bar{x}_1^2} = \frac{63.3436}{740.000} = 0.08560$, so $s_{b_1} = \sqrt{0.08560} = 0.29257$. With $t^{(n-2)}_{.005} = t^{(8)}_{.005} = 3.355$, $\beta_1 = b_1 \pm t s_{b_1} = 2.24865 \pm 3.355(0.29257) = 2.25 \pm 0.98$. Note: Some versions of this exam asked for $\beta_0 = b_0 \pm t s_{b_0}$, where $s_{b_0}^2 = s_e^2\left(\frac{1}{n} + \frac{\bar{x}_1^2}{SS_{x_1}}\right)$, or for the test ratios $t = \frac{b_1 - \beta_1}{s_{b_1}}$ or $t = \frac{b_0 - \beta_0}{s_{b_0}}$. You have to read the question to find out which one is wanted. Many people didn't.

e) If $\hat{Y} = -3.4729 + 2.24865 x_1$ and $x_{1,0} = 5$, then $\hat{Y}_0 = -3.4729 + 2.24865(5) = 7.770$. From the regression formula outline the Prediction Interval is $Y_0 = \hat{Y}_0 \pm t\, s_Y$, where
$s_Y^2 = s_e^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum X^2 - n\bar{X}^2} + 1\right] = s_e^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x}_1)^2}{SS_{x_1}} + 1\right] = 63.3436\left[\frac{1}{10} + \frac{(5-20)^2}{740} + 1\right] = 63.3436(1.4041) = 88.9400$,
so $s_{Y_0} = \sqrt{88.94} = 9.4308$. Thus $Y_0 = \hat{Y}_0 \pm t\, s_Y = 7.770 \pm 3.355(9.4308) = 7.77 \pm 31.64$. This interval will be smallest when $x_1 = \bar{x}_1 = 20$. Because 5 is below any value of $x_1$ that we actually have, the prediction interval will be relatively gigantic, since the $(x_0 - \bar{x})^2$ term grows as $x_0$ moves far from the mean.
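The spare-parts arithmetic above can be checked mechanically. Here is a minimal sketch in plain Python; it is not part of the original solution, and the variable names are mine:

```python
# Sketch: reproduce the spare-parts computation for problem 2 a-c.
y  = [28, 71, 38, 70, 22, 27, 28, 47, 14, 70]
x1 = [14, 35, 22, 29, 6, 15, 17, 20, 12, 30]
n  = len(y)

sum_x1y = sum(a * b for a, b in zip(x1, y))   # 9964, the sum students had to compute
xbar, ybar = sum(x1) / n, sum(y) / n          # 20.000 and 41.500

SSx1 = sum(a * a for a in x1) - n * xbar ** 2  # 740.000
Sx1y = sum_x1y - n * xbar * ybar               # 1664.000
SSy  = sum(b * b for b in y) - n * ybar ** 2   # 4248.500

b1  = Sx1y / SSx1                              # 2.24865
b0  = ybar - b1 * xbar                         # about -3.473
R2  = Sx1y ** 2 / (SSx1 * SSy)                 # 0.8807
se2 = (SSy - b1 * Sx1y) / (n - 2)              # 63.34 = SSE / 8
print(b1, b0, R2, se2)
```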
3. Data from problem 2 is repeated. (Use $\alpha = .01$.)

   Row   units y   calls x1   products x2
    1      28        14           2
    2      71        35           5
    3      38        22           4
    4      70        29           5
    5      22         6           2
    6      27        15           3
    7      28        17           3
    8      47        20           5
    9      14        12           1
   10      70        30           5

$\sum y = 415$, $\sum y^2 = 21471$, $\sum x_1 = 200$, $\sum x_1^2 = 4740$, $\sum x_2 = 35$, $\sum x_2^2 = 143$, $\sum x_1 y = ?$, $\sum x_2 y = 1721$, $\sum x_1 x_2 = 806$ and $n = 10$.

a. Do a multiple regression of units against calls and products. (12)
b. Compute $R^2$ and $R^2$ adjusted for degrees of freedom for both this and the previous problem. Compare the values of $R^2$ adjusted between this and the previous problem. Use an F test to compare $R^2$ here with the $R^2$ from the previous problem. (6)
c. Compute the regression sum of squares and use it in an F test to test the usefulness of this regression. (5)
d. Use your regression to predict the number of units sold when a salesperson makes 20 calls and pushes 5 products. (2)
e. Use the directions in the outline to make this estimate into a confidence interval and a prediction interval. (4)

Solution: a) First, we compute $\bar{Y} = 41.500$, $\bar{X}_1 = \frac{200}{10} = 20.000$ and $\bar{X}_2 = \frac{35}{10} = 3.500$. Second, we compute or copy $\sum X_1 Y = 9964$, $\sum X_2 Y = 1721$, $\sum Y^2 = 21471$, $\sum X_1^2 = 4740$, $\sum X_2^2 = 143$ and $\sum X_1 X_2 = 806$. Third, we compute or copy our spare parts:

$SS_y = \sum Y^2 - n\bar{Y}^2 = 4248.500$*
$S_{x_1 y} = \sum X_1 Y - n\bar{X}_1\bar{Y} = 9964 - 10(20.000)(41.500) = 1664.00$
$S_{x_2 y} = \sum X_2 Y - n\bar{X}_2\bar{Y} = 1721 - 10(3.500)(41.500) = 268.5$
$SS_{x_1} = \sum X_1^2 - n\bar{X}_1^2 = 4740 - 10(20.000)^2 = 740.00$*
$SS_{x_2} = \sum X_2^2 - n\bar{X}_2^2 = 143 - 10(3.5)^2 = 20.500$*
$S_{x_1 x_2} = \sum X_1 X_2 - n\bar{X}_1\bar{X}_2 = 806 - 10(20.000)(3.500) = 106.000$

* indicates quantities that must be positive. (Note that some of these were computed for the last problem.)

Fourth, we substitute these numbers into the Simplified Normal Equations:
$\sum X_1 Y - n\bar{X}_1\bar{Y} = b_1\left(\sum X_1^2 - n\bar{X}_1^2\right) + b_2\left(\sum X_1 X_2 - n\bar{X}_1\bar{X}_2\right)$
$\sum X_2 Y - n\bar{X}_2\bar{Y} = b_1\left(\sum X_1 X_2 - n\bar{X}_1\bar{X}_2\right) + b_2\left(\sum X_2^2 - n\bar{X}_2^2\right)$
which are
$1664.00 = 740.00\,b_1 + 106.00\,b_2$
$268.500 = 106.00\,b_1 + 20.500\,b_2$
and solve them as two equations in two unknowns for $b_1$ and $b_2$. We do this by multiplying the second equation by 6.9811, which is 740.00 divided by 106.00. The purpose of this is to make the coefficients of $b_1$ equal in both equations. We could do just as well by multiplying the second equation by 20.5 divided by 106 and making the coefficients of $b_2$ equal. So the two equations become
$1664.00 = 740.00\,b_1 + 106.00\,b_2$
$1874.43 = 740.00\,b_1 + 143.11\,b_2$.
We then subtract the first equation from the second to get $210.43 = 37.11\,b_2$, so that $b_2 = \frac{210.43}{37.11} = 5.6704$. The first of the two normal equations can now be rearranged to get $740.00\,b_1 = 1664.00 - 106.00(5.6704) = 1062.94$, which gives us $b_1 = 1.4364$. Finally we get $b_0$ by solving $b_0 = \bar{Y} - b_1\bar{X}_1 - b_2\bar{X}_2 = 41.500 - 1.4364(20.000) - 5.6704(3.500) = -7.0744$. Thus our equation is $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 = -7.0744 + 1.4364 X_1 + 5.6704 X_2$.

Note: An alternate way of solving the Simplified Normal Equations is to multiply the second equation by 5.1707, which is 106 divided by 20.5. The resulting equations are
$1664.00 = 740.00\,b_1 + 106.00\,b_2$
$1388.34 = 548.10\,b_1 + 106.00\,b_2$.
We then subtract the second equation from the first to get $275.66 = 191.90\,b_1$, so that $b_1 = \frac{275.66}{191.90} = 1.436$. If we then solve for $b_2$, we get essentially the same answer.
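The elimination above is just the solution of a two-by-two linear system, so it can be verified directly. A minimal sketch, assuming numpy is available (the variable names are mine):

```python
import numpy as np

# Simplified normal equations from problem 3a:
#   SSx1 * b1  + Sx1x2 * b2 = Sx1y
#   Sx1x2 * b1 + SSx2 * b2  = Sx2y
A   = np.array([[740.00, 106.00],
                [106.00,  20.50]])
rhs = np.array([1664.00, 268.50])

b1, b2 = np.linalg.solve(A, rhs)         # 1.4365 and 5.6701 to four places;
                                         # the hand elimination's 1.4364 and
                                         # 5.6704 differ only through rounding
b0 = 41.500 - b1 * 20.000 - b2 * 3.500   # about -7.07
print(b0, b1, b2)
```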
b) The Regression Sum of Squares is
$SSR = b_1\left(\sum X_1 Y - n\bar{X}_1\bar{Y}\right) + b_2\left(\sum X_2 Y - n\bar{X}_2\bar{Y}\right) = b_1 S_{x_1 y} + b_2 S_{x_2 y} = 1.4364(1664.00) + 5.6704(268.500) = 3912.672$*
and is used in the ANOVA below. The coefficient of determination is
$R^2 = \frac{SSR}{SST} = \frac{b_1 S_{x_1 y} + b_2 S_{x_2 y}}{SS_y} = \frac{3912.672}{4248.50} = .9210$*.

$\bar{R}^2$, which is $R^2$ adjusted for degrees of freedom, has the formula $\bar{R}^2 = \frac{(n-1)R^2 - k}{n-k-1}$, where $k$ is the number of independent variables. Our results can be summarized as:

    n    k    R²       adjusted R²
   10    1   .8807       .8658
   10    2   .9210       .8984

$R^2$ adjusted for degrees of freedom seems to show that our second regression is better.

One way to do the F test is to note that the total sum of squares is $SS_y = \sum Y^2 - n\bar{Y}^2 = 4248.500$. For the regression with one independent variable the regression sum of squares is $SSR = b_1 S_{x_1 y} = b_1\left(\sum x_1 y - n\bar{x}_1\bar{y}\right) = 2.24865(1664.000) = 3741.75$*. For the regression with two independent variables the regression sum of squares was computed in b) as 3912.672. The difference between these is 170.922. The remaining unexplained variation is $SSE = SST - SSR = 4248.500 - 3912.672 = 335.828$*. The ANOVA table is

   Source   SS*        DF*   MS*        F*      F.01
   X1       3741.750    1    3741.750
   X2        170.922    1     170.922   3.563   F(1,7) = 12.25
   Error     335.828    7      47.9755
   Total    4248.500    9

Since our computed F is smaller than the table F, we do not reject our null hypothesis that $X_2$ has no effect. A faster way to do this is to use the $R^2$s directly. The difference between $R^2 = 88.07\%$ and $R^2 = 92.10\%$ is 4.03%.

   Source   SS*      DF*   MS*       F*     F.01
   X1       88.07     1    88.07
   X2        4.03     1     4.03     3.57   F(1,7) = 12.25
   Error     7.90     7     1.12857
   Total   100.00     9

The numbers are a bit different because of rounding, but the conclusion is the same.

c) We computed the regression sum of squares in the previous section.

   Source   SS         DF   MS        F       F.01
   X1, X2   3912.672    2   1956.33   40.78   F(2,7) = 9.55
   Error     335.828    7     47.975
   Total    4248.500    9

Since our computed F is larger than the table F, we reject our null hypothesis that $X_1$ and $X_2$ do not explain $Y$.

d) $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 = -7.0744 + 1.4364 X_1 + 5.6704 X_2$. Since the last few digits don't seem to mean a lot, I used $\hat{Y} = -7.07 + 1.44(20) + 5.67(5) = 50.08$.

e) From the ANOVA above, $SSE = 335.828$.
$s_e^2 = \frac{\sum Y^2 - n\bar{Y}^2 - b_1\left(\sum X_1 Y - n\bar{X}_1\bar{Y}\right) - b_2\left(\sum X_2 Y - n\bar{X}_2\bar{Y}\right)}{n-3} = \frac{\left(\sum Y^2 - n\bar{Y}^2\right)\left(1 - R^2\right)}{n-3} = \frac{SSE}{n-k-1} = \frac{335.828}{7} = 47.975$*.
This can be read from the MS column in the ANOVA above. $s_e = \sqrt{47.975} = 6.926$. According to the outline, "an approximate confidence interval is $Y_0 = \hat{Y}_0 \pm t\frac{s_e}{\sqrt{n}}$ and an approximate prediction interval is $Y_0 = \hat{Y}_0 \pm t\,s_e$." Use $t^{(n-k-1)}_{.005} = t^{(7)}_{.005} = 3.499$. So the Confidence Interval is $Y_0 = \hat{Y}_0 \pm t\frac{s_e}{\sqrt{n}} = 50.08 \pm 3.499\frac{6.926}{\sqrt{10}} = 50.08 \pm 7.66$ and the Prediction Interval is $Y_0 = \hat{Y}_0 \pm t\,s_e = 50.08 \pm 3.499(6.926) = 50.08 \pm 24.2$.
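The whole of problem 3 can be cross-checked with a least-squares fit. A sketch, assuming numpy is available; the names are mine, and small differences from the hand values above are rounding:

```python
import numpy as np

# Sketch: verify the problem 3 regression, its ANOVA F, and the partial F test.
y  = np.array([28, 71, 38, 70, 22, 27, 28, 47, 14, 70], dtype=float)
x1 = np.array([14, 35, 22, 29,  6, 15, 17, 20, 12, 30], dtype=float)
x2 = np.array([ 2,  5,  4,  5,  2,  3,  3,  5,  1,  5], dtype=float)
n  = len(y)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # about (-7.074, 1.4365, 5.6701)

SST = np.sum((y - y.mean()) ** 2)           # 4248.5
SSE = np.sum((y - X @ b) ** 2)              # about 335.8
SSR = SST - SSE                             # about 3912.7
F = (SSR / 2) / (SSE / (n - 3))             # about 40.8, versus F.01(2,7) = 9.55

# Part b's "faster way": the partial F test straight from the two R-squared values.
R2_small, R2_full = 0.8807, 0.9210
F_partial = (R2_full - R2_small) / ((1 - R2_full) / (n - 3))   # about 3.57 < 12.25

print(b, F, F_partial)
print(b @ [1.0, 20.0, 5.0])                 # part d: about 50.0 before any rounding
```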
4. Your country's tourist office reports the following tourist arrivals over a 20-year period.

   year   arrivals (thousands)
     0         11.75
     1         78.93
     2        203.04
     3        268.95
     4        380.49
     5        457.32
     6        525.51
     7        596.56
     8        640.74
     9        710.67
    10        748.02
    11        795.13
    12        845.21
    13        843.08
    14        922.58
    15        945.22
    16        934.72
    17        945.67
    18        952.38
    19        933.86

Your assistant fits the following equations to the data:

   arrivals = 162 + 50.0 year
              (39.5) (3.55)
   R-sq = 91.7%    Durbin-Watson statistic = 0.19

   arrivals = -3.34 + 105 year - 2.90 yearsq
              (8.94)  (2.18)    (0.111)
   R-sq = 99.8%    Durbin-Watson statistic = 2.48

Do the following ($\alpha = .01$):
a. Using only the $R^2$s given above:
(i) Show that $R^2$ adjusted for degrees of freedom rises between the first and second regression. (2)
(ii) Fake an F test to show that the addition of the year squared improves the regression. (4)
(iii) Test the correlation between arrivals and year for significance. (3)
(iv) Test the hypothesis that the correlation between arrivals and year is .99. (4)
b. Compute a rank correlation between arrivals and year. (Note: if you can't get $d \ne 0$ you are wasting both our time.) (3) Then (i) test it for significance (2) and (ii) explain why it is higher than the correlation you computed in part a above. (1)
c. Explain what the values of the Durbin-Watson statistics show. (4)

Solution: a) (i) $\bar{R}^2$, which is $R^2$ adjusted for degrees of freedom, has the formula $\bar{R}^2 = \frac{(n-1)R^2 - k}{n-k-1}$, where $k$ is the number of independent variables. For the first regression, $n = 20$ and $k = 1$, so $\bar{R}^2 = \frac{19(0.917) - 1}{18} = 0.912$; for the second, $n = 20$ and $k = 2$, so $\bar{R}^2 = \frac{19(0.998) - 2}{17} = 0.998$.

(ii) The difference between $R^2 = 91.7\%$ and $R^2 = 99.8\%$ is 8.1%.

   Source   SS      DF   MS          F       F.01
   X1       91.7     1   91.7
   X2        8.1     1    8.1        688.5   F(1,17) = 8.40
   Error     0.2    17    0.011765
   Total   100.0    19

Since our computed F is larger than the table F, we reject our null hypothesis that $X_2$ has no effect.

(iii) The simple sample correlation coefficient is $r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left(\sum X^2 - n\bar{X}^2\right)\left(\sum Y^2 - n\bar{Y}^2\right)}}$, the square root of $R^2 = \frac{\left(\sum XY - n\bar{X}\bar{Y}\right)^2}{\left(\sum X^2 - n\bar{X}^2\right)\left(\sum Y^2 - n\bar{Y}^2\right)} = .917$. Since this was given by the printout, we don't need to compute it, so $r = \sqrt{.917} = .9576$. From the outline, if we want to test $H_0: \rho_{xy} = 0$ against $H_1: \rho_{xy} \ne 0$ and $x$ and $y$ are normally distributed, we use $t^{(n-2)} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{.9576\sqrt{20-2}}{\sqrt{1-.9576^2}} = 14.10$. Since $t^{(18)}_{.005} = 2.878$, we reject $H_0$.

(iv) The outline says, if $\rho_0 \ne 0$ and we want to test $H_0: \rho_{xy} = \rho_0$ against $H_1: \rho_{xy} \ne \rho_0$, "we need to use Fisher's z-transformation. Let $\tilde{z} = \frac{1}{2}\ln\frac{1+r}{1-r}$. This has an approximate mean of $\mu_z = \frac{1}{2}\ln\frac{1+\rho_0}{1-\rho_0}$ and a standard deviation of $s_z = \sqrt{\frac{1}{n-3}}$, so that $t = \frac{\tilde{z} - \mu_z}{s_z}$." So if $r = .9576$, $n = 20$ and $\rho_0 = .99$, then $\tilde{z} = \frac{1}{2}\ln\frac{1.9576}{0.0424} = 1.91616$, $\mu_z = \frac{1}{2}\ln\frac{1.99}{0.01} = 2.64665$, $s_z = \sqrt{\frac{1}{17}} = 0.242536$ and $t = \frac{1.91616 - 2.64665}{0.242536} = -3.011$. Since $t^{(18)}_{.005} = 2.878$ and this is a 2-sided test, we reject $H_0$.

b) (i) The data is repeated below with the calculations for rank correlation, where $r_1$ ranks the years, $r_2$ ranks the arrivals and $d = r_1 - r_2$.

   year   arrivals   r1   r2    d    d²
     0      11.75     1    1    0     0
     1      78.93     2    2    0     0
     2     203.04     3    3    0     0
     3     268.95     4    4    0     0
     4     380.49     5    5    0     0
     5     457.32     6    6    0     0
     6     525.51     7    7    0     0
     7     596.56     8    8    0     0
     8     640.74     9    9    0     0
     9     710.67    10   10    0     0
    10     748.02    11   11    0     0
    11     795.13    12   12    0     0
    12     845.21    13   14   -1     1
    13     843.08    14   13    1     1
    14     922.58    15   15    0     0
    15     945.22    16   18   -2     4
    16     934.72    17   17    0     0
    17     945.67    18   19   -1     1
    18     952.38    19   20   -1     1
    19     933.86    20   16    4    16
                                     24

$r_s = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6(24)}{20(20^2-1)} = 0.9820$. If we want a 2-sided test at the 99% confidence level of $H_0: \rho_s = 0$, compare $r_s$ with the 0.5% value from the rank correlation coefficient table. Since the table value is .4451, reject the null hypothesis. We conclude that the rank correlation is significant.

(ii) The second regression shows that there is a slight curvature in the relation between the two variables. Since ordinary correlation tests for a linear relationship, it is not quite appropriate, but rank correlation will detect a slightly curved but generally positive relationship.

c) A Durbin-Watson test is a test for autocorrelation. For $\alpha = .01$, $k = 1$ and $n = 20$, the text table gives $d_L = .95$ and $d_U = 1.15$. The null hypothesis is 'No Autocorrelation' and our rejection region is $d < d_L = .95$ or $d > 4 - d_L = 3.05$. We really should use the .005 value for $d_L$, but a check of the .05 table leaves us sure that it is somewhat below .95; thus the D-W statistic of 0.19 is probably in the rejection region. For $\alpha = .01$, $k = 2$ and $n = 20$, the text table gives $d_L = .86$ and $d_U = 1.27$. The 'do not reject' region is between $d_U = 1.27$ and $4 - d_U = 2.73$. 2.48 is in this region, but this is really for $2\alpha = .02$. We can't be sure what happens if we actually use $\alpha = .01$.
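Both the rank correlation and the Fisher's z test above can be checked in a few lines. A sketch, assuming scipy is available; the variable names are mine:

```python
import math
from scipy import stats

# Sketch: problem 4's correlation tests.
year = list(range(20))
arrivals = [11.75, 78.93, 203.04, 268.95, 380.49, 457.32, 525.51, 596.56,
            640.74, 710.67, 748.02, 795.13, 845.21, 843.08, 922.58, 945.22,
            934.72, 945.67, 952.38, 933.86]

rs, p = stats.spearmanr(year, arrivals)
print(rs, p)            # rs is about 0.9820; tiny p-value, so significant

# a(iv): Fisher's z test of H0: rho = .99, done by hand.
r, rho0, n = 0.9576, 0.99, 20
z_r  = 0.5 * math.log((1 + r) / (1 - r))         # 1.9162
mu_z = 0.5 * math.log((1 + rho0) / (1 - rho0))   # 2.6466
t = (z_r - mu_z) * math.sqrt(n - 3)              # about -3.01
print(t)
```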
5. An analysis of a sample of 200 prisoners of their adjustment to civil life after release from prison reveals the following ($\alpha = .01$):

                       Adjustment to Civil Life after release
   Residence        Outstanding   Good   Fair   Poor   Total
   Hometown              27        34     34     25     120
   Not Hometown          15        16     24     25      80
   Total                 42        50     58     50     200

Do statistical tests of the following:
a. The proportion in each adjustment category was the same for both 'hometown' and 'not hometown' groups. (8)
b. The proportion in the combined 'outstanding' and 'good' categories was higher in the 'hometown' group than in the 'not hometown' group. (5)
c. The combined proportion of the whole group of 200 that made an 'outstanding' or 'good' adjustment was 50%. (4)

Solution: Note!! A test of multiple proportions is a $\chi^2$ test! Every year I see people trying to compare more than two proportions by a method appropriate for b) below. It doesn't work! $\Delta p$ is defined as a difference between two proportions; when you have more than two, that definition doesn't work. Also, simply computing the proportions and telling me that they are different is just a way of making me suspect that you don't know what a statistical test is.

a) The observed data $O$ is copied below. The $p_r$s are found by dividing the row sums in $O$ by the grand total. The $p_r$s are then used to multiply the column totals to get the expected counts $E$.

   O        O    G    F    P   Total  p_r      E       O     G     F     P    Total  p_r
   H       27   34   34   25    120   .60      H      25.2  30.0  34.8  30.0   120   .60
   NH      15   16   24   25     80   .40      NH     16.8  20.0  23.2  20.0    80   .40
   Total   42   50   58   50    200            Total  42.0  50.0  58.0  50.0   200

This is a chi-squared test of homogeneity. Our null hypothesis is 'Homogeneity'. The calculations are done in two ways below; save time by computing only one of the last two columns.

   Row    O     E      O-E    (O-E)²   (O-E)²/E    O²/E
    1    27    25.2    1.8     3.24     0.12857   28.9286
    2    34    30.0    4.0    16.00     0.53333   38.5333
    3    34    34.8   -0.8     0.64     0.01839   33.2184
    4    25    30.0   -5.0    25.00     0.83333   20.8333
    5    15    16.8   -1.8     3.24     0.19286   13.3929
    6    16    20.0   -4.0    16.00     0.80000   12.8000
    7    24    23.2    0.8     0.64     0.02759   24.8276
    8    25    20.0    5.0    25.00     1.25000   31.2500
   Total 200  200.0                     3.78406  203.7841

$\chi^2 = \sum\frac{(O-E)^2}{E} = 3.7841$, or equivalently $\chi^2 = \sum\frac{O^2}{E} - n = 203.7841 - 200 = 3.7841$. With $df = (r-1)(c-1) = (2-1)(4-1) = 3$, $\chi^2_{.01}(3) = 11.3449$, so do not reject the null hypothesis. We conclude that, except for random variations, the proportion in each category is the same for both groups.

b) From Table 3, the confidence interval for the difference between proportions is $\Delta p = p_1 - p_2 \pm z_{\alpha/2}\,s_{\Delta p}$ with $s_{\Delta p} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$ (where $q = 1 - p$); the test ratio is $z = \frac{\Delta p - \Delta p_0}{\sigma_{\Delta p}}$, where, if $\Delta p_0 = 0$, $\sigma_{\Delta p} = \sqrt{p_0 q_0\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ with $p_0 = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$; or use the critical value $\Delta p_{cv} = \Delta p_0 \pm z_{\alpha/2}\,\sigma_{\Delta p}$.

Our hypotheses are $H_0: \Delta p \le 0$ against $H_1: \Delta p > 0$, where $\Delta p = p_1 - p_2$. If we use the test ratio method, we need $p_1 = \frac{27+34}{120} = \frac{61}{120} = .5083$, $p_2 = \frac{15+16}{80} = \frac{31}{80} = .3875$ and $p_0 = \frac{61+31}{200} = \frac{92}{200} = .46$. So $\Delta p = p_1 - p_2 = .5083 - .3875 = .1208$ and $\sigma_{\Delta p} = \sqrt{.46(.54)\left(\frac{1}{120} + \frac{1}{80}\right)} = \sqrt{.005175} = .07193$, so $z = \frac{\Delta p - 0}{\sigma_{\Delta p}} = \frac{.1208}{.07193} = 1.680$. We reject $H_0$ only if $z > z_{.01} = 2.327$; since 1.680 is below this value, do not reject $H_0$.

c) For a single proportion, Table 3 gives the confidence interval $p = \bar{p} \pm z_{\alpha/2}\,s_{\bar{p}}$ with $s_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}}$, the test ratio $z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}}$ with $\sigma_{\bar{p}} = \sqrt{\frac{p_0 q_0}{n}}$, and the critical value $\bar{p}_{cv} = p_0 \pm z_{\alpha/2}\,\sigma_{\bar{p}}$. Our hypotheses are $H_0: p = .50$ against $H_1: p \ne .50$. In the last part of the problem, we found that the proportion of people in the 'outstanding' or 'good' categories was $\bar{p} = .46$. Thus, if we use the test ratio method, $z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}} = \frac{.46 - .50}{\sqrt{\frac{.50(.50)}{200}}} = \frac{-.04}{.03536} = -1.1314$. We reject $H_0$ if $z$ is not between $\pm z_{.005} = \pm 2.576$. It is between these values, so we do not reject $H_0$.
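Part a) can be reproduced with scipy's contingency-table routine, which returns both the chi-squared statistic and the expected-count table. A sketch, assuming scipy is available; the array name is mine:

```python
import numpy as np
from scipy import stats

# Sketch: problem 5a as a chi-squared test on the 2x4 contingency table.
observed = np.array([[27, 34, 34, 25],    # hometown
                     [15, 16, 24, 25]])   # not hometown

chi2, p, df, expected = stats.chi2_contingency(observed)
print(chi2, df, p)   # 3.784 on 3 df, p is about 0.29: do not reject homogeneity
print(expected)      # matches the E table above
```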
6. In an effort to teach safety principles to a group of your employees, 22 employees were randomly assigned to one of four groups. After the sessions they took a test that was scored from 0 to 10, with the following results:

   Lecture   Programmed    Videotape   Discussion
             Instruction
      7          8             7           8
      6          5             9           5
      5          8             6           6
      6          6             8           6
      6          9             5           5
      8                                   10

Do statistical tests of the following ($\alpha = .01$; assume that the underlying distribution is Normal):
a. Is there a difference between the means? (7)
b. Does column 4 have a Normal distribution with a population mean of 7.2 and a population standard deviation of 1.5? (5)
c. At the same time we gave the managers a test on safety and then a day of training; scores were not reported, but of 15 managers, 11 performed better after the day of training. Use a sign test to show if the day of training was successful. ($\alpha = .05$) (4)

Solution: Note!! A test of multiple means is an Analysis of Variance! Every year I see people trying to compare more than two means by a method appropriate for comparing two means. It doesn't work! $\Delta\mu$ is defined as a difference between two means; when you have more than two, that definition doesn't work. Also, simply computing the means and telling me that they are different is just a way of making me suspect that you don't know what a statistical test is.

a) Because we are comparing means under the assumption that the underlying distribution is normal, this is an ANOVA.

             x1       x2       x3       x4
              7        8        7        8
              6        5        9        5
              5        8        6        6
              6        6        8        6
              6        9        5        5
              8                         10
   Sum       38       36       35       40     149 = sum of all x_ij
   n_j        6        5        5        6      22 = n
   Mean  6.3333   7.2000   7.0000   6.6667          (grand mean 149/22 = 6.7727)
   SS       246      270      255      286    1057 = sum of all x_ij²

$SST = \sum x_{ij}^2 - n\bar{x}^2 = 1057 - 22(6.7727)^2 = 1057 - 1009.12824 = 47.8636$
$SSB = \sum n_j\bar{x}_{.j}^2 - n\bar{x}^2 = 6(6.3333)^2 + 5(7.2000)^2 + 5(7.0000)^2 + 6(6.6667)^2 - 22(6.7727)^2 = 1011.53347 - 1009.12824 = 2.4052$

   Source    SS        DF   MS        F      F.01              H0
   Between    2.4052    3   0.8017    0.32   F(3,18) = 5.09 ns  Column means equal
   Within    45.4584   18   2.5255
   Total     47.8636   21

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ against $H_1$: not all means equal. Explanation: Since the Sum of Squares (SS) column must add up, 45.4584 is found by subtracting 2.4052 from 47.8636. Since $n = 22$, the total degrees of freedom are $n - 1 = 21$. Since there are 4 random samples or columns, the degrees of freedom for Between is 4 - 1 = 3. Since the Degrees of Freedom (DF) column must add up, 18 = 21 - 3. The Mean Square (MS) column is found by dividing the SS column by the DF column. 0.8017 is MSB and 2.5255 is MSW. $F = \frac{MSB}{MSW}$ is compared with $F^{.01}$ from the F table with $df_1 = 3$ and $df_2 = 18$. Because our computed F is less than the table F, do not reject $H_0$.

b) Because the mean and variance are known and the sample is small, the only test that is practical is the Kolmogorov-Smirnov test. $H_0: x_4 \sim N(7.2, 1.5)$. The column $F_e$ is the cumulative distribution computed from the Normal table using $z = \frac{x_4 - \mu}{\sigma} = \frac{x_4 - 7.2}{1.5}$. $F_o$ is the cumulative $O$ divided by $n = 6$, and $D = |F_o - F_e|$.

   x4    O   Cumulative O   Fo        z       Fe      D
    5    1        1         .1667   -1.47   .0708   .0959
    5    1        2         .3333   -1.47   .0708   .2625
    6    1        3         .5000   -0.80   .2119   .2881
    6    1        4         .6667   -0.80   .2119   .4548
    8    1        5         .8333    0.53   .7019   .1314
   10    1        6        1.0000    1.87   .9693   .0307
         6

From the Kolmogorov-Smirnov table, the critical value for a 95% confidence level is .4050. Since the largest number in D is above this value, we reject $H_0$.

c) $H_0: p \le .5$ against $H_1: p > .5$, where $p$ is the proportion of managers who improve. We get the p-value for this result by using the binomial table with $p = .5$ and $n = 15$: p-value $= P(x \ge 11) = 1 - P(x \le 10) = 1 - .94077 = .05923$. Since this is greater than $\alpha = .05$, we do not reject $H_0$, and we thus conclude that the training was not shown to be successful.
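Part a) can be cross-checked with scipy's one-way ANOVA routine. A sketch, assuming scipy is available; the group names follow my reading of the table headers:

```python
from scipy import stats

# Sketch: problem 6a as a one-way ANOVA.
lecture     = [7, 6, 5, 6, 6, 8]
programmed  = [8, 5, 8, 6, 9]
videotape   = [7, 9, 6, 8, 5]
discussion  = [8, 5, 6, 6, 5, 10]

F, p = stats.f_oneway(lecture, programmed, videotape, discussion)
print(F, p)   # F is about 0.32 on (3, 18) df with a large p: do not reject equal means
```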
7. Three groups of executives are given a test on management principles. We will assume that the underlying distribution is not Normal. (M&L p. 627)

   Manufacturing Executives   Finance Executives   Trade Executives
    Score    Rank              Score    Rank        Score    Rank
     51        9                15        2          89       19
     31        7                32        8          20        3.5
     14        1                68       13          60       11
     69       14                87       18          72       15
     86       17                20        3.5        56       10
     62       12                28        6          22        5
     96       20                77       16
                                97       21
              80                         87.5                 63.5

Using rank tests, test the following:
a. The distributions of scores are the same for all three groups. (7)
b. Taken as a single group, nonmanufacturing executives do worse on the test than manufacturing executives. (7)
c. The median score for Finance executives is 60. (Do not use a sign test if you used it in the last problem.) (4 points for a sign test, 5 for a better method)
d. 45 days after you get back from Cancun, your doctor orders a runs test. If + indicates days when you had the runs and - indicates days when you did not, there were 27 + days and 18 - days, and a total of 18 runs of either plusses or minuses. Was the sequence random? (5)

Solution: a) Since this involves comparing three apparently random samples from a non-normal distribution, we use a Kruskal-Wallis test. The null hypothesis is $H_0$: the columns come from the same distribution, or the medians are equal. Sums of ranks were given above. To check the ranking, note that the sum of the three rank sums is 80 + 87.5 + 63.5 = 231, that the total number of items is 7 + 8 + 6 = 21, and that the sum of the first $n$ numbers is $\frac{n(n+1)}{2} = \frac{21(22)}{2} = 231$. Now compute the Kruskal-Wallis statistic
$H = \frac{12}{n(n+1)}\sum_i\frac{SR_i^2}{n_i} - 3(n+1) = \frac{12}{21(22)}\left[\frac{80^2}{7} + \frac{87.5^2}{8} + \frac{63.5^2}{6}\right] - 3(22) = \frac{12}{462}\left[914.2857 + 957.0312 + 672.0417\right] - 66 = 0.025974(2543.3586) - 66 = 0.0613$.
If we try to look up this result in the (7, 8, 6) section of the Kruskal-Wallis table (Table 9), we find that the problem is too large for the table. Thus we must use the chi-squared table with 2 degrees of freedom. Since $0.0613 < \chi^2_{.05}(2) = 5.9915$, do not reject $H_0$.

b) Because we are comparing two random samples from a non-normal distribution, we use the Wilcoxon-Mann-Whitney method. If we designate manufacturing as sample 1 and nonmanufacturing as sample 2, our hypotheses are $H_0: \eta_1 \le \eta_2$ and $H_1: \eta_1 > \eta_2$. The sum of ranks for manufacturing is 80. The sum of ranks for nonmanufacturing is 87.5 + 63.5 = 151. As in part a), their sum is 231, and this checks out as equal to $\frac{n(n+1)}{2}$.

We designate the smaller of the two rank sums, 80, as $W$. We are unable to find critical values or p-values for a 5% test with $n_1 = 7$ and $n_2 = 14$ on either of the Wilcoxon-Mann-Whitney tables, since $n_2 = 14$ is too high. The outline says that for values of $n_1$ and $n_2$ that are too large for the tables, $W$ has the normal distribution with mean $\mu_W = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{7(7 + 14 + 1)}{2} = 77$ and variance $\sigma_W^2 = \frac{n_2\,\mu_W}{6} = \frac{14(77)}{6} = 179.6667$, so $\sigma_W = \sqrt{179.6667} = 13.4040$. Note that our value of $W$ is above the mean. This is because the average rank of sample 1 is higher than the average rank of sample 2, as it would have to be if nonmanufacturing executives do worse on the test. This means that we are doing a right-sided test. $z = \frac{W - \mu_W}{\sigma_W} = \frac{80 - 77}{13.4040} = 0.2238$. Since this is below $z_{.05} = 1.645$, we do not reject $H_0$.

c) The Wilcoxon signed rank test for paired data was used in class as a powerful test of the median. Our hypotheses are $H_0: \eta = 60$ and $H_1: \eta \ne 60$. The difference column is $x - 60$.

     x    difference   rank
    15       -45        8 -
    32       -28        4 -
    68         8        1 +
    87        27        3 +
    20       -40        7 -
    28       -32        5 -
    77        17        2 +
    97        37        6 +

If we total negative and positive ranks separately, we get $T^- = 24$ and $T^+ = 12$. According to the Wilcoxon signed rank test table, the 2.5% value for $n = 8$ is 4. Since the smaller of the two rank sums, 12, is above this critical value, do not reject the null hypothesis.
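All three rank tests above have scipy counterparts. A sketch, assuming scipy is available; note that scipy reports slightly different statistics (U rather than W, and a tie-corrected H), but the conclusions match:

```python
from scipy import stats

# Sketch: problem 7's rank tests; the list names are mine.
manufacturing = [51, 31, 14, 69, 86, 62, 96]
finance       = [15, 32, 68, 87, 20, 28, 77, 97]
trade         = [89, 20, 60, 72, 56, 22]

# a) Kruskal-Wallis: H is about 0.061 (scipy adds a small tie correction), p near 0.97.
print(stats.kruskal(manufacturing, finance, trade))

# b) Wilcoxon-Mann-Whitney, one-sided (manufacturing scores higher): p well above .05.
print(stats.mannwhitneyu(manufacturing, finance + trade, alternative='greater'))

# c) Wilcoxon signed rank test of the Finance median against 60:
diffs = [x - 60 for x in finance]
print(stats.wilcoxon(diffs))   # statistic 12.0 is the smaller rank sum; p well above .05
```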
d) This is, of course, a runs test. $n = 45$ is the total number of items, $n_1 = 27$, $n_2 = 18$ and $r = 18$. To test the null hypothesis of randomness for a small sample, assume that the significance level is 5% and use the table entitled 'Critical Values of r in the Runs Test.' Unfortunately, $n_1 = 27$ is too high for the table. According to the outline, for a larger problem (if $n_1$ and $n_2$ are too large for the table), $r$ follows the normal distribution with $\mu = \frac{2 n_1 n_2}{n} + 1 = \frac{2(27)(18)}{45} + 1 = 21.6 + 1 = 22.6$ and $\sigma^2 = \frac{2 n_1 n_2\left(2 n_1 n_2 - n\right)}{n^2(n-1)} = \frac{972(972 - 45)}{45^2(44)} = 10.1127$. So $z = \frac{r - \mu}{\sigma} = \frac{18 - 22.6}{\sqrt{10.1127}} = -1.45$. Since this value of z is between $\pm z_{.025} = \pm 1.960$, we do not reject $H_0$: Randomness.
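The large-sample runs test is easy to script. A minimal sketch in plain Python (the names are mine):

```python
import math

# Sketch: the normal-approximation runs test of problem 7d.
n1, n2, r = 27, 18, 18
n = n1 + n2

mu  = 2 * n1 * n2 / n + 1                                   # 22.6
var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))  # 10.1127
z = (r - mu) / math.sqrt(var)                               # about -1.45
print(mu, var, z)   # |z| < 1.960, so do not reject randomness
```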