4/19/03 252y0332 ECO252 QBA2 Name

4/19/03 252y0332 (Page layout view!)

ECO252 QBA2
THIRD HOUR EXAM
April 21 - 22, 2003

Name: KEY
Hour of Class Registered (Circle)

I. (30+ points) Do all the following (2 points each unless noted otherwise).

1. Which of the following components in an ANOVA table are not additive?
   a) Sum of squares.
   b) Degrees of freedom.
   c) *Mean squares.
   d) It is not possible to tell.

TABLE 11-4
A campus researcher wanted to investigate the factors that affect visitor travel time in a complex, multilevel building on campus. Specifically, he wanted to determine whether different building signs (building maps versus wall signage) affect the total amount of time visitors require to reach their destination and whether that time depends on whether the starting location is inside or outside the building. Three subjects were assigned to each of the combinations of signs and starting locations, and travel time in seconds from beginning to destination was recorded. An Excel output of the appropriate analysis is given below:

ANOVA
Source of Variation          SS   df          MS          F    P-value     F crit
Signs                  14008.33    1    14008.33               0.11267   5.317645
Starting Location      12288       1    12288       2.784395   0.13374   5.317645
Interaction               48       1       48       0.0109     0.919506  5.317645
Within                 35305.33    8     4413.167
Total                  61649.67   11

2. Referring to Table 11-4, at 1% level of significance,
   a) there is insufficient evidence to conclude that the difference between the average traveling time for the different starting locations depends on the types of signs.
   b) there is insufficient evidence to conclude that the difference between the average traveling time for the different types of signs depends on the starting locations.
   c) there is insufficient evidence to conclude that the relationship between traveling time and the types of signs depends on the starting locations.
   d) *All of the above.
Explanation: All p-values are above .01, so we do not reject the 'no difference' null hypothesis.
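The arithmetic behind Table 11-4 can be cross-checked with a short script. This is only a sketch: the sums of squares, degrees of freedom and p-values are copied from the Excel output, while the variable and function names are invented for illustration. It recomputes each F ratio as MS(effect)/MS(within), applies the 1% decision rule to the printed p-values, and shows that sums of squares and degrees of freedom add up to their totals while mean squares do not.

```python
# Cross-check of the Table 11-4 two-way ANOVA (values copied from the printout).
ALPHA = 0.01  # significance level used in question 2

# effect name: (sum of squares, degrees of freedom, p-value from the Excel output)
effects = {
    "Signs": (14008.33, 1, 0.11267),
    "Starting Location": (12288.0, 1, 0.13374),
    "Interaction": (48.0, 1, 0.919506),
}
ss_within, df_within = 35305.33, 8
ss_total, df_total = 61649.67, 11


def mean_square(ss, df):
    """Mean square = sum of squares divided by its degrees of freedom."""
    return ss / df


ms_within = mean_square(ss_within, df_within)

# F ratio for each effect: MS(effect) / MS(within)
f_ratios = {
    name: mean_square(ss, df) / ms_within for name, (ss, df, _) in effects.items()
}

# Decision at the 1% level: reject the null only if the p-value is below alpha
rejected = {name: p < ALPHA for name, (_, _, p) in effects.items()}

# Additivity check: SS and df sum to their totals, mean squares do not
ss_sum = sum(ss for ss, _, _ in effects.values()) + ss_within
df_sum = sum(df for _, df, _ in effects.values()) + df_within
ms_sum = sum(mean_square(ss, df) for ss, df, _ in effects.values()) + ms_within
ms_total = mean_square(ss_total, df_total)

for name in effects:
    print(f"{name:18s} F = {f_ratios[name]:.6f}  reject H0: {rejected[name]}")
print(f"SS: {ss_sum:.2f} vs total {ss_total:.2f}")
print(f"df: {df_sum} vs total {df_total}")
print(f"MS: {ms_sum:.2f} vs total SS / total df = {ms_total:.2f}")
```

None of the three effects is rejected at the 1% level, which is why answer d) covers all three statements about interaction.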
4/19/03 252x0332

TABLE 11-2
An airline wants to select a computer software package for its reservation system. Four software packages (1, 2, 3, and 4) are commercially available. The airline will choose the package that bumps as few passengers, on the average, as possible during a month. An experiment is set up in which each package is used to make reservations for 5 randomly selected weeks. (A total of 20 weeks was included in the experiment.) The number of passengers bumped each week is obtained, which gives rise to the following Excel output:

ANOVA
Source of Variation      SS   df      MS          F    P-value     F crit
Between Groups        212.4    3             8.304985  0.001474   3.238867
Within Groups         136.4            8.525
Total                 348.8

3. Referring to Table 11-2, the between group mean squares is
   a) *70.8
   b) 212.4
   c) 637.2
   d) 8.525
Explanation: 70.8 is 212.4 divided by 3. (8.525 is the within groups mean square.)

4. Referring to Table 11-2, the within groups degrees of freedom is
   a) 3
   b) 4
   c) *16
   d) 19
Explanation: $\frac{136.4}{16} = 8.525$, so the within groups sum of squares must have been divided by 16 to get the within groups mean square.

5. Referring to Table 11-2, at a significance level of 1%,
   a) there is insufficient evidence to conclude that the average numbers of customers bumped by the 4 packages are not all the same.
   b) there is insufficient evidence to conclude that the average numbers of customers bumped by the 4 packages are all the same.
   c) *there is sufficient evidence to conclude that the average numbers of customers bumped by the 4 packages are not all the same.
   d) there is sufficient evidence to conclude that the average numbers of customers bumped by the 4 packages are all the same.
Explanation: The null hypothesis of equal means is rejected; note the low p-value.

6. The Journal of Business Venturing reported on the activities of entrepreneurs during the organization creation process.
As part of a designed study, a total of 71 entrepreneurs were interviewed and divided into 3 groups: those that were successful in founding a new firm (n1 = 34), those still actively trying to establish a firm (n2 = 21), and those who tried to start a new firm but eventually gave up (n3 = 16). The total number of activities undertaken (e.g., developed a business plan, sought funding, looked for facilities) by each group over a specified time period during organization creation was measured. The objective is to compare the mean or median number of activities of the 3 groups of entrepreneurs. The underlying distribution is not known to be Normal, nor is it likely that the columns have similar variances. Identify the method that would be used to analyze the data.
   a) Friedman Test for differences in medians.
   b) *Kruskal-Wallis Rank Test for Differences in Medians
   c) One-way ANOVA F test
   d) Two-way ANOVA
Explanation: ANOVA is not appropriate because it requires a Normal parent population and samples that come from populations with equal variances. The Friedman test requires that the data be cross-classified, but there is no evidence that they are, especially since the columns are of unequal length.

7. The slope ($b_1$) represents
   a) predicted value of Y when X = 0.0
   b) *the estimated average change in Y per unit change in X.
   c) the predicted value of Y.
   d) variation around the line of regression.

8. The least squares method minimizes which of the following?
   a) SSR (Gets larger when SSE gets smaller.)
   b) *SSE
   c) SST (This is basically the variance of our y data; we can't really control it.)
   d) All of the above

TABLE 13-2
A large mail order house weighs its mail upon arrival and would like to be able to estimate the number of orders it contains. It has data on 25 shipments giving the weight of the mail (in pounds) in column 1 and the number of orders (in thousands) in column 2.
The data are not shown here but you may want to know that the largest weight on the list was 652 pounds and the largest number of orders was 20.2 (thousand). Since they didn't know what they were doing, they did 2 regressions, but only one is correct.

----- 4/16/2003 1:44:21 AM -----
Welcome to Minitab, press F1 for help.
MTB > Retrieve "C:\Documents and Settings\RBOVE\My Documents\Drive D\MINITAB\2x03318.MTW".
Retrieving worksheet from file: C:\Documents and Settings\RBOVE\My Documents\Drive D\MINITAB\2x0331-8.MTW
# Worksheet was saved on Wed Apr 16 2003
Results for: 2x0331-8.MTW

MTB > regress c1 1 c2

Regression Analysis: Weight versus Orders

The regression equation is
Weight = 5.6 + 32.8 Orders

Predictor      Coef   SE Coef       T       P
Constant       5.55     15.78    0.35   0.728
Orders       32.760     1.137   28.82   0.000

S = 24.10    R-Sq = 97.3%    R-Sq(adj) = 97.2%

Analysis of Variance
Source            DF       SS       MS        F       P
Regression         1   482693   482693   830.82   0.000
Residual Error    23    13363      581
Total             24   496056

Unusual Observations
Obs   Orders   Weight      Fit   SE Fit   Residual   St Resid
  4      7.5   203.00   251.25     8.09     -48.25     -2.13R
  9      9.2   365.00   306.94     6.64      58.06      2.51R
R denotes an observation with a large standardized residual

MTB > regress c2 1 c1

Regression Analysis: Orders versus Weight

The regression equation is
Orders = 0.191 + 0.0297 Weight

Predictor       Coef    SE Coef       T       P
Constant      0.1912     0.4747    0.40   0.691
Weight      0.029703   0.001030   28.82   0.000

S = 0.7258    R-Sq = 97.3%    R-Sq(adj) = 97.2%

Analysis of Variance
Source            DF       SS       MS        F       P
Regression         1   437.64   437.64   830.82   0.000
Residual Error    23    12.12     0.53
Total             24   449.76

Unusual Observations
Obs   Weight   Orders      Fit   SE Fit   Residual   St Resid
  9      365    9.200   11.033    0.164     -1.833     -2.59R
R denotes an observation with a large standardized residual

9. Referring to Table 13-2, what is the number of orders you would expect when the mail weighs 500 pounds?
Solution: Orders = 0.191 + 0.0297 Weight = .191 + .0297(500) = 15.041 thousand orders.

10.
Referring to Table 13-2, what percentage of the total variation in orders sold is explained by weight?
Solution: R-Sq = 97.3%

11. Referring to Table 13-2, interpret the p value for testing whether $\beta_1$ exceeds 0.
   a) There is insufficient evidence (at $\alpha = 0.10$) to conclude that weight (X) is a useful linear predictor of orders received (Y).
   b) Weight (X) is a poor predictor of orders received (Y).
   c) For every 1 pound increase in weight, we expect the number of orders sold to increase by 0.
   d) *There is sufficient evidence (at $\alpha = 0.05$) to conclude that weight (X) is a useful linear predictor of orders received (Y).

12. Referring to Table 13-2, give a 99% confidence interval for $\beta_1$ and interpret the interval.
Solution: $DF = n - 2 = 25 - 2 = 23$. This can also be read from the ANOVA as the error degrees of freedom. The formula for a confidence interval for the slope is $\beta_1 = b_1 \pm t_{\alpha/2}^{(n-k-1)} s_{b_1}$. From the printout $b_1 = 0.0297$ and $s_{b_1} = 0.00103$. The t table says $t_{.005}^{(23)} = 2.807$, so $\beta_1 = 0.0297 \pm 2.807(0.00103) = 0.0297 \pm .0029$, so we can say that we are 99% confident that the slope lies between 0.0268 and 0.0326. (If $\alpha = .10$, use $t_{.05}^{(23)} = 1.714$.)

13. A hospital does a test of goodness of fit to see if arrivals per hour follow a Poisson distribution with a mean of 2. The data are below. The f column has been copied from the Poisson table. The O and the E columns both add to 480.
Row     O         E        Fo        Fe          D         O/n          f
  1    65    64.961   0.13542   0.13534   0.0000817   0.135417   0.135335
  2   130   129.922   0.40625   0.40601   0.0002440   0.270833   0.270671
  3   125   129.922   0.66667   0.67668   0.0100103   0.260417   0.270671
  4    96    86.615   0.86667   0.85712   0.0095427   0.200000   0.180447
  5    37    43.308   0.94375   0.94735   0.0035980   0.077083   0.090224
  6    11    17.323   0.96667   0.98344   0.0167703   0.022917   0.036089
  7     0     5.774   0.96667   0.99547   0.0288003   0.000000   0.012030
  8     0     1.650   0.96667   0.99890   0.0322373   0.000000   0.003437
  9     0     0.412   0.96667   0.99976   0.0330963   0.000000   0.000859
 10    16     0.116   1.00000   1.00000   0.0000040   0.033333   0.000241

a) What method is the hospital using to check goodness of fit? (1)
b) What is the critical value it uses if $\alpha = .10$? (2)
c) Does it accept the null hypothesis $H_0$: Poisson(2)? Why? (1)

Solution:
a) This is the Kolmogorov-Smirnov (K-S) method for checking the null hypothesis that a given distribution (in this case Poisson(2)) fits data.
b) According to the K-S table, the 10% value for $n = 480$ is $CV = \frac{1.22}{\sqrt{480}} = \sqrt{\frac{1.4884}{480}} = .05569$. If $\alpha = .05$, use $\frac{1.36}{\sqrt{480}} = .062$.
c) Since the largest difference is 0.0330963, and it is less than CV, we must accept the null hypothesis that the distribution is Poisson(2).

14. Since the administrator mistrusts the results, the analysis is redone. The data are below.

 x      O         E       E - O    (E - O)^2   (E - O)^2/E     O^2/E
 0     65    64.961    -0.03920       0.0015       0.00002     65.039
 1    130   129.922    -0.07792       0.0061       0.00005    130.078
 2    125   129.922     4.92208      24.2269       0.18647    120.264
 3     96    86.615    -9.38544      88.0865       1.01699    106.402
 4     37    43.308     6.30752      39.7848       0.91866     31.611
 5     11    17.323     6.32272      39.9768       2.30777      6.985
 6+    16     7.952    -8.04784      64.7677       8.14467     32.193

a) What is the value of the test statistic this time? (2)
b) What is the table value against which we test the test statistic? (1)
c) Do we accept or reject the null hypothesis this time? Why? (1)
d) Why are there three fewer rows this time? (1)
e) The first method is supposedly more powerful than the second method.
Do these results illustrate this fact? Why? (1)

Solution:
a) $\chi^2 = \sum \frac{(E-O)^2}{E} = \sum \frac{O^2}{E} - n = 12.5727$ or 12.5746, where $n = 480$.
b) This is a chi-squared test and the degrees of freedom are the number of rows minus 1, which gives us 6, so $\chi^{2(6)}_{.10} = 10.6446$.
c) Since the table value is below our computed chi-squared, we reject the null hypothesis.
d) We have to merge all rows with E below 5.
e) In this large-sample case, it is not more powerful. A more powerful method is more likely to reject the null hypothesis when it is false; the fact that chi-squared rejected when K-S did not leads us to believe that chi-squared is more powerful here.

15. Turn in your computer problems 2 and 3 marked as requested in the Take-home. (5 points, 2 point penalty for not doing.)

4/16/03 252y0332

ECO252 QBA2
Third EXAM
April 21, 22 2003
TAKE HOME SECTION

Name: _________________________
Social Security Number: _________________________

Please Note: Computer problems 2 and 3 should be turned in with the exam. In problem 2, the 2 way ANOVA table should be completed. The three F tests should be done with a 5% significance level and you should note whether there was (i) a significant difference between drivers, (ii) a significant difference between cars and (iii) significant interaction. In problem 3, you should show on your third graph where the regression line is.

II. Do the following: (22+ points) Assume a 5% significance level. Show your work!

1. The Lees, in their book on statistics for Finance majors, ask about the relationship of gasoline prices ($y$) to crude oil prices ($x$) and present the following data for the years 1979 - 1988. (To get you started, the sum of the crude price column is 211.16 and the sum of the numbers squared in the crude price column is 4936.3.)
Obs No   Gas Price     Crude Price
         (cents/gal)   (dollars/barrel)
   1         86           12.64
   2        119           21.59
   3        133           31.77
   4        122           28.52
   5        116           26.19
   6        113           25.88
   7        112           24.09
   8         86           12.51
   9         90           15.40
  10         90           12.57

Just to make things interesting, change the tenth number in the Gas Price column by adding the 3rd digit of your Social Security number to it. For example, Seymour Butz's SS number is 123456789 and he will change 90 to 93. This should not change the results by much. Show your work. It is legitimate to check your results by running the problem on the computer, but I expect to see hand computations for every part of this problem.

a. Compute the regression equation $\hat{Y} = b_0 + b_1 x$ to predict the price of gasoline on the basis of crude oil prices. (3)
b. Compute $R^2$. (2)
c. Compute $s_e$. (2)
d. Compute $s_{b_1}$ and do a significance test on $b_1$. (2)
e. In 1978, the price of crude oil was 63 cents a gallon. (Note posted correction: Price was 9.00 dollars per barrel.) Using this, create a prediction interval for the price of gasoline for that year. Explain why a confidence interval for the price is inappropriate. (3)

Solution: Working with the original data, we get the following table. Important: The solution that follows is for the original data. All other solutions are sketched in 252y033app. Especially note the first version if you added a decimal point to the gas price. Make sure that you were graded fairly. Much of the solution was unchanged if you moved the decimal point.
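The hand computations requested in parts a through e can be cross-checked with a few lines of code. The sketch below is not part of the original key: it uses the original data (no Social Security digit added), the function and variable names are invented for illustration, and the critical value 2.306 is simply read from a t table for 8 degrees of freedom rather than computed.

```python
import math

# Original data for 1979-1988 (no Social Security digit added)
gas = [86, 119, 133, 122, 116, 113, 112, 86, 90, 90]  # y, cents/gal
crude = [12.64, 21.59, 31.77, 28.52, 26.19,
         25.88, 24.09, 12.51, 15.40, 12.57]            # x, dollars/barrel
T_CRIT = 2.306  # assumed: two-sided 5% t value for 8 df, taken from a t table


def simple_regression(x, y):
    """Least squares fit of y on x using the 'spare parts' sums."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum(v * v for v in x) - n * x_bar ** 2
    s_xy = sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar
    ss_y = sum(v * v for v in y) - n * y_bar ** 2   # this is SST
    b1 = s_xy / ss_x
    b0 = y_bar - b1 * x_bar
    ssr = b1 * s_xy
    s_e2 = (ss_y - ssr) / (n - 2)                   # SSE / (n - 2)
    s_b1 = math.sqrt(s_e2 / ss_x)
    return {"n": n, "x_bar": x_bar, "ss_x": ss_x, "b0": b0, "b1": b1,
            "r2": ssr / ss_y, "s_e": math.sqrt(s_e2), "s_b1": s_b1,
            "t": b1 / s_b1}


def prediction_interval(fit, x0, t_crit):
    """Point prediction and margin of error for one new observation at x0."""
    y_hat = fit["b0"] + fit["b1"] * x0
    s_y2 = fit["s_e"] ** 2 * (1 / fit["n"]
                              + (x0 - fit["x_bar"]) ** 2 / fit["ss_x"] + 1)
    return y_hat, t_crit * math.sqrt(s_y2)


fit = simple_regression(crude, gas)
y_hat, margin = prediction_interval(fit, 9.00, T_CRIT)

print(f"b0 = {fit['b0']:.4f}, b1 = {fit['b1']:.4f}, R^2 = {fit['r2']:.4f}")
print(f"s_e = {fit['s_e']:.4f}, s_b1 = {fit['s_b1']:.5f}, t = {fit['t']:.3f}")
print(f"prediction at x = 9.00: {y_hat:.2f} +/- {margin:.2f}")
```

Small differences in the last digit relative to hand work are expected, since the script carries full precision instead of rounding intermediate sums.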
Row    gaspr   crprice
  i      y        x         x^2         xy       y^2
  1      86     12.64     159.77    1087.04     7396
  2     119     21.59     466.13    2569.21    14161
  3     133     31.77    1009.33    4225.41    17689
  4     122     28.52     813.39    3479.44    14884
  5     116     26.19     685.92    3038.04    13456
  6     113     25.88     669.77    2924.44    12769
  7     112     24.09     580.33    2698.08    12544
  8      86     12.51     156.50    1075.86     7396
  9      90     15.40     237.16    1386.00     8100
 10      90     12.57     158.00    1131.30     8100
Sum    1067    211.16    4936.30   23614.82   116495

$n = 10$, $\sum x = 211.16$, $\sum y = 1067$, $\sum x^2 = 4936.30$, $\sum xy = 23614.82$ and $\sum y^2 = 116495$.

Spare Parts Computation:
$\bar{x} = \frac{\sum x}{n} = \frac{211.16}{10} = 21.116$
$\bar{y} = \frac{\sum y}{n} = \frac{1067}{10} = 106.7$
$SSx = \sum x^2 - n\bar{x}^2 = 4936.30 - 10(21.116)^2 = 477.4454$
$Sxy = \sum xy - n\bar{x}\bar{y} = 23614.82 - 10(21.116)(106.7) = 1084.048$
$SSy = \sum y^2 - n\bar{y}^2 = 116495 - 10(106.7)^2 = 2646.10 = SST$

a) $b_1 = \frac{Sxy}{SSx} = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} = \frac{1084.048}{477.4454} = 2.2705$
$b_0 = \bar{y} - b_1\bar{x} = 106.7 - 2.2705(21.116) = 58.7561$
So $\hat{Y} = b_0 + b_1 x$ becomes $\hat{Y} = 58.7561 + 2.2705x$.

b) $SSR = b_1 Sxy = b_1\left(\sum xy - n\bar{x}\bar{y}\right) = 2.2705(1084.048) = 2461.33$
So $R^2 = \frac{SSR}{SST} = \frac{2461.33}{2646.10} = .9302$
or $R^2 = \frac{Sxy^2}{SSx \, SSy} = \frac{\left(\sum xy - n\bar{x}\bar{y}\right)^2}{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)} = \frac{(1084.048)^2}{477.4454(2646.10)} = .9302$

c) $SSE = SST - SSR = 2646.10 - 2461.33 = 184.77$
$s_e^2 = \frac{SSE}{n-2} = \frac{184.77}{8} = 23.09625$
or $s_e^2 = \frac{SSy - b_1 Sxy}{n-2} = \frac{\sum y^2 - n\bar{y}^2 - b_1\left(\sum xy - n\bar{x}\bar{y}\right)}{n-2} = \frac{2646.10 - 2.2705(1084.048)}{8} = 23.09625$
or $s_e^2 = \frac{\left(1 - R^2\right)SST}{n-2} = \frac{\left(1 - R^2\right)\left(\sum y^2 - n\bar{y}^2\right)}{n-2} = \frac{(1 - .9302)(2646.10)}{8} = 23.0872$
or $s_e^2 = \frac{\sum y^2 - n\bar{y}^2 - b_1^2\left(\sum x^2 - n\bar{x}^2\right)}{n-2} = \frac{2646.10 - (2.2705)^2(477.4454)}{8} = 23.0985$
So $s_e = \sqrt{23.09625} = 4.8059$. ($s_e^2$ is always positive!)

d) $s_{b_1}^2 = s_e^2\left[\frac{1}{\sum X^2 - n\bar{X}^2}\right] = \frac{23.09625}{477.4454} = 0.048375$, so $s_{b_1} = \sqrt{0.048375} = 0.21994$.
The outline says that to test $H_0: \beta_1 = \beta_{10}$ against $H_1: \beta_1 \ne \beta_{10}$, use $t = \frac{b_1 - \beta_{10}}{s_{b_1}}$. Remember $\beta_{10}$ is most often zero, and if the null hypothesis is false in that case we say that $\beta_1$ is significant.
So $t = \frac{b_1 - \beta_{10}}{s_{b_1}} = \frac{2.2705}{0.21994} = 10.3233$. Make a diagram of an almost Normal curve with zero in the middle and, if $\alpha = .05$, the rejection zones are above $t_{\alpha/2}^{(n-2)} = t_{.025}^{(8)} = 2.306$ and below $-t_{\alpha/2}^{(n-2)} = -t_{.025}^{(8)} = -2.306$. Since our computed t-ratio is 10.3233, well into the upper reject region, we reject the null hypothesis that the coefficient is zero and say that $b_1$ is significant. Note that the F test shown next shows the same information. In fact, the two Fs in the table are just the squares of our ts ($10.3233^2 = 106.57$ and $2.306^2 = 5.32$).

Source             SS        DF    MS          F        F.05
Regression         2461.33    1    2461.33     106.57   F(1,8) = 5.32  s
Error (Within)      184.77    8      23.09625
Total              2646.10    9

e) Our equation says that $\hat{Y} = 58.7561 + 2.2705x$, so, if $x = 9.00$, $\hat{Y} = 58.7561 + 2.2705(9.00) = 79.19$. The Prediction Interval is $Y_0 = \hat{Y}_0 \pm t \, s_Y$, where
$s_Y^2 = s_e^2\left[\frac{1}{n} + \frac{\left(X_0 - \bar{X}\right)^2}{\sum X^2 - n\bar{X}^2} + 1\right] = 23.09625\left[\frac{1}{10} + \frac{(9 - 21.116)^2}{477.4454} + 1\right] = 32.507$
and $s_Y = \sqrt{32.507} = 5.702$, so that the prediction interval is $Y_0 = \hat{Y}_0 \pm t \, s_Y = 79.19 \pm 2.306(5.702) = 79.19 \pm 13.15$. (The actual price of 86 cents was well within this range.) This is the only appropriate interval because the confidence interval for Y gives an average value for many years in which the oil price was 9.00 dollars, but it was only 9.00 dollars in 1978 as far as we know.

--------------------------------------------------------------
Minitab results follow:

Results for: 2x0331-9.MTW

Regression Analysis: y versus x

The regression equation is
y = 58.8 + 2.27 x

Predictor      Coef   SE Coef       T       P
Constant     58.756     4.887   12.02   0.000
x            2.2705    0.2199   10.32   0.000

S = 4.806    R-Sq = 93.0%    R-Sq(adj) = 92.1%

Analysis of Variance
Source            DF       SS       MS        F       P
Regression         1   2461.3   2461.3   106.57   0.000
Residual Error     8    184.8     23.1
Total              9   2646.1

Unusual Observations
Obs      x        y      Fit   SE Fit   Residual   St Resid
  2   21.6   119.00   107.78     1.52      11.22      2.46R
R denotes an observation with a large standardized residual

2.
According to the Lees, the daily rate of return for a stock in percent is summarized in the following table. Add the second to last digit of your Social Security number to the 50. For example, if Seymour Butz's SS number is 123456789, he will change the 50 to 58 and the total to 203.

x interval   z interval     O    O/n    F0    Fe    fe    E    D
below -3                   20
-3 to -2                   25
-2 to -1                   30
-1 to 0                    50
0 to 1                     40
1 to 2                     25
above 2                     5
                          195

From the data we find that $\bar{x} = 0$ and $s = 1.6$. On the basis of this, test to see if the data follow a Normal distribution by a) a chi-squared test (5) and b) a Lilliefors test. (5)

Hint: To find the probability of being on a given interval, you need values of $z$. You must use the sample mean and standard deviation I gave you in place of $\mu$ and $\sigma$. Once you find the values of $z$ you need, put the probability in the $f_e$ column. (You will have to round the values of $z$ to numbers like 1.25 to use the Normal table. Round cliff-hangers like 1.875 to 1.87.) I showed you in class how to do the $f_e$ column using the $F_e$ column, but in any case, for example, the item in the first row of the $f_e$ column is $P(x \le -3)$ and the second is $P(-3 \le x \le -2)$. If you have $f_e$, you should be able to get $E$ and do a chi-squared test, remembering that we lost degrees of freedom using the data to estimate the mean and variance. You will probably need to fill in the entire table to do the Lilliefors test. Explain why this has to be a Lilliefors test rather than a K-S test.

Solution: a) I started by filling in Table 1 below. As in the two Normal distribution examples covered in class, I computed $t = \frac{x - \bar{x}}{s} = \frac{x - 0}{1.6}$ and called it $z$. For example
$P(-3 \le x \le -2) = P\left(\frac{-3 - 0}{1.6} \le z \le \frac{-2 - 0}{1.6}\right) = P(-1.87 \le z \le -1.25) = P(-1.87 \le z \le 0) - P(-1.25 \le z \le 0) = .4693 - .3944 = .0749$
This can go in the $f_e$ column to compute the expected number of items between -3 and -2. In order to speed up computations, what I actually did was to compute the $F_e$ column first.
For example I got
$F(-2) = P(x \le -2) = P\left(z \le \frac{-2 - 0}{1.6}\right) = P(z \le -1.25) = P(z \le 0) - P(-1.25 \le z \le 0) = .5 - .3944 = .1056$,
which I needed as the second item in the $F_e$ column. Then, as I demonstrated in class, $P(-3 \le x \le -2) = F(-2) - F(-3) = .1056 - .0307 = .0749$. So I now had $f_e = .0749$. This is the probability or proportion of data in -3 to -2. Since there are 195 items, we want $E = .0749(195) = 14.60$ items in the second row of the E column. Using the O and E columns, we can do a chi-squared test as in Table 2 below.

Table 1:
x interval   z interval          O     O/n      F0       Fe       fe        E       D
below -3     below -1.87        20    .1026    .1026    .0307    .0307     5.99   .0719
-3 to -2     -1.87 to -1.25     25    .1282    .2308    .1056    .0749    14.60   .1252
-2 to -1     -1.25 to -0.62     30    .1538    .3846    .2676    .1620    31.59   .1170
-1 to 0      -0.62 to 0         50    .2564    .6410    .5000    .2324    45.32   .1410
0 to 1       0 to 0.62          40    .2051    .8461    .7324    .2324    45.32   .1137
1 to 2       0.62 to 1.25       25    .1282    .9743    .8944    .1620    31.59   .0799
above 2      above 1.25          5    .0256   1.0000   1.0000    .1056    20.59   0
                               195                              1.0000   195.00

Table 2:
Row     O        E      E - O    (E - O)^2   (E - O)^2/E      O^2/E
 1     20     5.99    -14.01      196.280       32.7680     66.7780
 2     25    14.60    -10.40      108.160        7.4082     42.8082
 3     30    31.59      1.59        2.528        0.0800     28.4900
 4     50    45.32     -4.68       21.902        0.4833     55.1633
 5     40    45.32      5.32       28.302        0.6245     35.3045
 6     25    31.59      6.59       43.428        1.3747     19.7847
 7      5    20.59     15.59      243.048       11.8042      1.2142
      195   195.00      0.00                    54.5429     249.543

So, we can use either the last column or the second-to-last column to tell us that $\chi^2 = 249.543 - 195 = 54.543$. Because we have estimated 2 parameters from the data, our degrees of freedom are 7 - 1 - 2 = 4, and $\chi^{2(4)}_{.05} = 9.4877$. Since the value of our computed test statistic at 54.543 is much larger than 9.4877, we must reject the null hypothesis that x is Normal.

b) This is much easier. The Lilliefors test is a single-purpose test to see if a distribution with unknown population mean and variance is Normal.
We cannot use K-S because we need to specify all parameters in advance to use it. Go back to Table 1 and compute the D column by looking at the absolute values of differences between the F0 and Fe columns. You will find that the maximum D is .1410. According to our Lilliefors table, the 5% critical value is $\frac{.886}{\sqrt{n}} = \frac{.886}{\sqrt{195}} = .0634$. Since our maximum D is larger than this, reject the null hypothesis.

3) (Extra credit) The Lees present the following data. Actually I should have said that the numbers represent student salaries, and the researcher wants to know if years of work experience make a difference.

          Years of Work Experience
Region      1     2     3
  1        16    19    24
  2        21    20    21
  3        18    21    22
  4        13    20    25

To vary the results, change the 25 by adding 1/10 of the third digit of your SS number. For example, if Seymour Butz's SS number is 123456789, he will change the 25 to 25.3.
a) Do a 2-way ANOVA on these data and explain what hypotheses you test and what the conclusions are. (6)
b) What other method could we use on these data to see if years of experience makes a difference while allowing for cross-classification? Under what circumstances would we use it? Try it and tell what it tests and what it shows.

Solution: a) 2-way ANOVA (Blocked by region). 's' indicates that the null hypothesis is rejected; 'ns' indicates that it is not.

Region      Exper 1   Exper 2   Exper 3     sum    count   mean      Sum of squares   mean squared
              x1        x2        x3        x_i.    n_i    xbar_i.        SS           xbar_i.^2
1            16.0      19.0      24.0       59.0     3     19.67         1193            386.78
2            21.0      20.0      21.0       62.0     3     20.67         1282            427.11
3            18.0      21.0      22.0       61.0     3     20.33         1249            413.44
4            13.0      20.0      25.0       58.0     3     19.33         1194            373.78
Sum          68.0    + 80.0    + 92.0   =  240.0    12     20.00         4918           1601.11
n_j             4    +    4    +    4   =     12
xbar_.j      17.0      20.0      23.0      (xbar = 20.0)
SS           1190    + 1602    + 2126   =   4918
xbar_.j^2     289    +  400    +  529   =   1218

From the above $\sum x = 240$, $n = 12$, $\sum x_{ij}^2 = 4918$, $\sum \bar{x}_{i.}^2 = 1601.11$, $\sum \bar{x}_{.j}^2 = 1218$ and $\bar{x} = \frac{\sum x}{n} = \frac{240}{12} = 20$.

$SST = \sum x_{ij}^2 - n\bar{x}^2 = 4918 - 12(20)^2 = 4918 - 4800 = 118$.
$SSC = \sum n_j \bar{x}_{.j}^2 - n\bar{x}^2 = 4(1218) - 12(20)^2 = 4872 - 4800 = 72$. This is SSB in a one way ANOVA.
$SSR = \sum n_i \bar{x}_{i.}^2 - n\bar{x}^2 = 3(1601.11) - 12(20)^2 = 4803.33 - 4800 = 3.33$.
($SSW = SST - SSC - SSR = 42.67$)

Source                   SS       DF    MS        F        F.05                  H0
Rows (Regions)            3.33     3     1.11     0.156    F(3,6) = 4.76  ns     Row means equal
Columns (Experience)     72.00     2    36.00     5.062    F(2,6) = 5.14  ns     Column means equal
Within (Error)           42.67     6     7.112
Total                   118.00    11

So neither the region (row) means nor the years-of-experience (column) means are significantly different at the 5% level, although the experience F of 5.062 falls only just short of the table value of 5.14.

b) In general, if the parent distribution is Normal use ANOVA; if it's not Normal, use Friedman or Kruskal-Wallis. If the samples are independent random samples, use 1-way ANOVA or Kruskal-Wallis. If they are cross-classified, use Friedman or 2-way ANOVA. So the other method that allows for cross-classification is Friedman, and we use it if the underlying distribution is not Normal. The null hypothesis is $H_0$: Columns from same distribution or $H_0: \eta_1 = \eta_2 = \eta_3$ (equal medians). We use a Friedman test because the data are cross-classified by region. This time we rank our data only within rows. There are $c = 3$ columns and $r = 4$ rows.

             Original Data                  Ranked Data
Region    Exper 1   Exper 2   Exper 3    Exper 1   Exper 2   Exper 3
            x1        x2        x3         r1        r2        r3
1           16        19        24          1         2         3
2           21        20        21          2.5       1         2.5
3           18        21        22          1         2         3
4           13        20        25          1         2         3
SR_i                                        5.5       7        11.5

To check the ranking, note that the sum of the three rank sums is 5.5 + 7 + 11.5 = 24, and that the sum of the rank sums should be $\sum SR_i = \frac{rc(c+1)}{2} = \frac{4(3)(4)}{2} = 24$.

Now compute the Friedman statistic
$\chi_F^2 = \frac{12}{rc(c+1)}\sum_i SR_i^2 - 3r(c+1) = \frac{12}{4(3)(4)}\left[5.5^2 + 7^2 + 11.5^2\right] - 3(4)(4) = \frac{1}{4}(30.25 + 49 + 132.25) - 48 = 52.875 - 48 = 4.875$.

If we check the Friedman Table for $c = 3$ and $r = 4$, we find that $\chi_F^2 = 4.5$ has a p-value of .125 and $\chi_F^2 = 6.0$ has a p-value of .069. Since our number lies between these, we can conclude that if $\alpha = .05$, the p-value is higher and we cannot reject the null hypothesis.
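The within-row ranking and the Friedman statistic can be checked with a short script. This is a sketch under stated assumptions: it uses the original salary data (no Social Security digit added), the helper names `midranks` and `friedman` are invented for illustration, tied values share the average of the ranks they occupy, and no tie correction is applied to the statistic. Note that region 3 (18, 21, 22) ranks as 1, 2, 3, and the two tied 21s in region 2 each get 2.5.

```python
def midranks(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def friedman(rows):
    """Return the column rank sums and the Friedman chi-squared statistic."""
    r, c = len(rows), len(rows[0])
    ranked = [midranks(row) for row in rows]        # rank within each row
    rank_sums = [sum(row[j] for row in ranked) for j in range(c)]
    statistic = (12 / (r * c * (c + 1))) * sum(s * s for s in rank_sums) \
        - 3 * r * (c + 1)
    return rank_sums, statistic


# Rows are regions 1-4, columns are experience levels 1-3
salaries = [
    [16, 19, 24],
    [21, 20, 21],
    [18, 21, 22],
    [13, 20, 25],
]

rank_sums, chi2_f = friedman(salaries)
print("rank sums:", rank_sums)
print("check total:", sum(rank_sums), "should equal r*c*(c+1)/2 =", 4 * 3 * 4 / 2)
print("Friedman statistic:", chi2_f)
```

The statistic is then compared with the Friedman table for c = 3 and r = 4, exactly as in the hand solution.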
