252solngr4-071 4/3/07 (Open this document in 'Page Layout' view!)

Name, Student Number:
Class days and time:
Please include this on what you hand in!

Graded Assignment 4

The data set is part of a problem due to Pelosi and Sandifer. Twenty employees (A-T) are timed in a computer entry task initially (0hr), after 2 hours of work (2hr), after 4 hours (4hr) and after 6 hours (6hr). The times, in seconds, are reported below.
a) At a 5% significance level, do the four mean times differ?
b) Determine which of the times actually differ.
c) On the basis of these data, how would you react to a proposal that employees only be allowed to work for four hours a day at this task?
Only neat and legible papers with written answers in complete sentences will be read!

Employee   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R   S   T
0 Hours   67  64  69  88  72  80  85 116  77  78  68  51  54  75  71  64  86  98 103  91
2 Hours   84  78  74  91  70  73  86  71  76  76  61  62  94  63  70  63  66  71  53  81
4 Hours   52  53  56  66  59  77  64  62  54  65  71  92  71  50  71  58  77  53  81  70
6 Hours   57  53  71  61  73  50  53  80  63  41  63  41  53  63  61  46  68  64  49  70

Do this problem in Excel as follows.
Use columns A, B, C, D, E and F on the Excel spreadsheet for data.
In the first row of columns B, C, D and E put in 0hr, 2hr, 4hr and 6hr.
Head column A with the word ‘employee.’ Starting in cell A2, put in the letters A through T to identify the employees – unless, of course, you want to suggest some names.
Now put in the data in columns B, C, D and E, skipping column A. If you bring this document into Word, the data can be moved into the Excel worksheet by highlighting the cells you want and copying and pasting.
To fill column F, in cell F2 write =B2. After you press 'enter' this cell should read '67.'
Use the 'edit' pull-down menu and 'copy' cell F2.
Use the 'edit' pull-down menu and 'paste' in cells F3 through F21. Now column F will be identical to B except for the heading. This can also be done as a simple copy and paste.
Save your data as time1.xls

Version A – One-way ANOVA
Use the 'tools' pull-down menu and pick 'data analysis.' (If you cannot find this, use Tools and Add-Ins to put in the analysis packs.) Pick 'ANOVA: Single Factor.' Set input range to $B$1:$E$21. Select 'New worksheet ply' and 'columns,' check 'labels in first row,' hit 'OK' and save your results as treslt1.xls.

Version B – Two-way ANOVA
In order to check for the effect of the fact that the data is blocked by employees, repeat the analysis using ‘ANOVA: Two-Factor without replication.’ Set input range to $A$1:$E$21, check ‘labels,’ and save your results as treslt2.xls.
Answer the following: Is there a significant difference between the task completion times according to the number of hours worked? How is this conclusion affected by blocking by employees? Cite p-values and/or F-tests.

Version C – One-way ANOVA
Take the last digit of your student number (if it's zero, use 10). Go back to your original data or use the 'file' pull-down menu to open time1.xls.
To fill column B this time, in cell B2 write =F2+x, replacing x with the last digit of your student number.
Use the 'edit' pull-down menu and 'copy' cell B2.
Use the 'edit' pull-down menu and 'paste' in cells B3 through B21. Now column B will be more than the original B by the amount of your value of x.
Save your data as time3.xls. Run the one-way ANOVA again and save your results as treslt3.xls.

Submit the data and results with your student number. The most effective way to do this is to paste the results into a Word document and then add neat hand or typed notes. Indicate what hypotheses were tested, what the p-value was and whether, using the p-value, you would reject the null if (i) the significance level was 5% and (ii) the significance level was 10%, explaining why. You will have two answers for each of your two problems.
For your Version C do a Scheffé confidence interval and a Tukey-Kramer interval or procedure for each of the \(C^4_2 = 6\) possible differences between means and report which are different at the 5% level according to each of the 2 methods. Now on the basis of these data, how would you react to a proposal that employees only be allowed to work for four hours a day at this task? Why?

Extra Credit:
1) Show that you learned something from computer problem 2 by doing part B on Minitab. There should be very little difference in your result.
The easiest way to do this is to copy the first five columns from the original Excel spreadsheet. Enter Minitab and use ‘editor’ to enable commands. Highlight the column labels and cells 1-20 of the first five columns. Remember that your column labels should be written in above the columns (put row labels in column 1). Just to make sure that you are in the right place, try the following Minitab commands.

   print c1-c5
   AOVO c2-c5

You should get results equivalent to your first ANOVA. To set up for a 2-way ANOVA, stack your data in columns 6 and 7.

   Stack c2 c3 c4 c5 c6;
   Subscripts c7;
   UseNames.

To move the row labels, copy the A through T from column 1. Highlight all 80 cells of column 8 and paste your A-T into the column. Every number should now have a correct row label. Use the material from computer assignment 2 to check your data. I combined the ANOVA and the table of means command by using the following.

   Twoway c6 c8 c7;
   Means c8 c7.

2) Take the data from your last ANOVA. Use the instructions in 1) above to copy it into the Minitab spreadsheet and perform Levene and Bartlett tests on it using the third example in 252mvarex as a pattern for your calculations using Minitab. Make sure that you explain what is being tested and what you conclude.
There are two ways to do this. If you want to do it on the unstacked data, use the following.

   Vartest c2-c5;
   Unstacked.

To do the tests on the stacked data use the following.
   Vartest c6 c7.

Extra Extra Credit:
Do Bartlett and Levene tests using the examples in 252mvar as your pattern. It turns out that your ANOVA has just enough columns to do this test. This is an awful lot of work unless you cheat and use the computer. If you cover your tracks, I’ll never know.
To do the Bartlett test you need logarithms of variances. Label columns 10-12 ‘stdev,’ ‘var’ and ‘log.’ Use the data that you already have in four columns in Minitab c2-c5 (labels in c1) and get the variances as follows:

   name k2 'stdv1'
   name k3 'stdv2'
   name k4 'stdv3'
   name k5 'stdv4'
   stdev c2 k2
   stdev c3 k3
   stdev c4 k4
   stdev c5 k5
   print k2-k5            #These are the standard deviations of the columns.
   stack k2-k5 c10
   let c11 = c10 * c10    #Now you have variances.
   let c12 = logten(c11)
   let k11 = mean(c11)    #This is the pooled variance when you have equal sized samples.
   let k12 = logten(k11)
   print k11-k12
   print c10-c12

Now you are on your own. The rest of this should be pretty easy because all your \(n_j\)s are equal.

The Levene test is longer, but should be much more familiar and perhaps easier to fake. Copy columns 1 through 5 to c21-c25. Then find their medians and subtract them from the columns and convert the columns to absolute values.

   name k22 'med1'
   name k23 'med2'
   name k24 'med3'
   name k25 'med4'
   let k22 = median(c22)
   let k23 = median(c23)
   let k24 = median(c24)
   let k25 = median(c25)
   let c22 = c22 - k22
   let c23 = c23 - k23
   let c24 = c24 - k24
   let c25 = c25 - k25
   describe c22-c25       #All the columns should have zero medians now.
   print c21-c25
   let c22 = absolute(c22)
   let c23 = absolute(c23)
   let c24 = absolute(c24)
   let c25 = absolute(c25)
   print c21-c25
   #You are now ready for an ANOVA using:
   AOVO c22-c25
   #You should get the same p-value as you got for the first Levene test
   #that you did.

Results

Version A – One-way ANOVA
Use the 'tools' pull-down menu and pick 'data analysis.' (If you cannot find this, use Tools and Add-Ins to put in the analysis packs.)
Pick 'ANOVA: Single Factor.' Set input range to $B$1:$E$21. Select 'New worksheet ply' and 'columns,' check 'labels in first row,' hit 'OK' and save your results as treslt1.xls.

Data for 1st and 2nd ANOVA

Employee  0hr  2hr  4hr  6hr  (col. F)
A          67   84   52   57   67
B          64   78   53   53   64
C          69   74   56   71   69
D          88   91   66   61   88
E          72   70   59   73   72
F          80   73   77   50   80
G          85   86   64   53   85
H         116   71   62   80  116
I          77   76   54   63   77
J          78   76   65   41   78
K          68   61   71   63   68
L          51   62   92   41   51
M          54   94   71   53   54
N          75   63   50   63   75
O          71   70   71   61   71
P          64   63   58   46   64
Q          86   66   77   68   86
R          98   71   53   64   98
S         103   53   81   49  103
T          91   81   70   70   91

Results for 1st ANOVA
\(H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4\)

Anova: Single Factor

SUMMARY
Groups  Count   Sum  Average  Variance
0hr        20  1557    77.85    260.45
2hr        20  1463    73.15  110.6605
4hr        20  1302     65.1  125.5684
6hr        20  1180       59  114.4211

ANOVA
Source of Variation        SS  df        MS         F   P-value    F crit
Between Groups        4211.05   3  1403.683  9.187913  2.93E-05  2.724946
Within Groups         11610.9  76   152.775
Total                15821.95  79

Version B – Two-way ANOVA
In order to check for the effect of the fact that the data is blocked by employees, repeat the analysis using ‘ANOVA: Two-Factor without replication.’ Set input range to $A$1:$E$21, check ‘labels,’ and save your results as treslt2.xls.
Answer the following: Is there a significant difference between the task completion times according to the number of hours worked? How is this conclusion affected by blocking by employees? Cite p-values and/or F-tests.
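As a cross-check on the Excel tool, the sums of squares for a two-factor ANOVA without replication can be computed directly from the data. The following is only a minimal sketch in Python (not part of the original assignment): the function name `two_factor_anova` and the dictionary layout are illustrative assumptions, and no p-values are computed because the standard library has no F distribution. The F ratios can instead be compared with the 'F crit' values that Excel prints.

```python
# Timings in seconds from the assignment; list position i is employee A..T.
times = {
    "0hr": [67, 64, 69, 88, 72, 80, 85, 116, 77, 78, 68, 51, 54, 75, 71, 64, 86, 98, 103, 91],
    "2hr": [84, 78, 74, 91, 70, 73, 86, 71, 76, 76, 61, 62, 94, 63, 70, 63, 66, 71, 53, 81],
    "4hr": [52, 53, 56, 66, 59, 77, 64, 62, 54, 65, 71, 92, 71, 50, 71, 58, 77, 53, 81, 70],
    "6hr": [57, 53, 71, 61, 73, 50, 53, 80, 63, 41, 63, 41, 53, 63, 61, 46, 68, 64, 49, 70],
}


def two_factor_anova(columns):
    """Two-factor ANOVA without replication (hypothetical helper).

    Columns are the treatments (hours worked); rows are the blocks (employees).
    """
    cols = list(columns.values())
    m, r = len(cols), len(cols[0])            # m treatments, r blocks
    grand = sum(map(sum, cols)) / (m * r)     # grand mean of all m*r values
    col_means = [sum(c) / r for c in cols]
    row_means = [sum(row) / m for row in zip(*cols)]

    ss_total = sum((x - grand) ** 2 for c in cols for x in c)
    ss_cols = r * sum((cm - grand) ** 2 for cm in col_means)
    ss_rows = m * sum((rm - grand) ** 2 for rm in row_means)
    ss_error = ss_total - ss_cols - ss_rows   # what is left after both factors

    df_rows, df_cols = r - 1, m - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "ss_rows": ss_rows,
        "ss_cols": ss_cols,
        "ss_error": ss_error,
        "df": (df_rows, df_cols, df_error),
        "f_rows": (ss_rows / df_rows) / ms_error,
        "f_cols": (ss_cols / df_cols) / ms_error,
    }


result = two_factor_anova(times)
for key, value in result.items():
    print(key, value)
```

A large `f_cols` with a small `f_rows` would say that hours worked matter while blocking by employee removes little variation; the decision itself still rests on the p-values or critical values from the Excel output.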
Results for 2nd ANOVA
\(H_{01}\): Row (Employee) means equal
\(H_{02}: \mu_1 = \mu_2 = \mu_3 = \mu_4\)

Anova: Two-Factor Without Replication

SUMMARY  Count   Sum  Average  Variance
A            4   260       65  199.3333
B            4   248       62  140.6667
C            4   270     67.5        63
D            4   306     76.5       231
E            4   274     68.5  41.66667
F            4   280       70       186
G            4   288       72  263.3333
H            4   329    82.25    560.25
I            4   270     67.5  121.6667
J            4   260       65  288.6667
K            4   263    65.75  20.91667
L            4   246     61.5       487
M            4   272       68  368.6667
N            4   251    62.75    104.25
O            4   273    68.25  23.58333
P            4   231    57.75     68.25
Q            4   297    74.25     84.25
R            4   286     71.5       367
S            4   286     71.5  643.6667
T            4   312       78       102

0hr         20  1557    77.85    260.45
2hr         20  1463    73.15  110.6605
4hr         20  1302     65.1  125.5684
6hr         20  1180       59  114.4211

ANOVA
Source of Variation        SS  df        MS         F   P-value    F crit
Rows                  2726.45  19  143.4974  0.920637   0.56152  1.771973
Columns               4211.05   3  1403.683  9.005617  5.65E-05  2.766441
Error                 8884.45  57  155.8675
Total                15821.95  79

Answer: In the first ANOVA we get a p-value of .0000293. Since this is below any significance level we are likely to use, we reject the null hypothesis that the mean execution time is the same for all numbers of hours worked. In the second ANOVA, the p-value for columns (.0000565) is almost as low, so we again reject the original null hypothesis. Note that the p-value for rows is 0.56152, which is above any significance level we might care to use. The null hypothesis that row (employee) means are equal cannot be rejected, so we conclude that there is no significant difference between individuals.

Version C – One-way ANOVA
Take the last digit of your student number (if it's zero, use 10). Go back to your original data or use the 'file' pull-down menu to open time1.xls.
To fill column B this time, in cell B2 write =F2+x, replacing x with the last digit of your student number.
Use the 'edit' pull-down menu and 'copy' cell B2.
Use the 'edit' pull-down menu and 'paste' in cells B3 through B21. Now column B will be more than the original B by the amount of your value of x.
Save your data as time3.xls.
Run the one-way ANOVA again and save your results as treslt3.xls.

Data for 3rd ANOVA
I added 3 to first column instead of the second.

Employee  0hr  2hr  4hr  6hr  (col. F)
A          70   84   52   57   67
B          67   78   53   53   64
C          72   74   56   71   69
D          91   91   66   61   88
E          75   70   59   73   72
F          83   73   77   50   80
G          88   86   64   53   85
H         119   71   62   80  116
I          80   76   54   63   77
J          81   76   65   41   78
K          71   61   71   63   68
L          54   62   92   41   51
M          57   94   71   53   54
N          78   63   50   63   75
O          74   70   71   61   71
P          67   63   58   46   64
Q          89   66   77   68   86
R         101   71   53   64   98
S         106   53   81   49  103
T          94   81   70   70   91

Results for 3rd ANOVA
\(H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4\)

Anova: Single Factor

SUMMARY
Groups  Count   Sum  Average  Variance
0hr        20  1617    80.85    260.45
2hr        20  1463    73.15  110.6605
4hr        20  1302     65.1  125.5684
6hr        20  1180       59  114.4211

ANOVA
Source of Variation        SS  df        MS         F   P-value    F crit
Between Groups        5435.05   3  1811.683  11.85851  1.88E-06  2.724946
Within Groups         11610.9  76   152.775
Total                17045.95  79

In this ANOVA we get a p-value of .00000188. Since this is below any significance level we are likely to use, we reject the null hypothesis that the mean execution time is the same for all numbers of hours worked.

Conclusion
Submit the data and results with your student number. The most effective way to do this is to paste the results into a Word document and then add neat hand or typed notes. Indicate what hypotheses were tested, what the p-value was and whether, using the p-value, you would reject the null if (i) the significance level was 5% and (ii) the significance level was 10%, explaining why. You will have two answers for each of your two problems.
For your Version C do a Scheffé confidence interval and a Tukey-Kramer interval or procedure for each of the \(C^4_2 = 6\) possible differences between means and report which are different at the 5% level according to each of the 2 methods. Now on the basis of these data, how would you react to a proposal that employees only be allowed to work for four hours a day at this task? Why?
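The single-factor tables for Version A and Version C can be cross-checked outside Excel. The sketch below is an illustration in Python rather than the method used in this solution; `one_way_anova` is a hypothetical helper name, and x = 3 is the shift used in this write-up (your own x depends on your student number). It shows that adding a constant to one column changes the between-groups sum of squares but leaves the within-groups mean square alone.

```python
def one_way_anova(groups):
    """Single-factor ANOVA sums of squares and F ratio (hypothetical helper)."""
    n = sum(len(g) for g in groups)           # total number of observations
    m = len(groups)                           # number of columns (treatments)
    grand = sum(map(sum, groups)) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (mean - grand) ** 2 for g, mean in zip(groups, means))
    ssw = sum((x - mean) ** 2 for g, mean in zip(groups, means) for x in g)
    msb, msw = ssb / (m - 1), ssw / (n - m)
    return {"ssb": ssb, "ssw": ssw, "msw": msw, "f": msb / msw, "df": (m - 1, n - m)}


hr0 = [67, 64, 69, 88, 72, 80, 85, 116, 77, 78, 68, 51, 54, 75, 71, 64, 86, 98, 103, 91]
hr2 = [84, 78, 74, 91, 70, 73, 86, 71, 76, 76, 61, 62, 94, 63, 70, 63, 66, 71, 53, 81]
hr4 = [52, 53, 56, 66, 59, 77, 64, 62, 54, 65, 71, 92, 71, 50, 71, 58, 77, 53, 81, 70]
hr6 = [57, 53, 71, 61, 73, 50, 53, 80, 63, 41, 63, 41, 53, 63, 61, 46, 68, 64, 49, 70]

x = 3                  # shift applied to the 0hr column in this write-up
F_CRIT = 2.724946      # 5% critical value copied from the Excel printout

version_a = one_way_anova([hr0, hr2, hr4, hr6])
version_c = one_way_anova([[t + x for t in hr0], hr2, hr4, hr6])

for label, res in (("Version A", version_a), ("Version C", version_c)):
    verdict = "reject H0" if res["f"] > F_CRIT else "do not reject H0"
    print(label, res, verdict)
```

Comparing F with the printed critical value gives the same decision as comparing the p-value with 5%; a 10% decision would need the corresponding 10% critical value, which is not in the printout.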
Confidence Intervals from the Outline
For completeness, I have included the individual confidence interval as well as the Tukey and Scheffé.

Individual Confidence Interval
If we desire a single interval, we use the formula for the difference between two means when the variance is known. For example, if we want the difference between means of column 1 and column 2:
\[ \mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm t^{(n-m)}_{\alpha/2}\, s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \quad \text{where } s = \sqrt{MSW}. \]

Scheffé Confidence Interval
If we desire intervals that will simultaneously be valid for a given confidence level for all possible intervals between column means, use
\[ \mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm \sqrt{(m-1)\,F^{(m-1,\,n-m)}_{\alpha}}\; s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}. \]

Tukey Confidence Interval
This also applies to all possible differences.
\[ \mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm q^{(m,\,n-m)}_{\alpha}\, \frac{s}{\sqrt{2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}. \]
This gives rise to Tukey’s HSD (Honestly Significant Difference) procedure. Two sample means \(\bar{x}_{.1}\) and \(\bar{x}_{.2}\) are significantly different if \(|\bar{x}_{.1} - \bar{x}_{.2}|\) is greater than
\[ q^{(m,\,n-m)}_{\alpha}\, \frac{s}{\sqrt{2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}. \]

The Confidence Intervals from the data
From the Excel output, \(\bar{x}_1 = 80.85\), \(\bar{x}_2 = 73.15\), \(\bar{x}_3 = 65.10\), \(\bar{x}_4 = 59.00\), \(m = 4\), \(n - m = 76\), \(n_1 = n_2 = n_3 = n_4 = 20\) and \(MSW = 152.775\). Assume \(\alpha = 0.05\). The contrasts follow.
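As a numerical cross-check on the Scheffé and Tukey formulas, the error terms can be evaluated with a few lines of Python. This is a sketch under stated assumptions, not part of the original solution: the critical values F = 2.73 and q = 3.73 are the ones quoted in the worked contrasts of this document, and the helper name `significant_pairs` is hypothetical.

```python
import math
from itertools import combinations

# Summary figures from the Excel output for the Version C data.
means = {1: 80.85, 2: 73.15, 3: 65.10, 4: 59.00}
m = 4            # number of columns (treatments)
n_j = 20         # observations per column
msw = 152.775    # within-groups mean square, so s = sqrt(msw)

# Critical values as quoted in the worked contrasts (assumed, not recomputed here).
F_CRIT = 2.73    # F(.05) with 3 and 76 degrees of freedom
Q_CRIT = 3.73    # studentized range q(.05) with m = 4 and 76 degrees of freedom

# s * sqrt(1/n1 + 1/n2), the part common to both intervals.
spread = math.sqrt(msw * (1 / n_j + 1 / n_j))

scheffe_half_width = math.sqrt((m - 1) * F_CRIT) * spread
tukey_half_width = Q_CRIT * spread / math.sqrt(2)


def significant_pairs(half_width):
    """Pairs of columns whose sample means differ by more than the error term."""
    return [
        (i, j)
        for i, j in combinations(means, 2)
        if abs(means[i] - means[j]) > half_width
    ]


print("Scheffe half-width:", scheffe_half_width, significant_pairs(scheffe_half_width))
print("Tukey half-width:  ", tukey_half_width, significant_pairs(tukey_half_width))
```

Because all four samples have the same size, one half-width per method serves all six contrasts; with unequal sample sizes the half-width would have to be recomputed for each pair.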
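\(\mu_1 - \mu_2\)
Individual: \(\mu_1 - \mu_2 = 80.85 - 73.15 \pm t^{(76)}_{.025}\sqrt{152.775\left(\frac{1}{20} + \frac{1}{20}\right)} = 7.70 \pm 1.992\sqrt{15.2775} = 7.70 \pm 7.79\) ns
Scheffé: \(\mu_1 - \mu_2 = 80.85 - 73.15 \pm \sqrt{3F^{(3,76)}_{.05}}\sqrt{152.775\left(\frac{1}{20} + \frac{1}{20}\right)} = 7.70 \pm \sqrt{3(2.73)}\sqrt{152.775\left(\frac{1}{20} + \frac{1}{20}\right)} = 7.70 \pm \sqrt{125.123} = 7.70 \pm 11.19\) ns
Tukey: \(\mu_1 - \mu_2 = \bar{x}_1 - \bar{x}_2 \pm q^{(4,76)}_{.05}\sqrt{\frac{152.775}{2}\left(\frac{1}{20} + \frac{1}{20}\right)} = 80.85 - 73.15 \pm 3.73\sqrt{\frac{152.775}{2}\left(\frac{1}{20} + \frac{1}{20}\right)} = 7.70 \pm 3.73\sqrt{7.6387} = 7.70 \pm 10.31\) ns

\(\mu_1 - \mu_3\)
Individual: \(\mu_1 - \mu_3 = 80.85 - 65.10 \pm t^{(76)}_{.025}\sqrt{152.775\left(\frac{1}{20} + \frac{1}{20}\right)} = 15.75 \pm 1.992\sqrt{15.2775} = 15.75 \pm 7.79\) s
Scheffé: \(\mu_1 - \mu_3 = 80.85 - 65.10 \pm \sqrt{3F^{(3,76)}_{.05}}\sqrt{152.775\left(\frac{1}{20} + \frac{1}{20}\right)} = 15.75 \pm \sqrt{3(2.73)}\sqrt{152.775\left(\frac{1}{20} + \frac{1}{20}\right)} = 15.75 \pm \sqrt{125.123} = 15.75 \pm 11.19\) s
Tukey: \(\mu_1 - \mu_3 = \bar{x}_1 - \bar{x}_3 \pm q^{(4,76)}_{.05}\sqrt{\frac{152.775}{2}\left(\frac{1}{20} + \frac{1}{20}\right)} = 80.85 - 65.10 \pm 3.73\sqrt{\frac{152.775}{2}\left(\frac{1}{20} + \frac{1}{20}\right)} = 15.75 \pm 3.73\sqrt{7.6387} = 15.75 \pm 10.31\) s

\(\mu_1 - \mu_4\)
Individual: \(\mu_1 - \mu_4 = 80.85 - 59.00 \pm t^{(76)}_{.025}\sqrt{152.775\left(\frac{1}{20} + \frac{1}{20}\right)} = 21.85 \pm 1.992\sqrt{15.2775} = 21.85 \pm 7.79\) s
Scheffé: \(\mu_1 - \mu_4 = 80.85 - 59.00 \pm \sqrt{3F^{(3,76)}_{.05}}\sqrt{152.775\left(\frac{1}{20} + \frac{1}{20}\right)} = 21.85 \pm \sqrt{3(2.73)}\sqrt{152.775\left(\frac{1}{20} + \frac{1}{20}\right)} = 21.85 \pm \sqrt{125.123} = 21.85 \pm 11.19\) s
Tukey: \(\mu_1 - \mu_4 = \bar{x}_1 - \bar{x}_4 \pm q^{(4,76)}_{.05}\sqrt{\frac{152.775}{2}\left(\frac{1}{20} + \frac{1}{20}\right)} = 80.85 - 59.00 \pm 3.73\sqrt{\frac{152.775}{2}\left(\frac{1}{20} + \frac{1}{20}\right)} = 21.85 \pm 3.73\sqrt{7.6387} = 21.85 \pm 10.31\) s

\(\mu_2 - \mu_3\)
Individual: \(\mu_2 - \mu_3 = 73.15 - 65.10 \pm t^{(76)}_{.025}\sqrt{152.775\left(\frac{1}{20} + \frac{1}{20}\right)} = 8.05 \pm 1.992\sqrt{15.2775} = 8.05 \pm 7.79\) s
Scheffé: \(\mu_2 - \mu_3 = 73.15 - 65.10 \pm \sqrt{3F^{(3,76)}_{.05}}\sqrt{152.775\left(\frac{1}{20} + \frac{1}{20}\right)} = 8.05 \pm \sqrt{3(2.73)}\sqrt{152.775\left(\frac{1}{20} + \frac{1}{20}\right)} = 8.05 \pm \sqrt{125.123} = 8.05 \pm 11.19\) ns
Tukey: \(\mu_2 - \mu_3 = \bar{x}_2 - \bar{x}_3 \pm q^{(4,76)}_{.05}\sqrt{\frac{152.775}{2}\left(\frac{1}{20} + \frac{1}{20}\right)} = 73.15 - 65.10 \pm 3.73\sqrt{\frac{152.775}{2}\left(\frac{1}{20} + \frac{1}{20}\right)} = 8.05 \pm 3.73\sqrt{7.6387} = 8.05 \pm 10.31\) ns

\(\mu_2 - \mu_4\)
Individual: \(\mu_2 - \mu_4 = 73.15 - 59.00 \pm t^{(76)}_{.025}\sqrt{152.775\left(\frac{1}{20} + \frac{1}{20}\right)} = 14.15 \pm 1.992\sqrt{15.2775} = 14.15 \pm 7.79\) s
Scheffé: \(\mu_2 - \mu_4 = 73.15 - 59.00 \pm \sqrt{3F^{(3,76)}_{.05}}\sqrt{152.775\left(\frac{1}{20} + \frac{1}{20}\right)} = 14.15 \pm \sqrt{3(2.73)}\sqrt{152.775\left(\frac{1}{20} + \frac{1}{20}\right)} = 14.15 \pm \sqrt{125.123} = 14.15 \pm 11.19\) s
Tukey: \(\mu_2 - \mu_4 = \bar{x}_2 - \bar{x}_4 \pm q^{(4,76)}_{.05}\sqrt{\frac{152.775}{2}\left(\frac{1}{20} + \frac{1}{20}\right)} = 73.15 - 59.00 \pm 3.73\sqrt{\frac{152.775}{2}\left(\frac{1}{20} + \frac{1}{20}\right)} = 14.15 \pm 3.73\sqrt{7.6387} = 14.15 \pm 10.31\) s

\(\mu_3 - \mu_4\)
Individual: \(\mu_3 - \mu_4 = 65.10 - 59.00 \pm t^{(76)}_{.025}\sqrt{152.775\left(\frac{1}{20} + \frac{1}{20}\right)} = 6.10 \pm 1.992\sqrt{15.2775} = 6.10 \pm 7.79\) ns
Scheffé: \(\mu_3 - \mu_4 = 65.10 - 59.00 \pm \sqrt{3F^{(3,76)}_{.05}}\sqrt{152.775\left(\frac{1}{20} + \frac{1}{20}\right)} = 6.10 \pm \sqrt{3(2.73)}\sqrt{152.775\left(\frac{1}{20} + \frac{1}{20}\right)} = 6.10 \pm \sqrt{125.123} = 6.10 \pm 11.19\) ns
Tukey: \(\mu_3 - \mu_4 = \bar{x}_3 - \bar{x}_4 \pm q^{(4,76)}_{.05}\sqrt{\frac{152.775}{2}\left(\frac{1}{20} + \frac{1}{20}\right)} = 65.10 - 59.00 \pm 3.73\sqrt{\frac{152.775}{2}\left(\frac{1}{20} + \frac{1}{20}\right)} = 6.10 \pm 3.73\sqrt{7.6387} = 6.10 \pm 10.31\) ns

Conclusion: I have included individual confidence intervals here for completeness. The analysis of variance definitely tells us that the means are not the same, regardless of the significance level we might want to use, because the p-value is microscopic. If we compare the differences in sample means using either of the two methods requested, we find that there is no difference between the means for subsequent periods, that is between 1 and 2, 2 and 3, etc., but there are differences between nonadjacent periods. The contrasts (intervals) are labeled ns for not significant and s for significant depending on whether the error part of the interval is larger or smaller than the difference between sample means.
These conclusions are at the 95% confidence level, but the more conservative Scheffé procedure used \(\sqrt{F^{(3,76)}_{.05}} = \sqrt{2.73} = 1.65\) as part of the error term (2.73 came from the computer printout reference value; using the table we might have come up with something like \(F^{(3,60)}_{.05}\), which is slightly larger). If we were to repeat our tests at the 1% level, we could use something like \(\sqrt{F^{(3,60)}_{.01}} = \sqrt{4.13} = 2.03\), which would make our error terms 23% larger. If we were to do that, the differences between nonadjacent periods would still remain significant.
Note that the mean entry time is falling as hours pass. The strong gains over longer periods might make it unwise to limit daily hours of employees.

Extra Credit:
1) Show that you learned something from computer problem 2 by doing part B on Minitab. There should be very little difference in your result.
Comments are in red.

----- 4/3/2007 5:28:57 PM -----
Welcome to Minitab, press F1 for help.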
Results for: 2gr4-071ANOVA.MTW

MTB > print c1 - c5

Data Display

Row  Employee  0hr  2hr  4hr  6hr
  1  A          67   84   52   57
  2  B          64   78   53   53
  3  C          69   74   56   71
  4  D          88   91   66   61
  5  E          72   70   59   73
  6  F          80   73   77   50
  7  G          85   86   64   53
  8  H         116   71   62   80
  9  I          77   76   54   63
 10  J          78   76   65   41
 11  K          68   61   71   63
 12  L          51   62   92   41
 13  M          54   94   71   53
 14  N          75   63   50   63
 15  O          71   70   71   61
 16  P          64   63   58   46
 17  Q          86   66   77   68
 18  R          98   71   53   64
 19  S         103   53   81   49
 20  T          91   81   70   70

MTB > AOVO c2-c5

One-way ANOVA: 0hr, 2hr, 4hr, 6hr

# The low p-value means that the null hypothesis of equal column means is rejected.

Source  DF     SS    MS     F      P
Factor   3   4211  1404  9.19  0.000
Error   76  11611   153
Total   79  15822

S = 12.36   R-Sq = 26.62%   R-Sq(adj) = 23.72%

Individual 95% CIs For Mean Based on Pooled StDev
Level   N   Mean  StDev
0hr    20  77.85  16.14
2hr    20  73.15  10.52
4hr    20  65.10  11.21
6hr    20  59.00  10.70
[Character plot of the four intervals; axis marked 56.0, 64.0, 72.0, 80.0]

Pooled StDev = 12.36

MTB > stack c2 c3 c4 c5 c6;
SUBC> subscripts c7;
SUBC> UseNames.
MTB > Print c6 c7 c8

# This is just to show you what the stacked data looks like.

Data Display

Row  Time  Hour  Person
  1    67  0hr   A
  2    64  0hr   B
  3    69  0hr   C
  4    88  0hr   D
  5    72  0hr   E
  6    80  0hr   F
  7    85  0hr   G
  8   116  0hr   H
  9    77  0hr   I
 10    78  0hr   J
 11    68  0hr   K
 12    51  0hr   L
 13    54  0hr   M
 14    75  0hr   N
 15    71  0hr   O
 16    64  0hr   P
 17    86  0hr   Q
 18    98  0hr   R
 19   103  0hr   S
 20    91  0hr   T
 21    84  2hr   A
 22    78  2hr   B
 23    74  2hr   C
 24    91  2hr   D
 25    70  2hr   E
 26    73  2hr   F
 27    86  2hr   G
 28    71  2hr   H
 29    76  2hr   I
 30    76  2hr   J
 31    61  2hr   K
 32    62  2hr   L
 33    94  2hr   M
 34    63  2hr   N
 35    70  2hr   O
 36    63  2hr   P
 37    66  2hr   Q
 38    71  2hr   R
 39    53  2hr   S
 40    81  2hr   T
 41    52  4hr   A
 42    53  4hr   B
 43    56  4hr   C
 44    66  4hr   D
 45    59  4hr   E
 46    77  4hr   F
 47    64  4hr   G
 48    62  4hr   H
 49    54  4hr   I
 50    65  4hr   J
 51    71  4hr   K
 52    92  4hr   L
 53    71  4hr   M
 54    50  4hr   N
 55    71  4hr   O
 56    58  4hr   P
 57    77  4hr   Q
 58    53  4hr   R
 59    81  4hr   S
 60    70  4hr   T
 61    57  6hr   A
 62    53  6hr   B
 63    71  6hr   C
 64    61  6hr   D
 65    73  6hr   E
 66    50  6hr   F
 67    53  6hr   G
 68    80  6hr   H
 69    63  6hr   I
 70    41  6hr   J
 71    63  6hr   K
 72    41  6hr   L
 73    53  6hr   M
 74    63  6hr   N
 75    61  6hr   O
 76    46  6hr   P
 77    68  6hr   Q
 78    64  6hr   R
 79    49  6hr   S
 80    70  6hr   T

MTB > table c8 c7.

# This is an instruction from your 2-way ANOVA. It tells you how much data is in each cell.

Tabulated statistics: Person, Hour

Rows: Person   Columns: Hour

     0hr  2hr  4hr  6hr  All
A      1    1    1    1    4
B      1    1    1    1    4
C      1    1    1    1    4
D      1    1    1    1    4
E      1    1    1    1    4
F      1    1    1    1    4
G      1    1    1    1    4
H      1    1    1    1    4
I      1    1    1    1    4
J      1    1    1    1    4
K      1    1    1    1    4
L      1    1    1    1    4
M      1    1    1    1    4
N      1    1    1    1    4
O      1    1    1    1    4
P      1    1    1    1    4
Q      1    1    1    1    4
R      1    1    1    1    4
S      1    1    1    1    4
T      1    1    1    1    4
All   20   20   20   20   80

Cell Contents: Count

MTB > table c8 c7;
SUBC> data c6.

# This is just a printout of data by cell. Because it was done by cell there were big blanks between each line. I edited them out.

Tabulated statistics: Person, Hour

Rows: Person   Columns: Hour

     0hr  2hr  4hr  6hr
A     67   84   52   57
B     64   78   53   53
C     69   74   56   71
D     88   91   66   61
E     72   70   59   73
F     80   73   77   50
G     85   86   64   53
H    116   71   62   80
I     77   76   54   63
J     78   76   65   41
K     68   61   71   63
L     51   62   92   41
M     54   94   71   53
N     75   63   50   63
O     71   70   71   61
P     64   63   58   46
Q     86   66   77   68
R     98   71   53   64
S    103   53   81   49
T     91   81   70   70

Cell Contents: Time : DATA

MTB > twoway c6 c7 c8;
SUBC> means c8 c7.

Two-way ANOVA: Time versus Hour, Person

# So here is our 2-way ANOVA. The first hypothesis test says that the hypothesis that hour means are equal is rejected. The high p-value for the second test, which is above any significance level we might use, tells us that there is no difference between employee means.

Source  DF       SS       MS     F      P
Hour     3   4211.1  1403.68  9.01  0.000
Person  19   2726.5   143.50  0.92  0.562
Error   57   8884.5   155.87
Total   79  15822.0

S = 12.48   R-Sq = 43.85%   R-Sq(adj) = 22.17%

Individual 95% CIs For Mean Based on Pooled StDev
Hour   Mean
0hr   77.85
2hr   73.15
4hr   65.10
6hr   59.00
[Character plot of the four intervals; axis marked 56.0, 64.0, 72.0, 80.0]

Individual 95% CIs For Mean Based on Pooled StDev
Person   Mean
A       65.00
B       62.00
C       67.50
D       76.50
E       68.50
F       70.00
G       72.00
H       82.25
I       67.50
J       65.00
K       65.75
L       61.50
M       68.00
N       62.75
O       68.25
P       57.75
Q       74.25
R       71.50
S       71.50
T       78.00
[Character plot of the twenty intervals; axis marked 45, 60, 75, 90]

Extra Credit:
2) Take the data from your last ANOVA. Use the instructions in 1) above to copy it into the Minitab spreadsheet and perform Levene and Bartlett tests on it using the third example in 252mvarex as a pattern for your calculations using Minitab. Make sure that you explain what is being tested and what you conclude.

MTB > print c1-c5

# This is just to remind you of the data.

Data Display

Row  Employee  0hr  2hr  4hr  6hr
  1  A          67   84   52   57
  2  B          64   78   53   53
  3  C          69   74   56   71
  4  D          88   91   66   61
  5  E          72   70   59   73
  6  F          80   73   77   50
  7  G          85   86   64   53
  8  H         116   71   62   80
  9  I          77   76   54   63
 10  J          78   76   65   41
 11  K          68   61   71   63
 12  L          51   62   92   41
 13  M          54   94   71   53
 14  N          75   63   50   63
 15  O          71   70   71   61
 16  P          64   63   58   46
 17  Q          86   66   77   68
 18  R          98   71   53   64
 19  S         103   53   81   49
 20  T          91   81   70   70

MTB > vartest c2-c5;
SUBC> unstacked.

# This test was needlessly done twice. This is the unstacked version.

Test for Equal Variances: 0hr, 2hr, 4hr, 6hr

95% Bonferroni confidence intervals for standard deviations

      N    Lower    StDev    Upper
0hr  20  11.4383  16.1385  26.4296
2hr  20   7.4558  10.5195  17.2276
4hr  20   7.9422  11.2057  18.3514
6hr  20   7.5814  10.6968  17.5179

Bartlett's Test (normal distribution)
Test statistic = 5.10, p-value = 0.165

Levene's Test (any continuous distribution)
Test statistic = 1.29, p-value = 0.283

# Both p-values are above any significance level that we might use. This means that we cannot reject the null hypothesis of equal variances.

[Graph: Test for Equal Variances: 0hr, 2hr, 4hr, 6hr]
# Just a graphic of the info above.

MTB > vartest c6 c7

# Look for the stacked data several pages back. This is exactly the same as the last test, but done on stacked data.

Test for Equal Variances: Time versus Hour

95% Bonferroni confidence intervals for standard deviations

Hour   N    Lower    StDev    Upper
0hr   20  11.4383  16.1385  26.4296
2hr   20   7.4558  10.5195  17.2276
4hr   20   7.9422  11.2057  18.3514
6hr   20   7.5814  10.6968  17.5179

Bartlett's Test (normal distribution)
Test statistic = 5.10, p-value = 0.165

Levene's Test (any continuous distribution)
Test statistic = 1.29, p-value = 0.283

[Graph: Test for Equal Variances: Time versus Hour]

MTB > vartest c2-c5;
SUBC> unstacked.

Test for Equal Variances: 0hr, 2hr, 4hr, 6hr

95% Bonferroni confidence intervals for standard deviations

      N    Lower    StDev    Upper
0hr  20  11.4383  16.1385  26.4296
2hr  20   7.4558  10.5195  17.2276
4hr  20   7.9422  11.2057  18.3514
6hr  20   7.5814  10.6968  17.5179

Bartlett's Test (normal distribution)
Test statistic = 5.10, p-value = 0.165

Levene's Test (any continuous distribution)
Test statistic = 1.29, p-value = 0.283

[Graph: Test for Equal Variances: 0hr, 2hr, 4hr, 6hr]
# This is exactly the same as the last graph.

Extra Extra Credit
Do Bartlett and Levene tests using the examples in 252mvar as your pattern. It turns out that your ANOVA has just enough columns to do this test.
This is an awful lot of work unless you cheat and use the computer. If you cover your tracks, I’ll never know.
To do the Bartlett test you need logarithms of variances. Label columns 10-12 ‘stdev,’ ‘var’ and ‘log.’ Use the data that you already have in four columns in Minitab c2-c5 (labels in c1) and get the variances as follows:

MTB > name k2 'stdev1'
MTB > name k3 'stdev2'
MTB > name k4 'stdev3'
MTB > name k5 'stdev4'
MTB > stdev c2 k2

Standard Deviation of 0hr
Standard deviation of 0hr = 16.1385

# We are computing standard deviations of the columns and storing them as the Minitab constants k2, k3, k4 and k5. We actually want variances.

MTB > stdev c3 k3

Standard Deviation of 2hr
Standard deviation of 2hr = 10.5195

MTB > stdev c4 k4

Standard Deviation of 4hr
Standard deviation of 4hr = 11.2057

MTB > stdev c5 k5

Standard Deviation of 6hr
Standard deviation of 6hr = 10.6968

MTB > print k2-k5

Data Display
stdev1  16.1385
stdev2  10.5195
stdev3  11.2057
stdev4  10.6968

MTB > stack k2-k5 c10
MTB > let c11 = c10*c10
MTB > let c12 = logten(c11)
MTB > let k11 = mean(c11)
MTB > let k12 = logten(k11)
MTB > name k11 'meansdsq'
MTB > name k12 'logmean'
MTB > print k11 - k12

# We put the standard deviations in C10 and squared them to get variances. The mean of the variances (meansdsq) is the pooled variance when you have equal sized samples.

Data Display
meansdsq  152.775
logmean   2.18405

MTB > print c10 - c12

# Note that I named my columns.

Data Display

Row    stdev     sdsq  logsdsq
  1  16.1385  260.450  2.41572
  2  10.5195  110.661  2.04399
  3  11.2057  125.568  2.09888
  4  10.6968  114.421  2.05851

Now you are on your own. I’ll finish this if anyone actually does the Bartlett test.

Extra Extra Credit
Do Bartlett and Levene tests using the examples in 252mvar as your pattern.
The Levene test is longer, but should be much more familiar and perhaps easier to fake.
Copy columns 1 through 5 to c21-c25. Then find their medians and subtract them from the columns and convert the columns to absolute values.
MTB > name k22 'med1'
MTB > name k23 'med2'
MTB > name k24 'med3'
MTB > name k25 'med4'
MTB > let c21 = c1
MTB > let c22 = c2
MTB > let c23 = c3
MTB > let c24 = c4
MTB > let c25 = c5
# I copied my original data to c21-c25.
MTB > let k22 = median(c22)
MTB > let k23 = median(c23)
MTB > let k24 = median(c24)
MTB > let k25 = median(c25)
MTB > let c22 = c22 - k22
MTB > let c23 = c23 - k23
MTB > let c24 = c24 - k24
MTB > let c25 = c25 - k25
# I subtracted the median for each column.
MTB > describe c22 - c25
# I checked to see if the medians were zero.

Descriptive Statistics: 1-med, 2-med, 3-med, 4-med

Variable   N  N*   Mean  SE Mean  StDev  Minimum      Q1  Median     Q3
1-med     20   0   1.85     3.61  16.14   -25.00   -8.75    0.00  11.50
2-med     20   0   1.15     2.35  10.52   -19.00   -8.25    0.00   8.25
3-med     20   0   0.60     2.51  11.21   -14.50  -10.00    0.00   6.50
4-med     20   0  -2.00     2.39  10.70   -20.00  -10.25    0.00   6.00

Variable  Maximum
1-med       40.00
2-med       22.00
3-med       27.50
4-med       19.00

MTB > print c22 - c25

# These are the original data with column medians subtracted.

Data Display

Row  1-med  2-med  3-med  4-med
  1     -9     12  -12.5     -4
  2    -12      6  -11.5     -8
  3     -7      2   -8.5     10
  4     12     19    1.5      0
  5     -4     -2   -5.5     12
  6      4      1   12.5    -11
  7      9     14   -0.5     -8
  8     40     -1   -2.5     19
  9      1      4  -10.5      2
 10      2      4    0.5    -20
 11     -8    -11    6.5      2
 12    -25    -10   27.5    -20
 13    -22     22    6.5     -8
 14     -1     -9  -14.5      2
 15     -5     -2    6.5      0
 16    -12     -9   -6.5    -15
 17     10     -6   12.5      7
 18     22     -1  -11.5      3
 19     27    -19   16.5    -12
 20     15      9    5.5      9

MTB > let c22 = abs(c22)
MTB > let c23 = abs(c23)
MTB > let c24 = abs(c24)
MTB > let c25 = abs(c25)
MTB > print c22 - c25

# This is the absolute value of the columns we just printed.

Data Display

Row  1-med  2-med  3-med  4-med
  1      9     12   12.5      4
  2     12      6   11.5      8
  3      7      2    8.5     10
  4     12     19    1.5      0
  5      4      2    5.5     12
  6      4      1   12.5     11
  7      9     14    0.5      8
  8     40      1    2.5     19
  9      1      4   10.5      2
 10      2      4    0.5     20
 11      8     11    6.5      2
 12     25     10   27.5     20
 13     22     22    6.5      8
 14      1      9   14.5      2
 15      5      2    6.5      0
 16     12      9    6.5     15
 17     10      6   12.5      7
 18     22      1   11.5      3
 19     27     19   16.5     12
 20     15      9    5.5      9

MTB > AOVO c22 - c25

# We now do an ordinary 1-way ANOVA.

One-way ANOVA: 1-med, 2-med, 3-med, 4-med

Source  DF      SS    MS     F      P
Factor   3   220.1  73.4  1.29  0.283
Error   76  4314.9  56.8
Total   79  4535.0

S = 7.535   R-Sq = 4.85%   R-Sq(adj) = 1.10%

# Since the p-value is above any significance level that we might use, we cannot reject the null hypothesis of equal variances.

Individual 95% CIs For Mean Based on Pooled StDev
Level   N    Mean   StDev
1-med  20  12.350  10.174
2-med  20   8.150   6.491
3-med  20   9.000   6.378
4-med  20   8.600   6.386
[Character plot of the four intervals; axis marked 6.0, 9.0, 12.0, 15.0]

Pooled StDev = 7.535

Game over.
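Both variance tests can be reproduced outside Minitab. The sketch below is only an illustration in Python; the function names are hypothetical, and the Bartlett statistic uses the standard textbook form (pooled log-variance minus the weighted individual log-variances, divided by a small-sample correction), which is an assumption about how the 252mvar examples proceed. The Levene part follows the steps in the session above: subtract column medians, take absolute values, then run a one-way ANOVA.

```python
import math
import statistics

columns = [
    [67, 64, 69, 88, 72, 80, 85, 116, 77, 78, 68, 51, 54, 75, 71, 64, 86, 98, 103, 91],  # 0hr
    [84, 78, 74, 91, 70, 73, 86, 71, 76, 76, 61, 62, 94, 63, 70, 63, 66, 71, 53, 81],   # 2hr
    [52, 53, 56, 66, 59, 77, 64, 62, 54, 65, 71, 92, 71, 50, 71, 58, 77, 53, 81, 70],   # 4hr
    [57, 53, 71, 61, 73, 50, 53, 80, 63, 41, 63, 41, 53, 63, 61, 46, 68, 64, 49, 70],   # 6hr
]


def one_way_f(groups):
    """F ratio of an ordinary one-way ANOVA."""
    n, m = sum(len(g) for g in groups), len(groups)
    grand = sum(map(sum, groups)) / n
    ssb = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ssw = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ssb / (m - 1)) / (ssw / (n - m))


def levene_f(groups):
    """Levene test: one-way ANOVA on absolute deviations from column medians."""
    deviations = [[abs(x - statistics.median(g)) for x in g] for g in groups]
    return one_way_f(deviations)


def bartlett_statistic(groups):
    """Bartlett statistic, approximately chi-square with (columns - 1) df."""
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]
    variances = [statistics.variance(g) for g in groups]
    df_total = sum(dfs)
    pooled = sum(d * v for d, v in zip(dfs, variances)) / df_total
    numerator = df_total * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances)
    )
    correction = 1 + (sum(1 / d for d in dfs) - 1 / df_total) / (3 * (k - 1))
    return numerator / correction


F_CRIT = 2.724946      # 5% F value for 3 and 76 df, from the Excel printout
CHI2_CRIT = 7.815      # 5% chi-square value for 3 df, from a standard table

levene = levene_f(columns)
bartlett = bartlett_statistic(columns)
print("Levene F:", levene, "reject" if levene > F_CRIT else "do not reject")
print("Bartlett:", bartlett, "reject" if bartlett > CHI2_CRIT else "do not reject")
```

Natural logarithms are used here; working in base-10 logarithms as in the Minitab session gives the same statistic once the numerator is multiplied by 2.3026.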