252solngr4-071 4/3/07
(Open this document in 'Page Layout' view!)
Name, Student Number:
Class days and time:
Please include this on what you hand in!
Graded Assignment 4
The data set is part of a problem due to Pelosi and Sandifer.
20 Employees (A-T) are timed in a computer entry task initially (0hr), after 2 hours of work (2hr), after 4
hours (4hr) and after 6 hours (6hr). The times, in seconds are reported below. a) At a 5% significance level
do the four mean times differ? b) determine which of the times actually differ. c) On the basis of these data,
how would you react to a proposal that employees only be allowed to work for four hours a day at this task?
Only neat and legible papers with written answers in complete sentences will be read!
0 Hours   2 Hours   4 Hours   6 Hours
   67        84        52        57
   64        78        53        53
   69        74        56        71
   88        91        66        61
   72        70        59        73
   80        73        77        50
   85        86        64        53
  116        71        62        80
   77        76        54        63
   78        76        65        41
   68        61        71        63
   51        62        92        41
   54        94        71        53
   75        63        50        63
   71        70        71        61
   64        63        58        46
   86        66        77        68
   98        71        53        64
  103        53        81        49
   91        81        70        70
Do this problem in Excel as follows.
Use columns A, B, C, D, E and F on the Excel spreadsheet for data.
In the first row of columns B, C, D and E put in 0hr, 2hr, 4hr and 6hr. Head column A with the word
‘employee.’ Starting in cell A2, put in the letters A through T to identify the employees – unless, of course,
you want to suggest some names.
Now put the data in columns B, C, D and E, skipping column A.
If you bring this document into Word, the data can be moved into the Excel worksheet by highlighting the
cells you want and copying and pasting.
To fill column F, write =B2 in cell F2. After you press 'Enter', this cell should read '67'.
Use the 'edit' pull-down menu and 'copy' cell F2.
Use the 'edit' pull-down menu and 'paste' in cells F3 through F21. Now column F will be identical to B
except for the heading. This can also be done as a simple copy and paste. Save your data as time1.xls.
Version A – One-way ANOVA
Use the 'tools' pull-down menu and pick 'data analysis' (If you cannot find this, use Tools and Add-Ins to
put in the analysis packs.)
Pick 'ANOVA: Single Factor'. Set input range to $B$1:$E$21. Select 'New worksheet ply' and 'columns',
check 'labels in first row', hit 'OK' and save your results as treslt1.xls.
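As a cross-check on the Excel output, the one-way ANOVA can be sketched in a few lines of Python. This is only an illustrative sketch using the standard sums-of-squares formulas, not part of the assignment; the names `times` and `one_way_anova` are made up for this example.

```python
# Completion times in seconds for employees A-T, copied from the data above.
times = {
    "0hr": [67, 64, 69, 88, 72, 80, 85, 116, 77, 78, 68, 51, 54, 75, 71, 64, 86, 98, 103, 91],
    "2hr": [84, 78, 74, 91, 70, 73, 86, 71, 76, 76, 61, 62, 94, 63, 70, 63, 66, 71, 53, 81],
    "4hr": [52, 53, 56, 66, 59, 77, 64, 62, 54, 65, 71, 92, 71, 50, 71, 58, 77, 53, 81, 70],
    "6hr": [57, 53, 71, 61, 73, 50, 53, 80, 63, 41, 63, 41, 53, 63, 61, 46, 68, 64, 49, 70],
}


def one_way_anova(groups):
    """Return (SS between, SS within, F) for a list of samples (hypothetical helper)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-groups sum of squares: group size times squared deviation of each group mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat


ss_between, ss_within, f_stat = one_way_anova(list(times.values()))
print(round(ss_between, 2), round(ss_within, 2), round(f_stat, 4))
```

The three printed numbers should agree with the 'Between Groups' SS, the 'Within Groups' SS and the F value that Excel reports.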
Version B – Two-way ANOVA
In order to check for the effect of the fact that the data is blocked by employees, repeat the analysis using
‘ANOVA: Two-Factor without replication’. Set input range to $A$1:$E$21, check ‘labels,’ and save your
results as treslt2.xls.
Answer the following: Is there a significant difference between the task completion times according to the
number of hours worked? How is this conclusion affected by blocking by employees? Cite p-values and /or
F-tests
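The blocked (two-factor without replication) analysis can be checked the same way. The sketch below is an illustration under the usual randomized-block formulas, with employees as rows and hours as columns; the variable names are assumptions made for this example and are not Excel's.

```python
# Columns are the four measurement times; row i of every column is the same employee.
columns = [
    [67, 64, 69, 88, 72, 80, 85, 116, 77, 78, 68, 51, 54, 75, 71, 64, 86, 98, 103, 91],  # 0hr
    [84, 78, 74, 91, 70, 73, 86, 71, 76, 76, 61, 62, 94, 63, 70, 63, 66, 71, 53, 81],    # 2hr
    [52, 53, 56, 66, 59, 77, 64, 62, 54, 65, 71, 92, 71, 50, 71, 58, 77, 53, 81, 70],    # 4hr
    [57, 53, 71, 61, 73, 50, 53, 80, 63, 41, 63, 41, 53, 63, 61, 46, 68, 64, 49, 70],    # 6hr
]
rows = list(zip(*columns))          # one tuple of four times per employee
r, c = len(rows), len(columns)      # 20 employees (blocks), 4 hours (treatments)
grand_mean = sum(sum(col) for col in columns) / (r * c)

ss_rows = c * sum((sum(row) / c - grand_mean) ** 2 for row in rows)
ss_cols = r * sum((sum(col) / r - grand_mean) ** 2 for col in columns)
ss_total = sum((x - grand_mean) ** 2 for col in columns for x in col)
ss_error = ss_total - ss_rows - ss_cols   # what is left after removing both effects

ms_error = ss_error / ((r - 1) * (c - 1))
f_rows = (ss_rows / (r - 1)) / ms_error   # tests equality of employee means
f_cols = (ss_cols / (c - 1)) / ms_error   # tests equality of hour means
print(round(ss_rows, 2), round(ss_error, 2), round(f_rows, 4), round(f_cols, 4))
```

Note that the column sum of squares is the same as the between-groups sum of squares in the one-way analysis; blocking only splits the old within-groups sum of squares into a row part and an error part.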
Version C – One way ANOVA
Take the last digit of your student number (if it's zero, use 10). Go back to your original data or use the 'file'
pull-down menu to open time1.xls.
To fill column B this time in cell B2 write =F2+x, replacing x with the last digit of your social security
number.
Use the 'edit' pull down menu and 'copy' cell B2
Use the 'edit' pull down menu and 'paste' in cells B3 through B21. Now column B will be more than the
original B by the amount of your value of x. Save your data as time3.xls.
Run the one-way ANOVA again and save your results as treslt3.xls
Submit the data and results with your Student number. The most effective way to do this is to paste the
results into a Word document and then add neat hand or typed notes. Indicate what hypotheses were tested,
what the p-value was and whether, using the p-value, you would reject the null if (i) the significance level
was 5% and (ii) the significance level was 10%, explaining why. You will have two answers for each of
your two problems.
For your Version C do a Scheffé confidence interval and a Tukey-Kramer interval or procedure for each of
the $C_2^4 = 6$ possible differences between means and report which are different at the 5% level according to
each of the 2 methods. Now on the basis of these data, how would you react to a proposal that employees
only be allowed to work for four hours a day at this task? Why?
Extra Credit: 1) Show that you learned something from computer problem 2 by doing part B on Minitab.
There should be very little difference in your result.
The easiest way to do this is to copy the first five columns from the original Excel spreadsheet. Enter
Minitab and use ‘editor’ to enable commands. Highlight the column labels and cells 1-20 of the first five
columns. Remember that your column labels should be written in above the columns (Put row labels in
column 1). Just to make sure that you are in the right place, try the following Minitab commands.
print c1-c5
AOVO c2-c5
You should get results equivalent to your first ANOVA.
To set up for a 2-way ANOVA stack your data in columns 6 and 7.
Stack c2 c3 c4 c5 c6;
Subscripts c7 ;
UseNames.
To move the row labels, copy the A through T from column 1. Highlight all 80 cells of column 8 and paste
your A-T into the column. Every number should now have a correct row label. Use the material from
computer assignment 2 to check your data. I combined the ANOVA and the table of means command by
using the following.
Twoway c6 c8 c7;
Means c8 c7.
2) Take the data from your last ANOVA. Use the instructions in 1) above to copy it into the Minitab
spreadsheet and perform Levene and Bartlett tests on it using the third example in 252mvarex. as a pattern
for your calculations using Minitab. Make sure that you explain what is being tested and what you conclude.
There are two ways to do this. If you want to do it on the unstacked data use the following.
Vartest c2-c5;
Unstacked.
To do the tests on the stacked data use the following.
Vartest c6 c7.
Extra Extra Credit: Do Bartlett and Levene tests using the examples in 252mvar as your pattern. It turns
out that your ANOVA has just enough columns to do this test.
This is an awful lot of work unless you cheat and use the computer. If you cover your tracks, I’ll never
know. To do the Bartlett test you need logarithms of variances. Label Columns 10-12 ‘stdev,’ ‘var’ and
‘log.’ Use the data that you already have in four columns in Minitab c2-c5 (labels in c1) and get the
variances as follows:
name k2 'stdv1'
name k3 'stdv2'
name k4 'stdv3'
name k5 'stdv4'
stdev c2 k2
stdev c3 k3
stdev c4 k4
stdev c5 k5
print k2-k5
#These are the standard deviations of the columns.
stack k2-k5 c10
let c11 = c10 * c10
#Now you have variances.
let c12 = logten(c11)
let k11 = mean(c11)
#This is the pooled variance when you have equal sized samples.
let k12 = logten(k11)
print k11-k12
print c10-c12
Now you are on your own. The rest of this should be pretty easy because all your $n_j$s are equal.
The Levene test is longer, but should be much more familiar and perhaps easier to fake.
Copy columns 1 through 5 to c21-c25. Then find their medians and subtract them from the columns and
convert the columns to absolute values.
name k22 'med1'
name k23 'med2'
name k24 'med3'
name k25 'med4'
let k22 = median(c22)
let k23 = median(c23)
let k24 = median(c24)
let k25 = median(c25)
let c22 = c22 - k22
let c23 = c23 - k23
let c24 = c24 - k24
let c25 = c25 - k25
describe c22-c25
#All the columns should have zero medians now.
print c21-c25
let c22 = absolute(c22)
let c23 = absolute(c23)
let c24 = absolute(c24)
let c25 = absolute(c25)
print c21-c25
#You are now ready for an ANOVA using:
AOVO c22-c25
#You should get the same p-value as you got for the first Levene test
# that you did.
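The Levene recipe above (subtract each column's median, take absolute values, then run an ordinary one-way ANOVA) can also be sketched in Python as a check. This is an illustrative sketch only; `one_way_f` and `abs_dev` are names invented for the example.

```python
import statistics

groups = [
    [67, 64, 69, 88, 72, 80, 85, 116, 77, 78, 68, 51, 54, 75, 71, 64, 86, 98, 103, 91],  # 0hr
    [84, 78, 74, 91, 70, 73, 86, 71, 76, 76, 61, 62, 94, 63, 70, 63, 66, 71, 53, 81],    # 2hr
    [52, 53, 56, 66, 59, 77, 64, 62, 54, 65, 71, 92, 71, 50, 71, 58, 77, 53, 81, 70],    # 4hr
    [57, 53, 71, 61, 73, 50, 53, 80, 63, 41, 63, 41, 53, 63, 61, 46, 68, 64, 49, 70],    # 6hr
]


def one_way_f(samples):
    """F statistic of an ordinary one-way ANOVA (hypothetical helper)."""
    values = [x for s in samples for x in s]
    grand_mean = sum(values) / len(values)
    ss_between = sum(len(s) * (statistics.mean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((x - statistics.mean(s)) ** 2 for s in samples for x in s)
    return (ss_between / (len(samples) - 1)) / (ss_within / (len(values) - len(samples)))


# Step 1 and 2: absolute deviations of every observation from its own column median.
abs_dev = [[abs(x - statistics.median(g)) for x in g] for g in groups]
# Step 3: the Levene statistic is the one-way ANOVA F computed on those deviations.
levene_f = one_way_f(abs_dev)
print(round(levene_f, 2))
```

The F value is compared with the F distribution on 3 and 76 degrees of freedom, exactly as in the first ANOVA.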
Results
Version A – One-way ANOVA
Use the 'tools' pull-down menu and pick 'data analysis' (If you cannot find this, use Tools and Add-Ins to
put in the analysis packs.)
Pick 'ANOVA: Single Factor'. Set input range to $B$1:$E$21. Select 'New worksheet ply' and 'columns',
check 'labels in first row', hit 'OK' and save your results as treslt1.xls.
Data for 1st and 2nd ANOVA
Employee   0hr   2hr   4hr   6hr   F (copy of 0hr)
A           67    84    52    57    67
B           64    78    53    53    64
C           69    74    56    71    69
D           88    91    66    61    88
E           72    70    59    73    72
F           80    73    77    50    80
G           85    86    64    53    85
H          116    71    62    80   116
I           77    76    54    63    77
J           78    76    65    41    78
K           68    61    71    63    68
L           51    62    92    41    51
M           54    94    71    53    54
N           75    63    50    63    75
O           71    70    71    61    71
P           64    63    58    46    64
Q           86    66    77    68    86
R           98    71    53    64    98
S          103    53    81    49   103
T           91    81    70    70    91
Results for 1st ANOVA
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
Anova: Single Factor
SUMMARY
Groups   Count   Sum    Average   Variance
0hr      20      1557   77.85     260.45
2hr      20      1463   73.15     110.6605
4hr      20      1302   65.1      125.5684
6hr      20      1180   59        114.4211
ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        4211.05    3    1403.683   9.187913   2.93E-05   2.724946
Within Groups         11610.9    76   152.775
Total                 15821.95   79
Version B – Two-way ANOVA
In order to check for the effect of the fact that the data is blocked by employees, repeat the analysis using
‘ANOVA: Two-Factor without replication’. Set input range to $A$1:$E$21, check ‘labels,’ and save your
results as treslt2.xls.
Answer the following: Is there a significant difference between the task completion times according to the
number of hours worked? How is this conclusion affected by blocking by employees? Cite p-values and /or
F-tests.
Results for 2nd ANOVA
$H_{01}$: Row (employee) means equal
$H_{02}: \mu_1 = \mu_2 = \mu_3 = \mu_4$
Anova: Two-Factor Without Replication
SUMMARY   Count   Sum    Average   Variance
A         4       260    65        199.3333
B         4       248    62        140.6667
C         4       270    67.5      63
D         4       306    76.5      231
E         4       274    68.5      41.66667
F         4       280    70        186
G         4       288    72        263.3333
H         4       329    82.25     560.25
I         4       270    67.5      121.6667
J         4       260    65        288.6667
K         4       263    65.75     20.91667
L         4       246    61.5      487
M         4       272    68        368.6667
N         4       251    62.75     104.25
O         4       273    68.25     23.58333
P         4       231    57.75     68.25
Q         4       297    74.25     84.25
R         4       286    71.5      367
S         4       286    71.5      643.6667
T         4       312    78        102
0hr       20      1557   77.85     260.45
2hr       20      1463   73.15     110.6605
4hr       20      1302   65.1      125.5684
6hr       20      1180   59        114.4211
ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Rows                  2726.45    19   143.4974   0.920637   0.56152    1.771973
Columns               4211.05    3    1403.683   9.005617   5.65E-05   2.766441
Error                 8884.45    57   155.8675
Total                 15821.95   79
Answer: In the first ANOVA we get a p-value of .0000293. Since this is below any significance level we
are likely to use, we reject the null hypothesis that the mean execution time is the same for all numbers of
hours worked. In the second ANOVA, the p-value for columns (.0000565) is almost as low, so we again
reject the original null hypothesis; blocking by employees does not change the conclusion. Note that the
p-value for rows is 0.56152, which is above any significance level we might care to use. The null hypothesis
that row (employee) means are equal cannot be rejected, so we conclude that there is no significant
difference between individuals.
Version C – One way ANOVA
Take the last digit of your student number (if it's zero, use 10). Go back to your original data or use the 'file'
pull-down menu to open time1.xls.
To fill column B this time in cell B2 write =F2+x, replacing x with the last digit of your social security
number.
Use the 'edit' pull down menu and 'copy' cell B2
Use the 'edit' pull down menu and 'paste' in cells B3 through B21. Now column B will be more than the
original B by the amount of your value of x. Save your data as time3.xls.
Run the one-way ANOVA again and save your results as treslt3.xls
Data for 3rd ANOVA
I added 3 to first column instead of the second.
Employee   0hr   2hr   4hr   6hr   F (original 0hr)
A           70    84    52    57    67
B           67    78    53    53    64
C           72    74    56    71    69
D           91    91    66    61    88
E           75    70    59    73    72
F           83    73    77    50    80
G           88    86    64    53    85
H          119    71    62    80   116
I           80    76    54    63    77
J           81    76    65    41    78
K           71    61    71    63    68
L           54    62    92    41    51
M           57    94    71    53    54
N           78    63    50    63    75
O           74    70    71    61    71
P           67    63    58    46    64
Q           89    66    77    68    86
R          101    71    53    64    98
S          106    53    81    49   103
T           94    81    70    70    91
Results for 3rd ANOVA
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
Anova: Single Factor
SUMMARY
Groups   Count   Sum    Average   Variance
0hr      20      1617   80.85     260.45
2hr      20      1463   73.15     110.6605
4hr      20      1302   65.1      125.5684
6hr      20      1180   59        114.4211
ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        5435.05    3    1811.683   11.85851   1.88E-06   2.724946
Within Groups         11610.9    76   152.775
Total                 17045.95   79
In this ANOVA we get a p-value of .00000188. Since this is below any significance level we are likely to
use, we reject the null hypothesis that the mean execution time is the same for all numbers of hours worked.
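To see what adding x to one column does, the following Python sketch reruns the one-way ANOVA with x = 3 added to the 0hr column, as in this solution. It is an illustration only; the helper name `one_way_anova` and the variable `x` are assumptions of the example.

```python
original = [
    [67, 64, 69, 88, 72, 80, 85, 116, 77, 78, 68, 51, 54, 75, 71, 64, 86, 98, 103, 91],  # 0hr
    [84, 78, 74, 91, 70, 73, 86, 71, 76, 76, 61, 62, 94, 63, 70, 63, 66, 71, 53, 81],    # 2hr
    [52, 53, 56, 66, 59, 77, 64, 62, 54, 65, 71, 92, 71, 50, 71, 58, 77, 53, 81, 70],    # 4hr
    [57, 53, 71, 61, 73, 50, 53, 80, 63, 41, 63, 41, 53, 63, 61, 46, 68, 64, 49, 70],    # 6hr
]


def one_way_anova(groups):
    """Return (SS between, SS within, F); hypothetical helper for this example."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    f_stat = (ss_between / (len(groups) - 1)) / (ss_within / (len(values) - len(groups)))
    return ss_between, ss_within, f_stat


x = 3  # the value used in this solution; students use the last digit of their own number
shifted = [[v + x for v in original[0]]] + original[1:]

ssb_old, ssw_old, f_old = one_way_anova(original)
ssb_new, ssw_new, f_new = one_way_anova(shifted)
print(round(ssw_old, 2), round(ssw_new, 2))   # within-groups SS is unchanged by the shift
print(round(f_old, 4), round(f_new, 4))       # F rises because the 0hr mean moved further out
```

Adding a constant to a column leaves that column's variance alone, so only the between-groups sum of squares changes.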
Conclusion
Submit the data and results with your Student number. The most effective way to do this is to paste the
results into a Word document and then add neat hand or typed notes. Indicate what hypotheses were tested,
what the p-value was and whether, using the p-value, you would reject the null if (i) the significance level
was 5% and (ii) the significance level was 10%, explaining why. You will have two answers for each of
your two problems.
For your Version C do a Scheffé confidence interval and a Tukey-Kramer interval or procedure for each of
the $C_2^4 = 6$ possible differences between means and report which are different at the 5% level according to
each of the 2 methods. Now on the basis of these data, how would you react to a proposal that employees
only be allowed to work for four hours a day at this task? Why?
Confidence Intervals from the Outline
For completeness, I have included the individual confidence interval as well as the Tukey and Scheffé.
Individual Confidence Interval
If we desire a single interval, we use the formula for the difference between two means when the variance is
known. For example, if we want the difference between means of column 1 and column 2,
$\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}^{(n-m)}\, s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $s = \sqrt{MSW}$.
Scheffé Confidence Interval
If we desire intervals that will simultaneously be valid for a given confidence level for all possible intervals
between column means, use
$\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm \sqrt{(m-1) F_{\alpha}^{(m-1,\, n-m)}}\; s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$.
Tukey Confidence Interval
This also applies to all possible differences.
$\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm q_{\alpha}^{(m,\, n-m)} \frac{s}{\sqrt{2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$.
This gives rise to Tukey’s HSD (Honestly Significant Difference) procedure. Two sample means $\bar{x}_{.1}$ and
$\bar{x}_{.2}$ are significantly different if $|\bar{x}_{.1} - \bar{x}_{.2}|$ is greater than
$q_{\alpha}^{(m,\, n-m)} \frac{s}{\sqrt{2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$.
The Confidence Intervals from the data
From the Excel output, $\bar{x}_1 = 80.85$, $\bar{x}_2 = 73.15$, $\bar{x}_3 = 65.10$, $\bar{x}_4 = 59.00$, $m = 4$, $n - m = 76$,
$n_1 = n_2 = n_3 = n_4 = 20$ and $MSW = 152.775$. Assume $\alpha = 0.05$. The contrasts follow.
$\mu_1 - \mu_2$
Individual: $\mu_1 - \mu_2 = (80.85 - 73.15) \pm t_{.025}^{(76)} \sqrt{152.775} \sqrt{\frac{1}{20} + \frac{1}{20}} = 7.70 \pm 1.992 \sqrt{15.2775} = 7.70 \pm 7.79$ ns
Scheffé: $\mu_1 - \mu_2 = (80.85 - 73.15) \pm \sqrt{3 F_{.05}^{(3,76)}} \sqrt{152.775} \sqrt{\frac{1}{20} + \frac{1}{20}} = 7.70 \pm \sqrt{3(2.73)} \sqrt{15.2775} = 7.70 \pm \sqrt{125.123} = 7.70 \pm 11.18$ ns
Tukey: $\mu_1 - \mu_2 = (80.85 - 73.15) \pm q_{.05}^{(4,76)} \sqrt{\frac{152.775}{2}} \sqrt{\frac{1}{20} + \frac{1}{20}} = 7.70 \pm 3.73 \sqrt{7.6387} = 7.70 \pm 10.31$ ns
$\mu_1 - \mu_3$
Individual: $\mu_1 - \mu_3 = (80.85 - 65.10) \pm t_{.025}^{(76)} \sqrt{152.775} \sqrt{\frac{1}{20} + \frac{1}{20}} = 15.75 \pm 1.992 \sqrt{15.2775} = 15.75 \pm 7.79$ s
Scheffé: $\mu_1 - \mu_3 = (80.85 - 65.10) \pm \sqrt{3 F_{.05}^{(3,76)}} \sqrt{152.775} \sqrt{\frac{1}{20} + \frac{1}{20}} = 15.75 \pm \sqrt{3(2.73)} \sqrt{15.2775} = 15.75 \pm \sqrt{125.123} = 15.75 \pm 11.18$ s
Tukey: $\mu_1 - \mu_3 = (80.85 - 65.10) \pm q_{.05}^{(4,76)} \sqrt{\frac{152.775}{2}} \sqrt{\frac{1}{20} + \frac{1}{20}} = 15.75 \pm 3.73 \sqrt{7.6387} = 15.75 \pm 10.31$ s
$\mu_1 - \mu_4$
Individual: $\mu_1 - \mu_4 = (80.85 - 59.00) \pm t_{.025}^{(76)} \sqrt{152.775} \sqrt{\frac{1}{20} + \frac{1}{20}} = 21.85 \pm 1.992 \sqrt{15.2775} = 21.85 \pm 7.79$ s
Scheffé: $\mu_1 - \mu_4 = (80.85 - 59.00) \pm \sqrt{3 F_{.05}^{(3,76)}} \sqrt{152.775} \sqrt{\frac{1}{20} + \frac{1}{20}} = 21.85 \pm \sqrt{3(2.73)} \sqrt{15.2775} = 21.85 \pm \sqrt{125.123} = 21.85 \pm 11.18$ s
Tukey: $\mu_1 - \mu_4 = (80.85 - 59.00) \pm q_{.05}^{(4,76)} \sqrt{\frac{152.775}{2}} \sqrt{\frac{1}{20} + \frac{1}{20}} = 21.85 \pm 3.73 \sqrt{7.6387} = 21.85 \pm 10.31$ s
$\mu_2 - \mu_3$
Individual: $\mu_2 - \mu_3 = (73.15 - 65.10) \pm t_{.025}^{(76)} \sqrt{152.775} \sqrt{\frac{1}{20} + \frac{1}{20}} = 8.05 \pm 1.992 \sqrt{15.2775} = 8.05 \pm 7.79$ s
Scheffé: $\mu_2 - \mu_3 = (73.15 - 65.10) \pm \sqrt{3 F_{.05}^{(3,76)}} \sqrt{152.775} \sqrt{\frac{1}{20} + \frac{1}{20}} = 8.05 \pm \sqrt{3(2.73)} \sqrt{15.2775} = 8.05 \pm \sqrt{125.123} = 8.05 \pm 11.18$ ns
Tukey: $\mu_2 - \mu_3 = (73.15 - 65.10) \pm q_{.05}^{(4,76)} \sqrt{\frac{152.775}{2}} \sqrt{\frac{1}{20} + \frac{1}{20}} = 8.05 \pm 3.73 \sqrt{7.6387} = 8.05 \pm 10.31$ ns
$\mu_2 - \mu_4$
Individual: $\mu_2 - \mu_4 = (73.15 - 59.00) \pm t_{.025}^{(76)} \sqrt{152.775} \sqrt{\frac{1}{20} + \frac{1}{20}} = 14.15 \pm 1.992 \sqrt{15.2775} = 14.15 \pm 7.79$ s
Scheffé: $\mu_2 - \mu_4 = (73.15 - 59.00) \pm \sqrt{3 F_{.05}^{(3,76)}} \sqrt{152.775} \sqrt{\frac{1}{20} + \frac{1}{20}} = 14.15 \pm \sqrt{3(2.73)} \sqrt{15.2775} = 14.15 \pm \sqrt{125.123} = 14.15 \pm 11.18$ s
Tukey: $\mu_2 - \mu_4 = (73.15 - 59.00) \pm q_{.05}^{(4,76)} \sqrt{\frac{152.775}{2}} \sqrt{\frac{1}{20} + \frac{1}{20}} = 14.15 \pm 3.73 \sqrt{7.6387} = 14.15 \pm 10.31$ s
$\mu_3 - \mu_4$
Individual: $\mu_3 - \mu_4 = (65.10 - 59.00) \pm t_{.025}^{(76)} \sqrt{152.775} \sqrt{\frac{1}{20} + \frac{1}{20}} = 6.10 \pm 1.992 \sqrt{15.2775} = 6.10 \pm 7.79$ ns
Scheffé: $\mu_3 - \mu_4 = (65.10 - 59.00) \pm \sqrt{3 F_{.05}^{(3,76)}} \sqrt{152.775} \sqrt{\frac{1}{20} + \frac{1}{20}} = 6.10 \pm \sqrt{3(2.73)} \sqrt{15.2775} = 6.10 \pm \sqrt{125.123} = 6.10 \pm 11.18$ ns
Tukey: $\mu_3 - \mu_4 = (65.10 - 59.00) \pm q_{.05}^{(4,76)} \sqrt{\frac{152.775}{2}} \sqrt{\frac{1}{20} + \frac{1}{20}} = 6.10 \pm 3.73 \sqrt{7.6387} = 6.10 \pm 10.31$ ns
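Because every sample has the same size, the Scheffé and Tukey error terms are the same for all six contrasts, so the comparison can be tabulated mechanically. The Python sketch below is illustrative only; it takes the critical values 2.73 (F) and 3.73 (q) used in this solution as given rather than computing them, and the variable names are invented for the example.

```python
import math
from itertools import combinations

# Sample means from the third ANOVA, keyed by column number.
means = {1: 80.85, 2: 73.15, 3: 65.10, 4: 59.00}
n_per_group = 20
m = 4                  # number of columns
msw = 152.775          # mean square within, from the Excel output
f_crit = 2.73          # F(.05; 3, 76), value quoted in this solution
q_crit = 3.73          # studentized range q(.05; 4, 76), value quoted in this solution

# s * sqrt(1/n1 + 1/n2) with s = sqrt(MSW)
std_err = math.sqrt(msw * (1 / n_per_group + 1 / n_per_group))
scheffe_hw = math.sqrt((m - 1) * f_crit) * std_err
tukey_hw = q_crit * std_err / math.sqrt(2)

significant = {}
for i, j in combinations(means, 2):
    diff = means[i] - means[j]
    # A contrast is significant when the difference exceeds the error term.
    significant[(i, j)] = (abs(diff) > scheffe_hw, abs(diff) > tukey_hw)
    print(f"mu{i} - mu{j}: {diff:6.2f}  Scheffe {'s' if significant[(i, j)][0] else 'ns'}"
          f"  Tukey {'s' if significant[(i, j)][1] else 'ns'}")
```

Both methods flag the same three contrasts here, all between nonadjacent periods.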
Conclusion: I have included individual confidence intervals here for completeness. The analysis of variance
definitely tells us that the means are not the same, regardless of the significance level we might want to use,
because the p-value is microscopic. If we compare the differences in sample means using either of the two
methods requested, we find that there is no significant difference between the means for adjacent periods,
that is between 1 and 2, 2 and 3, etc., but there are differences between nonadjacent periods. The contrasts
(intervals) are labeled ns for not significant and s for significant depending on whether the error part of
the interval is larger or smaller than the difference between sample means.
These conclusions are at the 95% confidence level. The more conservative Scheffé procedure used
$\sqrt{F_{.05}^{(3,76)}} = \sqrt{2.73} = 1.65$ as part of the error term (2.73 came from the computer printout
reference value; using the table we might have come up with something like $F_{.05}^{(3,60)}$, which is slightly
larger). If we were to repeat our tests at the 1% level, we could use something like
$\sqrt{F_{.01}^{(3,60)}} = \sqrt{4.13} = 2.03$, which would make our error terms 23% larger. If we were to do that,
the differences between nonadjacent periods would still remain significant. Note that the mean entry time is
falling as hours pass. The strong gains over longer periods might make it unwise to limit daily hours of
employees.
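The remark about the 1% level can be checked with a short calculation. The Python sketch below is illustrative; it simply reuses the F values 2.73 and 4.13 quoted above and the differences between nonadjacent sample means, and its variable names are assumptions of the example.

```python
import math

msw = 152.775
std_err = math.sqrt(msw * (1 / 20 + 1 / 20))      # s * sqrt(1/n1 + 1/n2)

scheffe_05 = math.sqrt(3 * 2.73) * std_err        # error term at the 5% level
scheffe_01 = math.sqrt(3 * 4.13) * std_err        # error term at the 1% level
ratio = scheffe_01 / scheffe_05                   # how much larger the 1% error term is

# Differences between nonadjacent sample means from the third ANOVA.
nonadjacent = {"1-3": 80.85 - 65.10, "1-4": 80.85 - 59.00, "2-4": 73.15 - 59.00}
still_significant = all(d > scheffe_01 for d in nonadjacent.values())
print(round(ratio, 2), round(scheffe_01, 2), still_significant)
```

The smallest nonadjacent difference is the one between periods 2 and 4, so that is the contrast to watch when the error term grows.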
Extra Credit:
1) Show that you learned something from computer problem 2 by doing part B on Minitab. There should be
very little difference in your result. Comments are in red.
————— 4/3/2007 5:28:57 PM ————————————————————
Welcome to Minitab, press F1 for help.
Results for: 2gr4-071ANOVA.MTW
MTB > print c1 - c5
Data Display
Row  Employee  0hr  2hr  4hr  6hr
  1  A          67   84   52   57
  2  B          64   78   53   53
  3  C          69   74   56   71
  4  D          88   91   66   61
  5  E          72   70   59   73
  6  F          80   73   77   50
  7  G          85   86   64   53
  8  H         116   71   62   80
  9  I          77   76   54   63
 10  J          78   76   65   41
 11  K          68   61   71   63
 12  L          51   62   92   41
 13  M          54   94   71   53
 14  N          75   63   50   63
 15  O          71   70   71   61
 16  P          64   63   58   46
 17  Q          86   66   77   68
 18  R          98   71   53   64
 19  S         103   53   81   49
 20  T          91   81   70   70
MTB > AOVO c2-c5
One-way ANOVA: 0hr, 2hr, 4hr, 6hr
Source  DF     SS    MS     F      P
Factor   3   4211  1404  9.19  0.000
Error   76  11611   153
Total   79  15822
S = 12.36   R-Sq = 26.62%   R-Sq(adj) = 23.72%
# The low p-value means that the null hypothesis of equal column means is rejected.
Individual 95% CIs For Mean Based on Pooled StDev
Level   N   Mean  StDev
0hr    20  77.85  16.14
2hr    20  73.15  10.52
4hr    20  65.10  11.21
6hr    20  59.00  10.70
[Character plot of the individual 95% confidence intervals; axis marked at 56.0, 64.0, 72.0 and 80.0]
Pooled StDev = 12.36
MTB > stack c2 c3 c4 c5 c6;
SUBC> subscripts c7;
SUBC> UseNames.
MTB > Print c6 c7 c8
# This is just to show you what the stacked data looks like.
Data Display
Row  Time  Hour  Person
  1    67  0hr   A
  2    64  0hr   B
  3    69  0hr   C
  4    88  0hr   D
  5    72  0hr   E
  6    80  0hr   F
  7    85  0hr   G
  8   116  0hr   H
  9    77  0hr   I
 10    78  0hr   J
 11    68  0hr   K
 12    51  0hr   L
 13    54  0hr   M
 14    75  0hr   N
 15    71  0hr   O
 16    64  0hr   P
 17    86  0hr   Q
 18    98  0hr   R
 19   103  0hr   S
 20    91  0hr   T
 21    84  2hr   A
 22    78  2hr   B
 23    74  2hr   C
 24    91  2hr   D
 25    70  2hr   E
 26    73  2hr   F
 27    86  2hr   G
 28    71  2hr   H
 29    76  2hr   I
 30    76  2hr   J
 31    61  2hr   K
 32    62  2hr   L
 33    94  2hr   M
 34    63  2hr   N
 35    70  2hr   O
 36    63  2hr   P
 37    66  2hr   Q
 38    71  2hr   R
 39    53  2hr   S
 40    81  2hr   T
 41    52  4hr   A
 42    53  4hr   B
 43    56  4hr   C
 44    66  4hr   D
 45    59  4hr   E
 46    77  4hr   F
 47    64  4hr   G
 48    62  4hr   H
 49    54  4hr   I
 50    65  4hr   J
 51    71  4hr   K
 52    92  4hr   L
 53    71  4hr   M
 54    50  4hr   N
 55    71  4hr   O
 56    58  4hr   P
 57    77  4hr   Q
 58    53  4hr   R
 59    81  4hr   S
 60    70  4hr   T
 61    57  6hr   A
 62    53  6hr   B
 63    71  6hr   C
 64    61  6hr   D
 65    73  6hr   E
 66    50  6hr   F
 67    53  6hr   G
 68    80  6hr   H
 69    63  6hr   I
 70    41  6hr   J
 71    63  6hr   K
 72    41  6hr   L
 73    53  6hr   M
 74    63  6hr   N
 75    61  6hr   O
 76    46  6hr   P
 77    68  6hr   Q
 78    64  6hr   R
 79    49  6hr   S
 80    70  6hr   T
MTB > table c8 c7.
# This is an instruction from your 2-way ANOVA. It tells you how much data is in each cell.
Tabulated statistics: Person, Hour
Rows: Person   Columns: Hour
      0hr  2hr  4hr  6hr  All
A       1    1    1    1    4
B       1    1    1    1    4
C       1    1    1    1    4
D       1    1    1    1    4
E       1    1    1    1    4
F       1    1    1    1    4
G       1    1    1    1    4
H       1    1    1    1    4
I       1    1    1    1    4
J       1    1    1    1    4
K       1    1    1    1    4
L       1    1    1    1    4
M       1    1    1    1    4
N       1    1    1    1    4
O       1    1    1    1    4
P       1    1    1    1    4
Q       1    1    1    1    4
R       1    1    1    1    4
S       1    1    1    1    4
T       1    1    1    1    4
All    20   20   20   20   80
Cell Contents: Count
MTB > table c8 c7;
SUBC> data c6.
# This is just a printout of data by cell. Because it was done by cell there were big blanks between each
# line. I edited them out.
Tabulated statistics: Person, Hour
Rows: Person   Columns: Hour
     0hr  2hr  4hr  6hr
A     67   84   52   57
B     64   78   53   53
C     69   74   56   71
D     88   91   66   61
E     72   70   59   73
F     80   73   77   50
G     85   86   64   53
H    116   71   62   80
I     77   76   54   63
J     78   76   65   41
K     68   61   71   63
L     51   62   92   41
M     54   94   71   53
N     75   63   50   63
O     71   70   71   61
P     64   63   58   46
Q     86   66   77   68
R     98   71   53   64
S    103   53   81   49
T     91   81   70   70
Cell Contents: Time : DATA
MTB > twoway c6 c7 c8;
SUBC> means c8 c7.
Two-way ANOVA: Time versus Hour, Person
Source  DF       SS       MS     F      P
Hour     3   4211.1  1403.68  9.01  0.000
Person  19   2726.5   143.50  0.92  0.562
Error   57   8884.5   155.87
Total   79  15822.0
S = 12.48   R-Sq = 43.85%   R-Sq(adj) = 22.17%
# So here is our 2-way ANOVA. The first hypothesis test says that the hypothesis that hour means are
# equal is rejected. The high p-value for the second test, which is above any significance level we
# might use, tells us that there is no difference between employee means.
Individual 95% CIs For Mean Based on Pooled StDev
Hour   Mean
0hr   77.85
2hr   73.15
4hr   65.10
6hr   59.00
[Character plot of the individual 95% confidence intervals by hour; axis marked at 56.0, 64.0, 72.0 and 80.0]
Individual 95% CIs For Mean Based on Pooled StDev
Person   Mean
A       65.00
B       62.00
C       67.50
D       76.50
E       68.50
F       70.00
G       72.00
H       82.25
I       67.50
J       65.00
K       65.75
L       61.50
M       68.00
N       62.75
O       68.25
P       57.75
Q       74.25
R       71.50
S       71.50
T       78.00
[Character plot of the individual 95% confidence intervals by person; axis marked at 45, 60, 75 and 90]
Extra Credit:
2) Take the data from your last ANOVA. Use the instructions in 1) above to copy it into the Minitab
spreadsheet and perform Levene and Bartlett tests on it using the third example in 252mvarex. as a pattern
for your calculations using Minitab. Make sure that you explain what is being tested and what you conclude.
MTB > print c1-c5
# This is just to remind you of the data.
Data Display
Row  Employee  0hr  2hr  4hr  6hr
  1  A          67   84   52   57
  2  B          64   78   53   53
  3  C          69   74   56   71
  4  D          88   91   66   61
  5  E          72   70   59   73
  6  F          80   73   77   50
  7  G          85   86   64   53
  8  H         116   71   62   80
  9  I          77   76   54   63
 10  J          78   76   65   41
 11  K          68   61   71   63
 12  L          51   62   92   41
 13  M          54   94   71   53
 14  N          75   63   50   63
 15  O          71   70   71   61
 16  P          64   63   58   46
 17  Q          86   66   77   68
 18  R          98   71   53   64
 19  S         103   53   81   49
 20  T          91   81   70   70
MTB > vartest c2-c5;
SUBC> unstacked.
# This test was needlessly done twice. This is the unstacked version.
Test for Equal Variances: 0hr, 2hr, 4hr, 6hr
95% Bonferroni confidence intervals for standard deviations
      N    Lower    StDev    Upper
0hr  20  11.4383  16.1385  26.4296
2hr  20   7.4558  10.5195  17.2276
4hr  20   7.9422  11.2057  18.3514
6hr  20   7.5814  10.6968  17.5179
Bartlett's Test (normal distribution)
Test statistic = 5.10, p-value = 0.165
Levene's Test (any continuous distribution)
Test statistic = 1.29, p-value = 0.283
# Both p-values are above any significance level that we might use. This means that we cannot reject
# the null hypothesis of equal variances.
[Graph: Test for Equal Variances: 0hr, 2hr, 4hr, 6hr]
# Just a graphic of the info above.
MTB > vartest c6 c7
# Look for the stacked data several pages back. This is exactly the same as the last test,
# but done on stacked data.
Test for Equal Variances: Time versus Hour
95% Bonferroni confidence intervals for standard deviations
Hour   N    Lower    StDev    Upper
0hr   20  11.4383  16.1385  26.4296
2hr   20   7.4558  10.5195  17.2276
4hr   20   7.9422  11.2057  18.3514
6hr   20   7.5814  10.6968  17.5179
Bartlett's Test (normal distribution)
Test statistic = 5.10, p-value = 0.165
Levene's Test (any continuous distribution)
Test statistic = 1.29, p-value = 0.283
[Graph: Test for Equal Variances: Time versus Hour]
MTB > vartest c2-c5;
SUBC> unstacked.
Test for Equal Variances: 0hr, 2hr, 4hr, 6hr
95% Bonferroni confidence intervals for standard deviations
      N    Lower    StDev    Upper
0hr  20  11.4383  16.1385  26.4296
2hr  20   7.4558  10.5195  17.2276
4hr  20   7.9422  11.2057  18.3514
6hr  20   7.5814  10.6968  17.5179
Bartlett's Test (normal distribution)
Test statistic = 5.10, p-value = 0.165
Levene's Test (any continuous distribution)
Test statistic = 1.29, p-value = 0.283
[Graph: Test for Equal Variances: 0hr, 2hr, 4hr, 6hr]
# This is exactly the same as the last graph.
Extra Extra Credit
Do Bartlett and Levene tests using the examples in 252mvar as your pattern. It turns out that your ANOVA
has just enough columns to do this test.
This is an awful lot of work unless you cheat and use the computer. If you cover your tracks, I’ll never
know. To do the Bartlett test you need logarithms of variances. Label Columns 10-12 ‘stdev,’ ‘var’ and
‘log.’ Use the data that you already have in four columns in Minitab c2-c5 (labels in c1) and get the
variances as follows:
MTB > name k2 'stdev1'
MTB > name k3 'stdev2'
MTB > name k4 'stdev3'
MTB > name k5 'stdev4'
MTB > stdev c2 k2
Standard Deviation of 0hr
Standard deviation of 0hr = 16.1385
# We are computing standard deviations of the columns and storing them as the Minitab constants
# k2, k3, k4 and k5. We actually want variances.
MTB > stdev c3 k3
Standard Deviation of 2hr
Standard deviation of 2hr = 10.5195
MTB > stdev c4 k4
Standard Deviation of 4hr
Standard deviation of 4hr = 11.2057
MTB > stdev c5 k5
Standard Deviation of 6hr
Standard deviation of 6hr = 10.6968
MTB > print k2-k5
Data Display
stdev1    16.1385
stdev2    10.5195
stdev3    11.2057
stdev4    10.6968
MTB > stack k2-k5 c10
MTB > let c11 = c10*c10
# We put the standard deviations in C10 and squared them to get variances.
MTB > let c12 = logten(c11)
MTB > let k11 = mean(c11)
# This is the pooled variance when you have equal sized samples.
MTB > let k12 = logten(k11)
MTB > name k11 'meansdsq'
MTB > name k12 'logmean'
MTB > print k11 - k12
Data Display
meansdsq    152.775
logmean     2.18405
MTB > print c10 - c12
# Note that I named my columns.
Data Display
Row    stdev     sdsq  logsdsq
  1  16.1385  260.450  2.41572
  2  10.5195  110.661  2.04399
  3  11.2057  125.568  2.09888
  4  10.6968  114.421  2.05851
Now you are on your own. I’ll finish this if anyone
actually does the Bartlett test.
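For readers who want to see where the remaining steps lead, here is a Python sketch of the usual textbook form of Bartlett's statistic. It is an illustration under stated assumptions, not the 252mvar worked example: it uses natural logarithms rather than base-10 logarithms (the two differ only by the factor 2.3026), and the chi-square tail formula in the last step is valid only for 3 degrees of freedom, which is what four columns give.

```python
import math
import statistics

groups = [
    [67, 64, 69, 88, 72, 80, 85, 116, 77, 78, 68, 51, 54, 75, 71, 64, 86, 98, 103, 91],  # 0hr
    [84, 78, 74, 91, 70, 73, 86, 71, 76, 76, 61, 62, 94, 63, 70, 63, 66, 71, 53, 81],    # 2hr
    [52, 53, 56, 66, 59, 77, 64, 62, 54, 65, 71, 92, 71, 50, 71, 58, 77, 53, 81, 70],    # 4hr
    [57, 53, 71, 61, 73, 50, 53, 80, 63, 41, 63, 41, 53, 63, 61, 46, 68, 64, 49, 70],    # 6hr
]

k = len(groups)
sizes = [len(g) for g in groups]
total = sum(sizes)
variances = [statistics.variance(g) for g in groups]       # sample variances s_j^2

# Pooled variance; with equal sample sizes this is just the mean of the variances.
pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (total - k)

numerator = (total - k) * math.log(pooled) - sum(
    (n - 1) * math.log(v) for n, v in zip(sizes, variances)
)
correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (total - k)) / (3 * (k - 1))
bartlett = numerator / correction

# Upper-tail chi-square probability, closed form for exactly 3 degrees of freedom.
p_value = math.erfc(math.sqrt(bartlett / 2)) + math.sqrt(2 * bartlett / math.pi) * math.exp(-bartlett / 2)
print(round(pooled, 3), round(bartlett, 2), round(p_value, 3))
```

The pooled variance printed first should match the 'meansdsq' constant above, and the statistic and p-value can be compared with Minitab's Bartlett line in the Test for Equal Variances output.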
Extra Extra Credit
Do Bartlett and Levene tests using the examples in 252mvar as your pattern.
The Levene test is longer, but should be much more familiar and perhaps easier to fake.
Copy columns 1 through 5 to c21-c25. Then find their medians and subtract them from the columns and
convert the columns to absolute values.
MTB > name k22 'med1'
MTB > name k23 'med2'
MTB > name k24 'med3'
MTB > name k25 'med4'
MTB > let c21 = c1
MTB > let c22 = c2
MTB > let c23 = c3
MTB > let c24 = c4
MTB > let c25 = c5
# I copied my original data to c21-c25.
MTB > let k22 = median(c22)
MTB > let k23 = median(c23)
MTB > let k24 = median(c24)
MTB > let k25 = median(c25)
MTB > let c22 = c22 - k22
MTB > let c23 = c23 - k23
MTB > let c24 = c24 - k24
MTB > let c25 = c25 - k25
# I subtracted the median for each column.
MTB > describe c22 - c25
# I checked to see if the medians were zero.
Descriptive Statistics: 1-med, 2-med, 3-med, 4-med
Variable   N  N*   Mean  SE Mean  StDev  Minimum      Q1  Median     Q3  Maximum
1-med     20   0   1.85     3.61  16.14   -25.00   -8.75    0.00  11.50    40.00
2-med     20   0   1.15     2.35  10.52   -19.00   -8.25    0.00   8.25    22.00
3-med     20   0   0.60     2.51  11.21   -14.50  -10.00    0.00   6.50    27.50
4-med     20   0  -2.00     2.39  10.70   -20.00  -10.25    0.00   6.00    19.00
MTB > print c22 - c25
# These are the original data with column medians subtracted.
Data Display
Row  1-med  2-med  3-med  4-med
  1     -9     12  -12.5     -4
  2    -12      6  -11.5     -8
  3     -7      2   -8.5     10
  4     12     19    1.5      0
  5     -4     -2   -5.5     12
  6      4      1   12.5    -11
  7      9     14   -0.5     -8
  8     40     -1   -2.5     19
  9      1      4  -10.5      2
 10      2      4    0.5    -20
 11     -8    -11    6.5      2
 12    -25    -10   27.5    -20
 13    -22     22    6.5     -8
 14     -1     -9  -14.5      2
 15     -5     -2    6.5      0
 16    -12     -9   -6.5    -15
 17     10     -6   12.5      7
 18     22     -1  -11.5      3
 19     27    -19   16.5    -12
 20     15      9    5.5      9
MTB > let c22 = abs(c22)
MTB > let c23 = abs(c23)
MTB > let c24 = abs(c24)
MTB > let c25 = abs(c25)
MTB > print c22 - c25
# This is the absolute value of the columns we just printed.
Data Display
Row  1-med  2-med  3-med  4-med
  1      9     12   12.5      4
  2     12      6   11.5      8
  3      7      2    8.5     10
  4     12     19    1.5      0
  5      4      2    5.5     12
  6      4      1   12.5     11
  7      9     14    0.5      8
  8     40      1    2.5     19
  9      1      4   10.5      2
 10      2      4    0.5     20
 11      8     11    6.5      2
 12     25     10   27.5     20
 13     22     22    6.5      8
 14      1      9   14.5      2
 15      5      2    6.5      0
 16     12      9    6.5     15
 17     10      6   12.5      7
 18     22      1   11.5      3
 19     27     19   16.5     12
 20     15      9    5.5      9
MTB > AOVO c22 - c25
# We now do an ordinary 1-way ANOVA.
One-way ANOVA: 1-med, 2-med, 3-med, 4-med
Source  DF      SS    MS     F      P
Factor   3   220.1  73.4  1.29  0.283
Error   76  4314.9  56.8
Total   79  4535.0
S = 7.535   R-Sq = 4.85%   R-Sq(adj) = 1.10%
# Since the p-value is above any significance level that we might use, we cannot reject the null
# hypothesis of equal variances.
Individual 95% CIs For Mean Based on Pooled StDev
Level   N    Mean   StDev
1-med  20  12.350  10.174
2-med  20   8.150   6.491
3-med  20   9.000   6.378
4-med  20   8.600   6.386
[Character plot of the individual 95% confidence intervals; axis marked at 6.0, 9.0, 12.0 and 15.0]
Pooled StDev = 7.535
Game over.