This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this
material constitutes acceptance of that license and the conditions of use of materials on this site.
Copyright 2006, The Johns Hopkins University and Karl W. Broman. All rights reserved. Use of these materials
permitted only in accordance with license rights granted. Materials provided “AS IS”; no representations or
warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently
review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for
obtaining permissions for use from third parties as needed.
ANOVA assumptions
• Data in each group are a random sample from some population.
• Observations within groups are independent.
• Samples are independent.
• Underlying populations normally distributed.
• Underlying populations have the same variance.
Diagnostics
• QQ plot within each group
• QQ plot of all residuals, $y_{ti} - \bar{y}_{t\cdot}$
• Plot residuals, $y_{ti} - \bar{y}_{t\cdot}$, against fitted values, $\bar{y}_{t\cdot}$.
• Plot SD versus mean for each group.
• Plot the residuals against other factors (e.g., order of measurements, weight or age of mouse).
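The quantities behind these diagnostic plots are straightforward to compute. The sketch below is an illustration, not the code that produced the figures in these notes. It returns the fitted values $\bar{y}_{t\cdot}$, the residuals $y_{ti} - \bar{y}_{t\cdot}$ and the per-group (mean, SD) pairs for the SD-versus-mean plot. The group labels and values are hypothetical toy data, not the IL10 data. The same summaries are also computed on the log10 scale, so you can see how a log transform can weaken the dependence of SD on mean.

```python
import math
import statistics


def diagnostics(groups):
    """Fitted values, residuals, and per-group (mean, SD) for a one-way layout.

    groups: dict mapping a group label to a list of responses.
    """
    fitted, residuals, summary = [], [], {}
    for label, ys in groups.items():
        mean = statistics.fmean(ys)
        summary[label] = (mean, statistics.stdev(ys))  # for the SD-vs-mean plot
        for y in ys:
            fitted.append(mean)         # fitted value: the group average
            residuals.append(y - mean)  # residual: y_ti minus the group average
    return fitted, residuals, summary


# Hypothetical toy data (not the IL10 example data)
groups = {"A": [110.0, 150.0, 130.0], "B": [900.0, 1400.0, 1100.0]}
fitted, resid, summary = diagnostics(groups)

# Same summaries after a log10 transform
log_groups = {label: [math.log10(y) for y in ys] for label, ys in groups.items()}
_, _, log_summary = diagnostics(log_groups)

for label in groups:
    print(label, "original (mean, SD):", summary[label], "log10 (mean, SD):", log_summary[label])
```

In this toy example, the SD ratio between the two groups is much smaller on the log10 scale than on the original scale.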
Example
[Figure: IL10 response by strain (A, B6, 1, 2, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 17, 18, 19, 24, 25 and 26), plotted on the original scale and as log10 IL10 response.]
ANOVA Tables
Original scale / 1000:
source             SS    df     MS      F    P-value
between strains    33    20   1.69   1.70    0.042
within strains    124   125   0.99
total             157   145

log10 scale:
source             SS    df     MS      F    P-value
between strains  3.35    20  0.167   2.25    0.0036
within strains   9.29   125  0.074
total           12.63   145
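Each row of these tables follows from the group data. The minimal, dependency-free sketch below builds the one-way ANOVA table (SS, df, MS and F) from a list of groups. It is an illustration, not the software used for the tables above, and it runs on hypothetical toy data. The P-value is the upper tail of the F(k − 1, N − k) distribution beyond the observed F. If SciPy is available, `scipy.stats.f.sf(F, df_between, df_within)` computes it; that call is left out here so the sketch runs with no extra libraries.

```python
import statistics


def one_way_anova(groups):
    """One-way ANOVA table (SS, df, MS) and F statistic for a list of groups."""
    all_y = [y for g in groups for y in g]
    grand_mean = statistics.fmean(all_y)
    means = [statistics.fmean(g) for g in groups]
    k, n_total = len(groups), len(all_y)

    # Between-group SS: group size times squared deviation of group mean from grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group SS: sum of squared residuals y_ti - ybar_t.
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

    df_between, df_within = k - 1, n_total - k
    ms_between, ms_within = ss_between / df_between, ss_within / df_within
    return {
        "between": (ss_between, df_between, ms_between),
        "within": (ss_within, df_within, ms_within),
        "total": (ss_between + ss_within, n_total - 1),
        "F": ms_between / ms_within,
    }


# Hypothetical toy data. Applying math.log10 to every value first
# would give the corresponding log10-scale table.
table = one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
for row in ("between", "within", "total"):
    print(row, table[row])
print("F =", table["F"])
```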
Residuals
[Figure: Residuals by strain, for IL10 (original scale) and for log10 IL10.]
Within-group QQ-plots: IL10
[Figure: Normal QQ plots of IL10 (sample quantiles vs. theoretical quantiles) within strains A, B6, 2, 4, 8 and 12.]
Within-group QQ-plots: log10 IL10
[Figure: Normal QQ plots of log10 IL10 (sample quantiles vs. theoretical quantiles) within strains A, B6, 2, 4, 8 and 12.]
QQ plots of all residuals
[Figure: Normal QQ plots of all residuals (sample quantiles vs. theoretical quantiles), for IL10 and for log10 IL10.]
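A normal QQ plot pairs the sorted sample (for example, the pooled residuals $y_{ti} - \bar{y}_{t\cdot}$) with standard normal quantiles at a set of plotting positions. The sketch below computes those coordinates with Python's `statistics.NormalDist`. The plotting positions follow the convention used by R's `ppoints()`: (i − a)/(n + 1 − 2a), with a = 3/8 for n ≤ 10 and a = 1/2 otherwise. This choice of convention is an assumption made for illustration; other conventions shift the points slightly. The residual values are hypothetical.

```python
from statistics import NormalDist


def qq_points(sample):
    """(theoretical normal quantile, sample quantile) pairs for a normal QQ plot.

    Plotting positions (i - a) / (n + 1 - 2a), with a = 3/8 if n <= 10 else 1/2,
    following the convention of R's ppoints().
    """
    n = len(sample)
    a = 3 / 8 if n <= 10 else 0.5
    std_normal = NormalDist()  # mean 0, SD 1
    probs = [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]
    return [(std_normal.inv_cdf(p), y) for p, y in zip(probs, sorted(sample))]


# Hypothetical residuals pooled over all groups
residuals = [-0.4, 0.1, 0.3, -0.2, 0.5, -0.3]
for theoretical, sample_q in qq_points(residuals):
    print(f"{theoretical:6.3f}  {sample_q:5.2f}")
```

If the residuals are roughly normal, the points fall close to a straight line. Curvature suggests skewness, and S-shapes suggest heavy or light tails.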
Residuals vs fitted values
[Figure: Residuals plotted against fitted values, for IL10 and for log10 IL10.]
SDs vs means
[Figure: Within-strain SD plotted against within-strain mean, for IL10 and for log10 IL10.]
Homogeneity of variances
One of the ANOVA assumptions is homogeneity of the group variances. This can be tested formally with Bartlett’s test.
Assume we have k treatment groups, and define
$n_t$: number of cases in treatment group t
$N$: number of cases (overall)
$Y_{ti}$: response i in treatment group t
$\bar{Y}_{t\cdot}$: average response in treatment group t
$S_t^2$: the sample variance in treatment group t
Bartlett’s test
We want to test
$H_0: \sigma_1^2 = \cdots = \sigma_k^2$
versus
$H_a$: $H_0$ is false.
• Calculate the pooled sample variance:
$$S^2 = \frac{\sum_t (n_t - 1)\, S_t^2}{\sum_t (n_t - 1)} = \frac{\sum_t (n_t - 1)\, S_t^2}{N - k}$$
• Calculate the test statistic, where log denotes the natural logarithm:
$$X^2 = (N - k) \log(S^2) - \sum_t (n_t - 1) \log(S_t^2)$$
• Calculate the following correction factor:
$$C = 1 + \frac{1}{3(k - 1)} \left[ \sum_t \frac{1}{n_t - 1} - \frac{1}{\sum_t (n_t - 1)} \right]$$
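If $H_0$ is true, then, approximately,
$$X^2/C \sim \chi^2(\text{df} = k - 1)$$
so the P-value is the upper-tail probability of this distribution beyond the observed $X^2/C$.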
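The three steps (pooled variance, statistic, correction factor) can be sketched directly in Python. This is an illustration of the formulas, not a replacement for R's `bartlett.test()`. The chi-square upper tail is computed here with a power series for the regularized incomplete gamma function. That series is a choice made for this sketch so that it needs only the standard library. It is adequate for moderate statistics, but a continued-fraction expansion is more robust for very large ones. The sketch also assumes every group has at least two observations and a positive sample variance.

```python
import math
import statistics


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Power series for the regularized lower incomplete gamma P(a, z),
    with a = df/2 and z = x/2; adequate for moderate x.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total
    return max(0.0, 1.0 - lower)


def bartlett_test(groups):
    """Bartlett's test of equal variances; returns (X^2 / C, df, P-value)."""
    k = len(groups)
    ns = [len(g) for g in groups]
    n_total = sum(ns)
    variances = [statistics.variance(g) for g in groups]  # S_t^2, divisor n_t - 1

    # Pooled variance S^2 (the sum of n_t - 1 over groups equals N - k)
    pooled = sum((n - 1) * s2 for n, s2 in zip(ns, variances)) / (n_total - k)
    # Test statistic X^2, using natural logarithms
    x2 = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(s2) for n, s2 in zip(ns, variances)
    )
    # Correction factor C
    c = 1 + (sum(1 / (n - 1) for n in ns) - 1 / (n_total - k)) / (3 * (k - 1))

    stat = x2 / c
    return stat, k - 1, chi2_sf(stat, k - 1)


# Hypothetical toy data: sample variances 1 and 4
stat, df, p = bartlett_test([[1.0, 2.0, 3.0], [1.0, 3.0, 5.0]])
print(f"X^2/C = {stat:.3f} on {df} df, P = {p:.3f}")
```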
Example
• For the example data, there are 21 strains with between 5 and 10 observations per strain.
• The pooled sample variance on the original scale (/1000) is 0.99.
• The pooled sample variance on the log10 scale is 0.074.
• The test statistics were X² = 79.9 (original scale) and 34.0 (log10 scale).
• The correction factor ended up being C ≈ 1.07.
• Thus we look at the values 79.9 / 1.07 ≈ 74.8 and 34.0 / 1.07 ≈ 31.8 (computed from unrounded values).
• Since there are 21 strains, we refer to the χ²(df = 20) distribution.
• We end up with P-values of 2.9 × 10⁻⁸ and 0.045.
The R function bartlett.test() can be used to do these calculations.
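The P-values in this example can be checked by hand. For an even number of degrees of freedom, df = 2m, the chi-square upper tail has the closed form $P(X > x) = e^{-x/2} \sum_{j=0}^{m-1} (x/2)^j / j!$. Here df = 20, so that formula applies. The sketch below uses it on the reported values X²/C = 74.8 and 31.8. It should reproduce the P-values of about 2.9 × 10⁻⁸ and 0.045 given above. The function name is hypothetical, and the closed form does not apply to odd df.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-square(df); closed form valid only for even df.

    For df = 2m: P(X > x) = exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j  # (x/2)^j / j!, built up from the previous term
        total += term
    return math.exp(-half) * total


# Reported X^2 / C values for the IL10 example; 21 strains -> df = 20
for scale, stat in [("original scale", 74.8), ("log10 scale", 31.8)]:
    print(f"{scale}: X^2/C = {stat}, P = {chi2_sf_even_df(stat, 20):.2g}")
```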
Hartley’s F-max test
When the number of observations is the same in every treatment group, there is a quick-and-dirty alternative to Bartlett’s test, called Hartley’s F-max test. For this test, simply compute
$$F_{\max} = \frac{\max_t(S_t^2)}{\min_t(S_t^2)}$$
Critical values for $F_{\max}$ are given in a look-up table, indexed by the number of treatment groups (k) and the degrees of freedom associated with each of the group variances ($n_t - 1$). $H_0$ is rejected when $F_{\max}$ exceeds the critical value.
                         Number of treatment groups (k)
df   α        2     3     4     5     6     7     8     9    10    11    12
 2   0.05    39  87.5   142   202   266   333   403   475   550   626   704
     0.01   199   448   729  1036  1362  1705  2063  2432  2813  3204  3605
 3   0.05  15.4  27.8  39.2  50.7    62  72.9  83.5  93.9   104   114   124
     0.01  47.5    85   120   151   184   216   249   281   310   337   361
 4   0.05   9.6  15.5  20.6  25.2  29.5  33.6  37.5  41.4  44.6    48  51.4
     0.01  23.2    37    49    59    69    79    89    97   106   113   120
 5   0.05  7.15  10.8  13.7  16.3  18.7  20.8  22.9  24.7  26.5  28.2  29.9
     0.01  14.9    22    28    33    38    42    46    50    54    57    60
 6   0.05  5.82  8.38  10.4  12.1  13.7    15  16.3  17.5  18.6  19.7  20.7
     0.01  11.1  15.5  19.1    22    25    27    30    32    34    36    37
 7   0.05  4.99  6.94  8.44   9.7  10.8  11.8  12.7  13.5  14.3  15.1  15.8
     0.01  8.89  12.1  14.5  16.5  18.4    20    22    23    24    26    27
 8   0.05  4.43     6  7.18  8.12  9.03   9.8  10.5  11.1  11.7  12.2  12.7
     0.01   7.5   9.9  11.7  13.2  14.5  15.8  16.9  17.9  18.9  19.8    21
 9   0.05  4.03  5.34  6.31  7.11   7.8  8.41  8.95  9.45  9.91  10.3  10.7
     0.01  6.54   8.5   9.9  11.1  12.1  13.1  13.9  14.7  15.3    16  16.6
10   0.05  3.72  4.85  5.67  6.34  6.92  7.42  7.87  8.28  8.66  9.01  9.34
     0.01  5.85   7.4   8.6   9.6  10.4  11.1  11.8  12.4  12.9  13.4  13.9
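The F-max procedure takes only a few lines: check that the groups have equal sizes, take the ratio of the largest to the smallest sample variance, and compare it with the table entry for (k, df = n_t − 1). In the sketch below, the dictionary holds a few α = 0.05 critical values copied from the table. The name `FMAX_CRITICAL_05`, the function name and the data are hypothetical, chosen for illustration only.

```python
import statistics

# A few alpha = 0.05 critical values copied from the F-max table,
# keyed by (k = number of groups, df = n_t - 1). Extend as needed.
FMAX_CRITICAL_05 = {
    (2, 4): 9.6, (3, 4): 15.5, (4, 4): 20.6,
    (2, 9): 4.03, (3, 9): 5.34, (4, 9): 6.31,
}


def hartley_fmax(groups):
    """Return (F_max, k, df) for groups of equal size."""
    sizes = {len(g) for g in groups}
    if len(sizes) != 1:
        raise ValueError("Hartley's F-max test needs equal group sizes")
    variances = [statistics.variance(g) for g in groups]  # S_t^2
    return max(variances) / min(variances), len(groups), sizes.pop() - 1


# Hypothetical data: three groups of five observations each
groups = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [1.0, 1.5, 2.0, 2.5, 3.0],
]
f_max, k, df = hartley_fmax(groups)
critical = FMAX_CRITICAL_05[(k, df)]
decision = "reject H0" if f_max > critical else "do not reject H0"
print(f"F_max = {f_max:.2f}, critical value (k = {k}, df = {df}) = {critical}: {decision}")
```

Because the table is indexed only by k and a single df, the test is not defined for unequal group sizes. In that case Bartlett’s test is the appropriate choice.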
Another example
Rate of growth in fish eggs from different mothers
[Figure: tth by mom, for moms 1 to 8.]
ANOVA Table
source           SS     df     MS      F    P-value
between moms    12757    7   1822   13.5   4 × 10⁻¹⁶
within moms     73510  546    135
total           86267  553
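The columns of an ANOVA table are linked by simple arithmetic: MS = SS/df, F = MS(between)/MS(within), and the between and within rows add up to the total. The short check below recomputes the MS and F columns of the fish-egg table from its reported SS and df. It also recovers the number of mothers and the number of observations from the degrees of freedom. The P-value would need the F(7, 546) upper tail, for example `scipy.stats.f.sf(f_stat, 7, 546)` if SciPy is available; it is not computed here.

```python
# Sums of squares and degrees of freedom as reported in the fish-egg ANOVA table
ss_between, df_between = 12757, 7
ss_within, df_within = 73510, 546

ms_between = ss_between / df_between  # MS = SS / df
ms_within = ss_within / df_within
f_stat = ms_between / ms_within       # F = MS_between / MS_within

k = df_between + 1                    # df_between = k - 1
n_total = df_between + df_within + 1  # df_total = N - 1

print(f"MS between = {ms_between:.0f}, MS within = {ms_within:.0f}, F = {f_stat:.1f}")
print(f"total SS = {ss_between + ss_within}, total df = {df_between + df_within}")
print(f"k = {k} moms, N = {n_total} observations")
```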
QQ plot of all residuals
[Figure: Normal QQ plot of all residuals (residuals vs. normal quantiles).]
QQ plots within each group
[Figure: Normal QQ plots within each group, one panel per mom (Moms 1 to 8), plotted against normal quantiles.]