STATISTICS
Analysis Of Variance
Review
Preview
ANOVA
 F test
 One-way ANOVA
 Multiple comparison
 Two-way ANOVA
Review: Standard normal distribution
Z value: (Observed - Expected) in units of SD
  $x \sim N(\mu, \sigma^2)$
  $Z = \dfrac{x - \mu}{\sigma}$

Review: Central Limit Theorem
For large n,
  $\bar{x} \sim N(\mu, \sigma^2/n)$, i.e. $\bar{x} \sim N(\mu, \sigma_{\bar{x}}^2)$ with $\sigma_{\bar{x}}^2 = \sigma^2/n$
The beauty of CLT:
  Easy to calculate V
The ugliness of CLT:
  Hard to explain p

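The CLT statement above can be checked with a small simulation. This is only a sketch: the Exponential(1) population, the sample size of 50, the 5000 replicates and the function name `simulate_sample_means` are all illustrative assumptions, not part of the lecture.

```python
import random
import statistics

def simulate_sample_means(n, replicates, seed=0):
    """Draw `replicates` samples of size n from a skewed Exponential(1)
    population (mean 1, variance 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replicates)
    ]

means = simulate_sample_means(n=50, replicates=5000)
mean_of_means = statistics.fmean(means)      # CLT predicts about mu = 1
var_of_means = statistics.pvariance(means)   # CLT predicts about sigma^2 / n = 1/50

print(f"mean of sample means:     {mean_of_means:.4f} (theory 1.0000)")
print(f"variance of sample means: {var_of_means:.4f} (theory {1 / 50:.4f})")
```

Even though the population is far from normal, the sample means should centre on the population mean with variance close to sigma^2 / n.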
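Review: Sampling distribution of $(\bar{x}_1 - \bar{x}_2)$
  $\bar{x}_1 \sim N(\mu_1, \sigma_1^2/n_1)$
  $\bar{x}_2 \sim N(\mu_2, \sigma_2^2/n_2)$
  $\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2,\ \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$
  i.e. $\sigma^2_{(\bar{x}_1 - \bar{x}_2)} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$
[Figure: sampling distribution of $\bar{x}_1 - \bar{x}_2$, centred at $\mu_1 - \mu_2$]
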
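Review: Population & Sampling Distribution
Population parameters known
  Mean:    $\mu = \dfrac{\sum x_i}{N}$
  SD:      $\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$
  Z score: $z = \dfrac{x_i - \mu}{\sigma}$
  SE:      $SE_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
  Z score of the mean: $Z_{\bar{x}} = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
Population parameters unknown
  Mean:    $\bar{x} = \dfrac{\sum x_i}{n}$
  SD:      $S = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$
  t score: $t = \dfrac{x_i - \mu}{S}$
  SE:      $SE_{\bar{x}} = \dfrac{S}{\sqrt{n}}$
  t score of the mean: $t_{\bar{x}} = \dfrac{\bar{x} - \mu}{S / \sqrt{n}}$
Exercise: add the corresponding entries for $(\bar{x}_1 - \bar{x}_2)$ yourself
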
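Review: Flowchart of 2G MD (two-group mean difference) test
(In the original chart: if Yes, go up; if No, go down)
No. of groups?
  1 group: N > 30?
    Yes: 1-s t
    No: ND?
      Yes: 1-s t
      No: 1. transform for t; 2. sign test
  2 groups: Independent?
    Yes: N > 30?
      Yes: 2-s t
      No: ND?
        Yes: Equal variance?
          Yes: 2-s t
          No: Equal N?
            Yes: 2-s t
            No: 1. transform for t; 2. WRS test
        No: 1. transform for t; 2. WRS test
    No (paired): N > 30?
      Yes: Paired t
      No: ND?
        Yes: Paired t
        No: 1. transform for t; 2. WSR test
Abbreviations: 1-s t = one-sample t test; 2-s t = two-sample t test; ND = normal distribution; WRS = Wilcoxon rank sum; WSR = Wilcoxon signed rank
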
ANOVA: Analysis of Variance
The logic of ANOVA
  Partition of sum of squares
F test
One-way ANOVA
  Multiple comparison
Two-way ANOVA
  Interaction and confounding

Eyeball test for 3-sample means
[Figure: two panels, A and B, each showing the means of groups 1, 2 and 3 with 95% confidence limits]
Using 95% confidence limits
  A: non-significant
  B: significant
Why?
  Between-group variation
  Within-group variation
Why not do the 2-s t test 3 times?
  Alpha error is inflated
  Ex: 7 groups give 21 pairwise MD comparisons; finding 1 significant result out of 21 (1/21 < 0.05) is what chance alone would produce

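The alpha inflation described above can be put into numbers. The sketch below counts the pairwise comparisons for k groups and computes the chance of at least one false positive; it assumes the comparisons are independent, which pairwise tests sharing groups are not exactly, so treat the result as an approximation. The function names are illustrative.

```python
from math import comb

def n_pairwise_comparisons(k):
    """Number of two-group comparisons among k groups."""
    return comb(k, 2)

def familywise_error_rate(m, alpha=0.05):
    """P(at least one false positive in m tests).
    Assumption: the m tests are independent."""
    return 1 - (1 - alpha) ** m

for k in (3, 7):
    m = n_pairwise_comparisons(k)
    expected_false_positives = m * 0.05
    print(f"k={k}: {m} comparisons, "
          f"familywise error ~ {familywise_error_rate(m):.3f}, "
          f"expected false positives ~ {expected_false_positives:.2f}")
```

With 7 groups the expected number of false positives is about 1, which is why a single significant result among 21 comparisons carries no evidence.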
Data sheet: k groups MD comparison
Tx    Subject   Observed   Group mean   Grand mean   Group effect   Tx error   Total difference
A     1         X1         Ma           M            Ma-M           X1-Ma      X1-M
      2         X2                                                  X2-Ma      X2-M
      3         X3                                                  X3-Ma      X3-M
B     4         X4         Mb           M            Mb-M           …          …
      5         X5                                                  …          …
…     …         …          …                         …              …          …
K     n         Xn         Mk           M            Mk-M           …          …

The Logic of one-way ANOVA
Total difference divided into two parts
  (Observed - group mean) + (group mean - grand mean)
  $X_{ij} - \bar{X} = (X_{ij} - \bar{X}_{.j}) + (\bar{X}_{.j} - \bar{X})$
Total sum of squares divided into two parts
  SS Total = SS Between + SS Within (or Error): SST = SSB + SSE
  $\sum_j \sum_i (X_{ij} - \bar{X})^2 = \sum_j \sum_i [(X_{ij} - \bar{X}_{.j}) + (\bar{X}_{.j} - \bar{X})]^2 = \sum_j \sum_i (X_{ij} - \bar{X}_{.j})^2 + \sum_j \sum_i (\bar{X}_{.j} - \bar{X})^2$
  (the cross-product term sums to zero; the first sum is SSE, the second is SSB)
Model of one-way ANOVA
  $X_{ij} = \mu + \alpha_j + e_{ij}$
[Figure: Partition of TD & TSS for groups A, B, C]

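The partition SST = SSB + SSE can be verified numerically. The sketch below uses three small made-up groups (hypothetical data, not from the lecture) and an illustrative function name.

```python
from statistics import fmean

def partition_sum_of_squares(groups):
    """Return (SST, SSB, SSE) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    # Total: every observation against the grand mean
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    # Between: each group mean against the grand mean, once per observation
    ssb = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within (error): every observation against its own group mean
    sse = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return sst, ssb, sse

# Hypothetical data for illustration only
groups = [[4.0, 5.0, 6.0], [7.0, 9.0, 8.0, 8.0], [1.0, 2.0, 3.0]]
sst, ssb, sse = partition_sum_of_squares(groups)
print(f"SST={sst:.2f}  SSB={ssb:.2f}  SSE={sse:.2f}  SSB+SSE={ssb + sse:.2f}")
```

The three sums are computed independently, so the agreement of SST with SSB + SSE is a genuine check of the identity rather than a restatement of it.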
Assumptions in ANOVA
Normal distribution (ND): Y values in each group
  Not very important, especially for large n
  If not ND and n is small: use the Kruskal-Wallis nonparametric test
Equal variance: homogeneity
  If not: data transformation, or ask for help
Random & independent samples

F test: variance ratio test
Review:
  F test for equal variance in the 2-s t test
F test: $F = V_1 / V_2$
  The larger V is divided by the smaller V
  If the two variances are about equal, the ratio is about 1
  The critical value of the F distribution depends on the DFs
ANOVA for mean difference, k groups
  Null hypothesis: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
  F = variance between / variance within
  If F is about 1, the grouping explains nothing

F test: named after Fisher
Sir Ronald Aylmer Fisher, 1890-1962
Characteristics
  A sickly child with poor eyesight
  His teacher taught him without paper or pencil
  Very strong intuition for geometry
  Mathematicians took years to prove his formulas
Persistence
  Calculating the ANOVA tables took Fisher 8 months at 8 hours a day!
References:
  The Lady Tasting Tea, Salsburg, 2001
  "Statistics Changed the World" (統計,改變了世界), 天下, 2001

One-way ANOVA
One-way ANOVA table
Source of variation      SS    DF    Mean SS   F ratio
Between k groups         SSB   k-1   MSB       MSB/MSE
Error (within groups)    SSE   n-k   MSE
Total                    SST   n-1
F test:
  $F = \dfrac{MS_B}{MS_E} = \dfrac{SS_B / (k - 1)}{SS_E / (n - k)}$

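The table above translates directly into a few lines of code. This is a minimal sketch on hypothetical data; the function name `one_way_anova` and the dictionary keys are illustrative choices, and no p-value is computed because that needs an F table or a statistics library.

```python
from statistics import fmean

def one_way_anova(groups):
    """Fill in the one-way ANOVA table for a list of groups."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = fmean([x for g in groups for x in g])
    ssb = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    msb = ssb / (k - 1)
    mse = sse / (n - k)
    return {
        "SSB": ssb, "DF_B": k - 1, "MSB": msb,
        "SSE": sse, "DF_E": n - k, "MSE": mse,
        "SST": ssb + sse, "DF_T": n - 1,
        "F": msb / mse,
    }

# Hypothetical data for illustration only
toy_groups = [[4.0, 5.0, 6.0], [7.0, 9.0, 8.0, 8.0], [1.0, 2.0, 3.0]]
table = one_way_anova(toy_groups)
print(f"Between: SS={table['SSB']:.2f} DF={table['DF_B']} MS={table['MSB']:.3f}")
print(f"Error:   SS={table['SSE']:.2f} DF={table['DF_E']} MS={table['MSE']:.3f}")
print(f"Total:   SS={table['SST']:.2f} DF={table['DF_T']}")
print(f"F ratio: {table['F']:.3f}")
```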
Multiple Comparison
Definition:
  Contrast between 2 means: $\mu_1 - \mu_2$
  More than 2 means is OK: $[(\mu_1 + \mu_2)/2] - \mu_c$
    Compares the overall effect of the drug with that of placebo
  Contrast coefficients: add up to 0
Orthogonal
  Two contrasts are orthogonal if they don't use the same information
  Ex: $(\mu_1 - \mu_2)$ and $(\mu_3 - \mu_4)$, i.e. the questions asked are INDEPENDENT
Types of MC: before or after ANOVA
  A priori (planned) comparisons
  Post hoc (a posteriori) comparisons

Example 1: one-way ANOVA
Research problem:
  Life events, depressive symptoms, and immune function. Irwin M. Am J Psychiatry, 1987; 144:437-441
Subjects: women whose husbands
  were being treated for lung cancer
  had died of lung cancer in the preceding 1-6 months
  were in good health
X: grouping by scores for major life events
  Measurement: Social Readjustment Rating Scale score
Y: immune system function
  NK cell activity: lytic units

Printout: Box plot & Error bar plot
[Figure: box plot and error bar plot of CELL (NK cell activity) by GROUP (1, 2, 3)]

Printout: ANOVA table
Analysis of Variance Table
Source Term        DF   Sum of Squares   Mean Square   F-Ratio   Prob        Power (Alpha=0.05)
A: GROUP            2   4654.156         2327.078      8.35      0.001125*   0.947488
S(A)               34   9479.396         278.8058
Total (Adjusted)   36   14133.55
Total              37

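The mean squares and the F ratio in this printout follow from the sums of squares and DFs alone. The sketch below redoes that arithmetic with the printed values; the function name `f_ratio` is illustrative.

```python
def f_ratio(ss_between, df_between, ss_error, df_error):
    """Return (MSB, MSE, F) from the SS and DF columns of an ANOVA table."""
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error
    return ms_between, ms_error, ms_between / ms_error

# Values copied from the printout: A: GROUP and S(A) rows
msb, mse, f = f_ratio(4654.156, 2, 9479.396, 34)
print(f"MSB = {msb:.3f}, MSE = {mse:.4f}, F = {f:.2f}")
```

The MSE obtained here (about 278.81) is the value reused by the multiple-comparison procedures in the following slides.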
Printout: Nonparametric ANOVA
Kruskal-Wallis One-Way ANOVA on Ranks Test Results
Method                   DF   Chi-Sq (H)   Prob. Level   Decision (0.05)
Not Corrected for Ties    2   11.16963     0.003754      Reject Ho
Corrected for Ties        2   11.17095     0.003752      Reject Ho
Group Detail
Group   Count   Sum of Ranks   Mean Rank   Z-Value   Median
1       13      351.00         27.00        3.3087   37
2       12      163.50         13.63       -2.0927   14.5
3       12      188.50         15.71       -1.2815   14.05

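The uncorrected H statistic in this printout can be reproduced from the rank sums and group counts with the standard Kruskal-Wallis formula, H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1). The sketch below does that; the p-value shortcut exp(-H/2) is valid only because DF = 2 here, and the function name is illustrative.

```python
from math import exp

def kruskal_wallis_h(rank_sums, counts):
    """Kruskal-Wallis H (not corrected for ties) from rank sums and group sizes."""
    n_total = sum(counts)
    weighted = sum(r * r / n for r, n in zip(rank_sums, counts))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# Sum of Ranks and Count columns from the Group Detail table
h = kruskal_wallis_h([351.0, 163.5, 188.5], [13, 12, 12])

# Upper tail of a chi-square distribution with exactly 2 DF is exp(-x / 2);
# other DFs would need a chi-square table or a statistics library.
p = exp(-h / 2)
print(f"H = {h:.5f}, p = {p:.6f}")
```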
MC: A priori comparisons
t test for orthogonal comparisons
  t statistic: $t = \dfrac{\bar{x}_i - \bar{x}_j}{\sqrt{2 MS_E / n}}$; uses MSE, not the pooled SD (SDp)
  DF: $(n_1 + n_2 - j)$; $n = n_1 = n_2$
  Adjusting $\alpha$ downward: $\alpha$ / (number of comparisons)
  Ex: 4 comparisons, $\alpha = 0.05/4 = 0.0125$
Bonferroni t procedure
  Applicable for both orthogonal & non-orthogonal comparisons
  t statistic: Multiplier $\times \sqrt{2 MS_E / n}$
  Multiplier table: no. of comparisons & DF for MSE
  Able to find CI for mean difference

MC: A posteriori comparisons
Tukey's HSD (honestly significant difference)
  HSD = Multiplier $\times \sqrt{MS_E / n}$
  Like Bonferroni, an HSD multiplier table is needed (P176, table 7-7)
  Able to find CI for mean difference
  Ex: $HSD = 4.42 \times \sqrt{278.82 / 12} = 21.31$
  Mean differences among LOW (n=13), MOD (n=12) and HIGH (n=12): 24.63, 22.17 and 2.46

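The HSD example above can be recomputed and applied to the three group means. In this sketch the multiplier 4.42 and MSE 278.82 are taken from the slide and the means from the multiple-comparison printout; mapping printout groups 1, 2, 3 to LOW, MOD, HIGH is an assumption inferred from the group sizes and the listed differences. Like the slide, it uses n = 12 for every pair even though LOW has 13 subjects.

```python
from itertools import combinations
from math import sqrt

def tukey_hsd(multiplier, mse, n):
    """Critical difference: multiplier * sqrt(MSE / n)."""
    return multiplier * sqrt(mse / n)

# Assumed mapping: group 1 = LOW, group 2 = MOD, group 3 = HIGH
means = {"LOW": 40.23077, "MOD": 15.60000, "HIGH": 18.05833}

hsd = tukey_hsd(4.42, 278.82, 12)
significant = []
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    if diff > hsd:
        significant.append((a, b))
    print(f"{a} vs {b}: difference {diff:.2f}, exceeds HSD {hsd:.2f}: {diff > hsd}")
```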
MC: A posteriori comparisons
Scheffé's procedure
  S statistic: $S = \sqrt{(j - 1)\, F_{\alpha, df}\, MS_E \sum \dfrac{C_j^2}{n_j}}$
  j: no. of groups; C: contrast; (alpha, df1, df2) = (0.01, 2, 34)
  Most versatile (not only pairwise) & most conservative
  Ex: Low vs (Moderate & High) combined; Low vs Moderate
    $\sum \dfrac{C_j^2}{n_j} = \dfrac{1^2}{12} + \dfrac{(-0.5)^2}{12} + \dfrac{(-0.5)^2}{12} = 0.125$;  $\sum \dfrac{C_j^2}{n_j} = \dfrac{1^2}{12} + \dfrac{(-1)^2}{12} = 0.167$
    $S = \sqrt{(3 - 1) \times 5.31 \times 278.82 \times 0.167} = 22.24$
  Note: MD between L & H is not significant
  Able to find CI for mean difference

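The two contrasts in the Scheffé example can be evaluated with one small function. This is a sketch using the slide's values (F = 5.31, MSE = 278.82, n = 12 per group); the function name is illustrative. Because the code keeps the exact weight 1/6 instead of the rounded 0.167, the pairwise value comes out near 22.22 rather than the slide's 22.24.

```python
from math import sqrt

def scheffe_critical(n_groups, f_crit, mse, coeffs, sizes):
    """S = sqrt((j - 1) * F * MSE * sum(C_j^2 / n_j)) for one contrast."""
    # Contrast coefficients must add to 0
    assert abs(sum(coeffs)) < 1e-12
    weight = sum(c * c / n for c, n in zip(coeffs, sizes))
    return sqrt((n_groups - 1) * f_crit * mse * weight)

sizes = [12, 12, 12]  # the slide uses n = 12 for all three groups
s_pair = scheffe_critical(3, 5.31, 278.82, [1, -1, 0], sizes)          # Low vs Moderate
s_combined = scheffe_critical(3, 5.31, 278.82, [1, -0.5, -0.5], sizes)  # Low vs (Mod & High)

print(f"pairwise contrast:  S = {s_pair:.2f}")
print(f"combined contrast:  S = {s_combined:.2f}")
```

A pairwise difference must exceed the first value to be significant, which is why the Low vs High difference of 22.17 falls just short.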
MC: A posteriori comparisons
Newman-Keuls procedure
  NK statistic: multiplier $\times \sqrt{MS_E / n}$
  Multiplier table is needed; the multiplier depends on the number of steps between the ranked means (2 steps for adjacent means, 3 steps across three means)
  Less conservative than Tukey's HSD
  Unable to find CI for mean difference
  Ex: 2 steps: $NK = 3.87 \times 4.82 = 18.65$; 3 steps: $NK = 4.42 \times 4.82 = 21.31$ (same as HSD)

MC: A posteriori comparisons
Dunnett's procedure
  Dunnett's statistic: multiplier $\times \sqrt{2 MS_E / n}$
  Only used for comparing several Tx means with a single CTL mean
  Relatively low critical value
  Ex: $D = 2.71 \times 6.82 = 18.48$
    More than 2 units lower than the HSD value (21.31); nearly 4 units lower than the Scheffé value (22.24)

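To see how the critical differences of the three procedures compare, the sketch below recomputes them side by side from the multipliers quoted on the slides (2.71 for Dunnett, 4.42 for Tukey, F = 5.31 for Scheffé) with MSE = 278.82 and n = 12. Values differ from the slides in the second decimal because the slides round intermediate results.

```python
from math import sqrt

mse, n = 278.82, 12
se_diff = sqrt(2 * mse / n)  # standard error of a difference between two means

critical = {
    "Dunnett": 2.71 * se_diff,
    "Tukey HSD": 4.42 * sqrt(mse / n),
    # Scheffe for a pairwise contrast: coefficients (1, -1), so sum(C^2/n) = 2/n
    "Scheffe": sqrt((3 - 1) * 5.31 * mse * (2 / n)),
}

for name, value in critical.items():
    print(f"{name:10s} critical difference = {value:.2f}")
```

The ordering Dunnett < Tukey < Scheffé reflects how many comparisons each procedure protects against.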
Other a posteriori comparisons
Duncan's new multiple-range test
  Same principle as the NK test, but with a smaller multiplier
Least significant difference, LSD
  Uses the t distribution corresponding to the no. of DF for MSE
  $\alpha$ levels are inflated
  Proposed by Fisher
The above two procedures are NOT recommended by statisticians for medical research.

Summary of Multiple Comparisons
Don't worry about the formulas
Which procedure is better? It depends on your purpose!
  Pairwise comparisons:
    Tukey's test is the first choice; the Newman-Keuls test is the second choice
  Several Txs with a single CTL:
    Dunnett's is the best
  Non-pairwise comparisons:
    Scheffé's is the best
  When an $\alpha$ larger than 0.05 is acceptable to you, e.g. drug screening:
    LSD and Duncan's new multiple-range test are OK
    The above two are not recommended by the authors

Printout: Multiple comparisons
Newman-Keuls Multiple-Comparison Test
Group   Count   Mean       Different From Groups
2       12      15.60000   1
3       12      18.05833   1
1       13      40.23077   2, 3
Response: CELL; Term A: GROUP; DF=34; MSE=278.8058
Scheffe's Multiple-Comparison Test
Group   Count   Mean       Different From Groups
2       12      15.60000   1
3       12      18.05833   1
1       13      40.23077   2, 3
Critical Value=2.5596

Two-way ANOVA
The Logic of two-way ANOVA
SST divided into 3 or 4 parts
  SST = SSR + SSC + SSE
  SST = SSR + SSC + SS(RC) + SSE
Models of two-way ANOVA
  Without interaction: $X_{ij} = \mu + \alpha_i + \beta_j + e_{ij}$
  With interaction: $X_{ij} = \mu + \alpha_i + \beta_j + (\alpha_i \beta_j) + e_{ij}$

Simpson's Paradox: Miss Chen buys hats (陳小姐買帽子)
               Day 1                                      Day 2
               Counter 1 (adults)   Counter 2 (children)   Both counters combined
               Red     Black        Red     Black          Red     Black
Fits           9       17           3       1              12      18
Does not fit   1       3            17      9              18      12
Total          10      20           20      10             30      30
% fitting      90%     85%          15%     10%            40%     60%

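The reversal in the hat table can be reproduced by computing the fitting rates per counter and then pooled. The counts are taken from the table; the variable and function names are illustrative.

```python
# (number that fit, number tried) for each hat colour at each counter
day1 = {
    "Counter 1 (adults)": {"red": (9, 10), "black": (17, 20)},
    "Counter 2 (children)": {"red": (3, 20), "black": (1, 10)},
}

def rate(fits, total):
    return fits / total

# Fitting rate within each counter
per_counter = {
    counter: {colour: rate(*cell) for colour, cell in cells.items()}
    for counter, cells in day1.items()
}

# Fitting rate after pooling both counters
pooled = {}
for colour in ("red", "black"):
    fits = sum(cells[colour][0] for cells in day1.values())
    total = sum(cells[colour][1] for cells in day1.values())
    pooled[colour] = rate(fits, total)

for counter, rates in per_counter.items():
    print(f"{counter}: red {rates['red']:.0%}, black {rates['black']:.0%}")
print(f"Pooled: red {pooled['red']:.0%}, black {pooled['black']:.0%}")
```

Red fits better at each counter taken separately, yet black fits better once the counters are pooled, because most red hats were tried at the counter where few hats fit at all.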
Statistical Interaction & confounding
Model: $\mu_{Y|T,C} = \beta_0 + \beta_1 T + \beta_2 C + \beta_3 TC$
Interaction: 2 lines with different slopes
  $H_1: \hat{\beta}_3 \neq 0$
Confounding: 2 parallel lines
  $H_1: \hat{\beta}_{1|c} \neq \hat{\beta}_1$
How to test: ANOVA
[Figure: Y plotted against T (T0, T1), with one line for each level of C (C0, C1)]

Confounding factors
Mixing of the effect of X2 with the effect of X1 on Y
Definition:
  Associated with the disease of interest in the absence of exposure
    Independently associated with the disease; itself a risk factor
  Associated with the exposure
    Associated with the risk factor
  Not a result of being exposed
    A confounder cannot be an intervening variable
    Intervening variable: X1 → X2 → Y
    Example: S/S of diseases
[Figure: relationships among Obesity, Cholesterol and MI]

Interaction & confounding
Interaction:
  The effect of X1 varies with the level of X2
  A phenomenon you have to present
  Main effects of X1, X2: no longer meaningful
  Ex: X1 (sex), X2 (teaching method) & Y (language score)
Confounding:
  Given condition: no interaction
  A condition you have to control (or adjust) for

Two-way ANOVA table
Source of variation   SS       DF           Mean SS   F ratio
Among rows            SSR      r-1          MSR       MSR/MSE
Among columns         SSC      c-1          MSC       MSC/MSE
Interaction           SS(RC)   (r-1)(c-1)   MS(RC)    MS(RC)/MSE
Error                 SSE      rc(n-1)      MSE
Total                 SST      rcn-1
(r: no. of rows; c: no. of columns; n: no. of observations per cell)

Example 2: two-way ANOVA
Research problem:
  Glucose tolerance, insulin secretion, insulin sensitivity and glucose effectiveness in normal and overweight hyperthyroid women. Gonzalo MA. Clin Endocrinol, 1996;45:689-697
X1: BMI; X2: thyroid function
  Both are categorical variables
  BMI: 2 levels; thyroid function: 2 levels
Y: insulin sensitivity
  Continuous variable

Printout: Box plot & Error bar plot, ex 2
[Figure: means of IS (insulin sensitivity) and error bar plot of IS by BMI2 (0, 1), with separate series for HT (0 = normal thyroid, 1 = hyperthyroid)]

Printout: Descriptive statistics, ex 2
Means and Standard Errors of IS
Term          Count   Mean        SE
All           33      0.4647917
A: BMI2
  0           19      0.615       5.786324E-02
  1           14      0.3145833   6.740864E-02
B: HT
  0           19      0.57375     5.786324E-02
  1           14      0.3558333   6.740864E-02
AB: BMI2,HT
  0,0         11      0.68        0.0760472
  0,1         8       0.55        8.917324E-02
  1,0         8       0.4675      8.917324E-02
  1,1         6       0.1616667   0.1029684

Printout: 2-way ANOVA table, ex 2
Analysis of Variance Table for IS (alpha = 0.05)
Source         DF   SS             MSS            F-Ratio   Prob.       Power
A: BMI2         1   0.7112253      0.7112253      11.18     0.002293*   0.898154
B: HT           1   0.3742312      0.3742312      5.88      0.021745*   0.649738
AB              1   6.091182E-02   6.091182E-02   0.96      0.335909    0.157220
S              29   1.844833       6.361494E-02
Total (Adj.)   32   2.916255
Total          33

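Each F ratio in the two-way printout is the term's mean square divided by the error mean square. The sketch below recomputes them from the SS and DF columns of the table; the variable names are illustrative.

```python
# (SS, DF) for each term, copied from the printout
terms = {
    "A: BMI2": (0.7112253, 1),
    "B: HT": (0.3742312, 1),
    "AB": (0.06091182, 1),
}
ss_error, df_error = 1.844833, 29

mse = ss_error / df_error  # error mean square, row S
f_ratios = {name: (ss / df) / mse for name, (ss, df) in terms.items()}

print(f"MSE = {mse:.8f}")
for name, f in f_ratios.items():
    print(f"{name:8s} F = {f:.2f}")
```

The interaction F below 1 is consistent with the printout's non-significant AB term, which is what allows the two main effects to be read on their own.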
Summary: Flowchart of 3G MD (three-or-more-group mean difference) test
(In the original chart: if Yes, go up; if No, go down)
3 or more groups: Independent?
  Yes: ND?
    Yes: No. of factors?
      1 factor: One-way ANOVA
      2 or more factors: Two-way ANOVA or other
    No: Kruskal-Wallis (for 1 factor)
  No: ND?
    Yes: Repeated ANOVA
    No: Friedman

QUIZ
Q: Can I use ANOVA to test 2G MD?
A: Yes, you can.
Q: What is the relationship between ANOVA & the 2-s t test?
A: The 2-s t test is a special case of ANOVA
F, t & Z table:
  (1) $F_{\alpha,(1,n-1)} = t^2_{1-\alpha/2,(n-1)}$
  (2) As $df_2 \to \infty$: $F_{\alpha,(1,\infty)} = Z^2_{1-\alpha/2}$

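The claim that the 2-s t test is a special case of ANOVA can be checked numerically: for two groups, the one-way ANOVA F ratio equals the square of the pooled-variance t statistic. The sketch below uses two small made-up samples (hypothetical data) and illustrative function names.

```python
from math import sqrt
from statistics import fmean, variance

def pooled_t(a, b):
    """Two-sample t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (fmean(a) - fmean(b)) / sqrt(sp2 * (1 / na + 1 / nb))

def anova_f(groups):
    """One-way ANOVA F ratio: MSB / MSE."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    ssb = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ssb / (k - 1)) / (sse / (n - k))

# Hypothetical data for illustration only
a = [4.0, 5.0, 6.0, 7.0]
b = [6.0, 8.0, 9.0, 9.0, 10.0]

t = pooled_t(a, b)
f = anova_f([a, b])
print(f"t = {t:.4f}, t^2 = {t * t:.4f}, F = {f:.4f}")
```

Because the statistics match, the critical values must match as well, which is relationship (1) in the quiz.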
Home Work
Chapter 7, exercise 7 (table 7-20, p187)
  Analysis of phenotypic variation in psoriasis as a function of age at onset and family history. Arch. Dermatol. Res. 2002;294:207-213
Answer the following questions:
  Is there a difference in %TBSA (percent of total body surface area affected) related to age at onset?
  Is there a difference in %TBSA related to type of psoriasis (familial vs. sporadic)?
  Is the interaction significant?
  What is your conclusion?