Chapter 8 9 Slides (PPT)

Chapters 8-9
1-Way Analysis of Variance Completely Randomized Design
Comparing t > 2 Groups - Numeric Responses
• Extension of Methods used to Compare 2 Groups
• Independent Samples and Paired Data Designs
• Normal and non-normal data distributions
Design                       Normal Data            Nonnormal Data
Independent Samples (CRD)    F-Test, 1-Way ANOVA    Kruskal-Wallis Test
Paired Data (RBD)            F-Test, 2-Way ANOVA    Friedman’s Test
Completely Randomized Design (CRD)
• Controlled Experiments - Subjects assigned at random
to one of the t treatments to be compared
• Observational Studies - Subjects are sampled from t
existing groups
• Statistical model: $y_{ij}$ is the measurement from the $j$th subject from group $i$:
$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij} = \mu_i + \varepsilon_{ij}$$
where $\mu$ is the overall mean, $\alpha_i$ is the effect of treatment $i$, $\varepsilon_{ij}$ is a random error, and $\mu_i$ is the population mean for group $i$
1-Way ANOVA for Normal Data (CRD)
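• For each group obtain the mean, standard deviation, and sample size:
$$\bar{y}_{i.} = \frac{\sum_j y_{ij}}{n_i} \qquad s_i = \sqrt{\frac{\sum_j (y_{ij} - \bar{y}_{i.})^2}{n_i - 1}}$$
• Obtain the overall mean and sample size
$$N = n_1 + \cdots + n_t \qquad \bar{y}_{..} = \frac{n_1 \bar{y}_{1.} + \cdots + n_t \bar{y}_{t.}}{N} = \frac{\sum_i \sum_j y_{ij}}{N}$$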
Analysis of Variance - Sums of Squares
• Total Variation
$$TSS = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{..})^2 \qquad df_{Total} = N - 1$$
• Between Group (Sample) Variation
$$SSB = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (\bar{y}_{i.} - \bar{y}_{..})^2 = \sum_{i=1}^{t} n_i (\bar{y}_{i.} - \bar{y}_{..})^2 \qquad df_B = t - 1$$
• Within Group (Sample) Variation
$$SSW = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2 = \sum_{i=1}^{t} (n_i - 1) s_i^2 \qquad df_W = N - t$$
$$TSS = SSB + SSW \qquad df_{Total} = df_B + df_W$$
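The three sums of squares can be checked numerically. The following is a minimal sketch in plain Python; the function name `anova_sums_of_squares` and the small data set are illustrative assumptions, not part of the slides.

```python
from statistics import mean

def anova_sums_of_squares(groups):
    """Return (TSS, SSB, SSW) for a list of groups, each a list of numbers."""
    all_obs = [y for g in groups for y in g]
    grand_mean = mean(all_obs)
    group_means = [mean(g) for g in groups]
    # Total: every observation against the overall mean.
    tss = sum((y - grand_mean) ** 2 for y in all_obs)
    # Between: each group mean against the overall mean, weighted by n_i.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within: every observation against its own group mean.
    ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    return tss, ssb, ssw

# Hypothetical data: three groups of unequal size.
groups = [[4.0, 5.0, 6.0], [7.0, 9.0], [1.0, 2.0, 3.0, 2.0]]
tss, ssb, ssw = anova_sums_of_squares(groups)
print(f"TSS={tss:.4f}  SSB+SSW={ssb + ssw:.4f}")
```

The two printed values should agree up to floating-point error, which is the decomposition TSS = SSB + SSW.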
Analysis of Variance Table and F-Test
Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square        F
Treatments             SSB               t-1                   MSB=SSB/(t-1)      F=MSB/MSW
Error                  SSW               N-t                   MSW=SSW/(N-t)
Total                  TSS               N-1

• Assumption: All distributions normal with common variance
• H0: No differences among group means ($\alpha_1 = \cdots = \alpha_t = 0$)
• HA: Group means are not all equal (not all $\alpha_i$ are 0)
$$T.S.: F_{obs} = \frac{MSB}{MSW}$$
$$R.R.: F_{obs} \ge F_{\alpha,\, t-1,\, N-t} \quad \text{(Table 8)}$$
$$P\text{-val}: P(F \ge F_{obs})$$
Expected Mean Squares
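• Model: $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ with $\varepsilon_{ij} \sim N(0, \sigma^2)$, $\sum_i \alpha_i = 0$:
$$E(MSW) = \sigma^2 \qquad E(MSB) = \sigma^2 + \frac{\sum_{i=1}^{t} n_i \alpha_i^2}{t-1}$$
$$\frac{E(MSB)}{E(MSW)} = \frac{\sigma^2 + \frac{\sum_{i=1}^{t} n_i \alpha_i^2}{t-1}}{\sigma^2} = 1 + \frac{\sum_{i=1}^{t} n_i \alpha_i^2}{\sigma^2 (t-1)}$$
When $H_0: \alpha_1 = \cdots = \alpha_t = 0$ is true, $\frac{E(MSB)}{E(MSW)} = 1$; otherwise ($H_a$ is true), $\frac{E(MSB)}{E(MSW)} > 1$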
Expected Mean Squares
• 3 factors affect the magnitude of the F-statistic (for fixed t)
– True group effects ($\alpha_1, \ldots, \alpha_t$)
– Group sample sizes ($n_1, \ldots, n_t$)
– Within group variance ($\sigma^2$)
• $F_{obs}$ = MSB/MSW
• When H0 is true ($\alpha_1 = \cdots = \alpha_t = 0$), E(MSB)/E(MSW) = 1
• Marginal effects of each factor (all other factors fixed)
– As the spread in ($\alpha_1, \ldots, \alpha_t$) increases, E(MSB)/E(MSW) increases
– As ($n_1, \ldots, n_t$) increase, E(MSB)/E(MSW) increases (when H0 false)
– As $\sigma^2$ increases, E(MSB)/E(MSW) decreases (when H0 false)
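The marginal effects listed above can be explored with a short sketch of the ratio $1 + \sum n_i \alpha_i^2 / ((t-1)\sigma^2)$. The function name `expected_ms_ratio` and all numeric inputs below are hypothetical, chosen only to illustrate the direction of each effect.

```python
def expected_ms_ratio(alphas, ns, sigma):
    """E(MSB)/E(MSW) = 1 + sum(n_i * alpha_i^2) / ((t - 1) * sigma^2)."""
    t = len(alphas)
    weighted = sum(n * a ** 2 for n, a in zip(ns, alphas))
    return 1.0 + weighted / ((t - 1) * sigma ** 2)

# Hypothetical baseline: t = 3 groups, effects summing to 0.
base = expected_ms_ratio([-2, 0, 2], [5, 5, 5], sigma=4)
wider_effects = expected_ms_ratio([-4, 0, 4], [5, 5, 5], sigma=4)     # more spread
larger_samples = expected_ms_ratio([-2, 0, 2], [10, 10, 10], sigma=4)  # bigger n_i
noisier = expected_ms_ratio([-2, 0, 2], [5, 5, 5], sigma=8)            # bigger sigma
null_case = expected_ms_ratio([0, 0, 0], [5, 5, 5], sigma=4)           # H0 true

print(base, wider_effects, larger_samples, noisier, null_case)
```

Under these inputs the ratio rises with wider effects or larger samples, falls with a larger error standard deviation, and equals 1 when all effects are 0.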
[Figure: four panels, A-D, each plotted over 0 to 200 on the horizontal axis and 0 to 0.09 on the vertical axis]
A) $\mu$=100, $\alpha_1$=-20, $\alpha_2$=0, $\alpha_3$=20, $\sigma$ = 20
B) $\mu$=100, $\alpha_1$=-20, $\alpha_2$=0, $\alpha_3$=20, $\sigma$ = 5
C) $\mu$=100, $\alpha_1$=-5, $\alpha_2$=0, $\alpha_3$=5, $\sigma$ = 20
D) $\mu$=100, $\alpha_1$=-5, $\alpha_2$=0, $\alpha_3$=5, $\sigma$ = 5

E(MSB)/E(MSW) $= 1 + n \sum_i \alpha_i^2 / ((t-1)\sigma^2)$ by scenario, for t = 3 groups with n observations per group:

n      A      B      C       D
4      5      65     1.25    5
8      9      129    1.5     9
12     13     193    1.75    13
20     21     321    2.25    21
Example - Seasonal Diet Patterns in Ravens
• “Treatments” - t = 4 seasons of year (3 “replicates” each)
– Winter: November, December, January
– Spring: February, March, April
– Summer: May, June, July
– Fall: August, September, October
• Response (Y) - Vegetation (percent of total pellet weight)
• Transformation (for approximate normality):
$$Y' = \arcsin\left(\sqrt{\frac{Y}{100}}\right)$$
Source: K.A. Engel and L.S. Young (1989). “Spatial and Temporal Patterns in the Diet of Common
Ravens in Southwestern Idaho,” The Condor, 91:372-378
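As a quick check of the transformation, here is a minimal sketch that applies the arcsine square-root transform to a percentage. The function name `arcsine_sqrt` is an illustrative assumption; the input 94.3 is the first Winter observation.

```python
import math

def arcsine_sqrt(percent):
    """Map a percentage in [0, 100] to arcsin(sqrt(percent / 100)), in radians."""
    return math.asin(math.sqrt(percent / 100.0))

# First Winter observation: Y = 94.3 percent vegetation.
y_prime = arcsine_sqrt(94.3)
print(round(y_prime, 4))
```

The result is in radians, so transformed values lie between 0 and $\pi/2$.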
Seasonal Diet Patterns in Ravens - Data/Means
Y                j=1         j=2         j=3
Winter (i=1)     94.3        90.3        83.0
Spring (i=2)     80.7        90.5        91.8
Summer (i=3)     80.5        74.3        32.4
Fall (i=4)       67.8        91.8        89.3

Y'               j=1         j=2         j=3
Winter (i=1)     1.329721    1.254080    1.145808
Spring (i=2)     1.115957    1.257474    1.280374
Summer (i=3)     1.113428    1.039152    0.605545
Fall (i=4)       0.967390    1.280374    1.237554

$$\bar{y}_{1.} = \frac{1.329721 + 1.254080 + 1.145808}{3} = 1.243203$$
$$\bar{y}_{2.} = \frac{1.115957 + 1.257474 + 1.280374}{3} = 1.217935$$
$$\bar{y}_{3.} = \frac{1.113428 + 1.039152 + 0.605545}{3} = 0.919375$$
$$\bar{y}_{4.} = \frac{0.967390 + 1.280374 + 1.237554}{3} = 1.161773$$
$$\bar{y}_{..} = \frac{1.329721 + \cdots + 1.237554}{12} = 1.135572$$
Seasonal Diet Patterns in Ravens - Data/Means
[Figure: Plot of Transformed Data by Season. Horizontal axis: Season; vertical axis: Transformed % Vegetation]
Seasonal Diet Patterns in Ravens - ANOVA
Total Variation ($df_{Total} = 12 - 1 = 11$):
$$TSS = (1.329721 - 1.135572)^2 + \cdots + (1.237554 - 1.135572)^2 = 0.438425$$
Between Group Variation ($df_B = 4 - 1 = 3$):
$$SSB = 3\left[(1.243203 - 1.135572)^2 + \cdots + (1.161773 - 1.135572)^2\right] = 0.197387$$
Within Group Variation ($df_W = 12 - 4 = 8$):
$$SSW = (1.329721 - 1.243203)^2 + \cdots + (1.237554 - 1.161773)^2 = 0.241038$$

ANOVA
Source of Variation    SS          df    MS          F           P-value     F crit
Between Groups         0.197387    3     0.065796    2.183752    0.167768    4.06618
Within Groups          0.241038    8     0.03013
Total                  0.438425    11

Since F = 2.18 is below F crit = 4.07 (P-value = 0.168), do not conclude that seasons differ with respect to vegetation intake
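The ANOVA quantities for this example can be reproduced with a short plain-Python sketch. The variable names are illustrative, and the critical value `F_CRIT` is simply copied from the "F crit" entry of the ANOVA output rather than computed.

```python
from statistics import mean

# Transformed responses Y' by season, copied from the data slide.
seasons = {
    "Winter": [1.329721, 1.254080, 1.145808],
    "Spring": [1.115957, 1.257474, 1.280374],
    "Summer": [1.113428, 1.039152, 0.605545],
    "Fall":   [0.967390, 1.280374, 1.237554],
}
groups = list(seasons.values())
t = len(groups)                      # number of treatments
N = sum(len(g) for g in groups)      # total sample size
grand = mean(y for g in groups for y in g)

ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
ssw = sum((y - mean(g)) ** 2 for g in groups for y in g)
msb = ssb / (t - 1)
msw = ssw / (N - t)
f_obs = msb / msw

F_CRIT = 4.06618  # F(0.05; 3, 8), taken from the ANOVA output, not computed here
print(f"SSB={ssb:.6f} SSW={ssw:.6f} F={f_obs:.4f} reject={f_obs >= F_CRIT}")
```

A P-value would additionally need the F distribution function, which the Python standard library does not provide, so this sketch compares against the tabled critical value only.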
Seasonal Diet Patterns in Ravens - Spreadsheet
Month  Season  Y'        Season Mean  Overall Mean  TSS       SST       SSE
NOV    1       1.329721  1.243203     1.135572      0.037694  0.011584  0.007485
DEC    1       1.254080  1.243203     1.135572      0.014044  0.011584  0.000118
JAN    1       1.145808  1.243203     1.135572      0.000105  0.011584  0.009486
FEB    2       1.115957  1.217935     1.135572      0.000385  0.006784  0.010400
MAR    2       1.257474  1.217935     1.135572      0.014860  0.006784  0.001563
APR    2       1.280374  1.217935     1.135572      0.020968  0.006784  0.003899
MAY    3       1.113428  0.919375     1.135572      0.000490  0.046741  0.037657
JUN    3       1.039152  0.919375     1.135572      0.009297  0.046741  0.014346
JUL    3       0.605545  0.919375     1.135572      0.280928  0.046741  0.098489
AUG    4       0.967390  1.161773     1.135572      0.028285  0.000687  0.037785
SEP    4       1.280374  1.161773     1.135572      0.020968  0.000687  0.014066
OCT    4       1.237554  1.161773     1.135572      0.010400  0.000687  0.005743
Sum                                                 0.438425  0.197387  0.241038

TSS column: Total SS, (Y’ - Overall Mean)^2
SST column: Between Season SS, (Group Mean - Overall Mean)^2
SSE column: Within Season SS, (Y’ - Group Mean)^2
Transformations for Constant Variance
• $\sigma^2 = k\mu$: $y_T = \sqrt{y}$ or $y_T = \sqrt{y + 0.375}$. Used for Poisson Distribution ($k = 1$)
• $\sigma^2 = k\mu^2$: $y_T = \ln(y)$ or $y_T = \ln(y + 1)$
• $\sigma^2 = k\pi(1 - \pi)$: $y_T = \sin^{-1}\left(\sqrt{y}\right)$ with $y = \hat{\pi}$. Used for Binomial Distribution ($k = 1/n$)
• Box-Cox Transformation: Power Transformation used to obtain normality and often constant variance
$$y_T = \begin{cases} y^{\lambda} & \lambda \ne 0 \\ \ln(y) & \lambda = 0 \end{cases}$$
CRD with Non-Normal Data
Kruskal-Wallis Test
• Extension of Wilcoxon Rank-Sum Test to k > 2 Groups
• Procedure:
– Rank the observations across groups from smallest (1) to largest
( N = n1+...+nk ), adjusting for ties
– Compute the rank sums for each group: T1,...,Tk . Note that
T1+...+Tk = N(N +1)/2
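The ranking step can be sketched as follows, assuming ties are handled with average ranks (midranks). The helper names `midranks` and `rank_sums` and the small data set are hypothetical.

```python
def midranks(values):
    """Rank values from smallest (1) to largest (N), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def rank_sums(groups):
    """Rank all observations jointly, then sum the ranks within each group."""
    pooled = [y for g in groups for y in g]
    ranks = midranks(pooled)
    sums, start = [], 0
    for g in groups:
        sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    return sums

# Hypothetical data with one tie (4.0 appears in both groups).
example_sums = rank_sums([[3.1, 4.0], [4.0, 5.2, 1.0]])
print(example_sums)  # the rank sums total N(N+1)/2 = 15
```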
Kruskal-Wallis Test
• H0: The k population distributions are identical
• HA: Not all k distributions are identical
$$T.S.: H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{T_i^2}{n_i} - 3(N+1)$$
$$R.R.: H \ge \chi^2_{\alpha,\, k-1}$$
$$P\text{-val}: P(\chi^2 \ge H)$$
An adjustment to H is suggested when there are many ties in the data.
$$H' = \frac{H}{1 - \dfrac{\sum_j (t_j^3 - t_j)}{N^3 - N}}$$
$t_j$ = number of observations in the $j$th group of tied ranks
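A minimal sketch of the test statistic and its tie adjustment, taking rank sums and group sizes as inputs. The function names are illustrative assumptions; the inputs are the rank sums from the ravens example (26, 24.5, 8, 19.5 with three observations each, and one pair of tied observations).

```python
def kruskal_wallis_h(rank_sums, sizes):
    """H = 12 / (N (N + 1)) * sum(T_i^2 / n_i) - 3 (N + 1)."""
    n_total = sum(sizes)
    weighted = sum(t_i ** 2 / n_i for t_i, n_i in zip(rank_sums, sizes))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

def tie_adjusted_h(h, tie_sizes, n_total):
    """H' = H / (1 - sum(t_j^3 - t_j) / (N^3 - N)), t_j = size of j-th tie group."""
    correction = 1.0 - sum(tj ** 3 - tj for tj in tie_sizes) / (n_total ** 3 - n_total)
    return h / correction

h = kruskal_wallis_h([26, 24.5, 8, 19.5], [3, 3, 3, 3])
h_adj = tie_adjusted_h(h, tie_sizes=[2], n_total=12)
print(round(h, 2), round(h_adj, 2))
```

With only one tied pair among 12 observations the adjustment changes H very little.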
Example - Seasonal Diet Patterns in Ravens
Month    Season    Y'          Rank
NOV      1         1.329721    12
DEC      1         1.254080    8
JAN      1         1.145808    6
FEB      2         1.115957    5
MAR      2         1.257474    9
APR      2         1.280374    10.5
MAY      3         1.113428    4
JUN      3         1.039152    3
JUL      3         0.605545    1
AUG      4         0.967390    2
SEP      4         1.280374    10.5
OCT      4         1.237554    7

• T1 = 12+8+6 = 26
• T2 = 5+9+10.5 = 24.5
• T3 = 4+3+1 = 8
• T4 = 2+10.5+7 = 19.5

$H_0$: No seasonal differences
$H_a$: Seasonal differences
$$T.S.: H = \frac{12}{12(12+1)} \left[ \frac{(26)^2}{3} + \frac{(24.5)^2}{3} + \frac{(8)^2}{3} + \frac{(19.5)^2}{3} \right] - 3(12+1) = 44.12 - 39 = 5.12$$
$$R.R. (\alpha = 0.05): H \ge \chi^2_{.05,\, 4-1} = 7.815$$
$$P\text{-value}: P(\chi^2 \ge H = 5.12) = .1632$$
Since 5.12 < 7.815, do not reject $H_0$
Linear Contrasts
• Linear functions of the treatment means (population and
sample) such that the coefficients sum to 0.
• Used to compare groups or pairs of treatment means,
based on research question(s) of interest
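Population Contrast: $l = a_1 \mu_1 + \cdots + a_t \mu_t = \sum_{i=1}^{t} a_i \mu_i$ where $\sum_{i=1}^{t} a_i = 0$
Estimated Contrast: $\hat{l} = a_1 \bar{y}_{1.} + \cdots + a_t \bar{y}_{t.} = \sum_{i=1}^{t} a_i \bar{y}_{i.}$
$$V(\hat{l}) = a_1^2 \left( \frac{\sigma^2}{n_1} \right) + \cdots + a_t^2 \left( \frac{\sigma^2}{n_t} \right) = \sigma^2 \sum_{i=1}^{t} \frac{a_i^2}{n_i}$$
$$\hat{V}(\hat{l}) = MSW \sum_{i=1}^{t} \frac{a_i^2}{n_i}$$
$$n_1 = \cdots = n_t = n \;\Rightarrow\; \hat{V}(\hat{l}) = \frac{MSW}{n} \sum_{i=1}^{t} a_i^2$$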
Orthogonal Contrasts & Sums of Squares
Two Contrasts: $l_1 = \sum_{i=1}^{t} a_i \mu_i$, $l_2 = \sum_{i=1}^{t} b_i \mu_i$, with $\sum_{i=1}^{t} a_i = \sum_{i=1}^{t} b_i = 0$
$l_1, l_2$ are orthogonal if: $\sum_{i=1}^{t} \frac{a_i b_i}{n_i} = 0$; for balanced data, $\sum_{i=1}^{t} a_i b_i = 0$
$$\text{Contrast Sum of Squares: } SSC = \frac{\left( \sum_{i=1}^{t} a_i \bar{y}_{i.} \right)^2}{\sum_{i=1}^{t} \frac{a_i^2}{n_i}} = \frac{\hat{l}^2}{\sum_{i=1}^{t} \frac{a_i^2}{n_i}}$$
$$\text{for balanced data, } SSC = \frac{n \hat{l}^2}{\sum_{i=1}^{t} a_i^2}$$
Among t treatments, we can obtain t - 1 pairwise orthogonal contrasts: $l_1, \ldots, l_{t-1}$
Then, we can decompose the Between Treatment Sum of Squares into the contrasts:
$SSB = SSC_1 + \cdots + SSC_{t-1}$, where each of the Contrast Sums of Squares has 1 degree of freedom
For any contrast: Testing $H_{0k}: l_k = 0$ vs $H_{Ak}: l_k \ne 0$
$$TS: F_k = \frac{SSC_k}{MSW} \qquad RR: F_k \ge F_{\alpha,\, 1,\, N-t}$$
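The decomposition of SSB into orthogonal contrast sums of squares can be checked numerically on the ravens means (balanced, n = 3). The three Helmert-type contrasts below are a hypothetical choice made only to obtain a pairwise orthogonal set; they are not contrasts proposed in the slides.

```python
from itertools import combinations

means = [1.243203, 1.217935, 0.919375, 1.161773]  # season means from the example
n = 3                                             # observations per season

# Hypothetical pairwise orthogonal set (Helmert-type) for t = 4 treatments.
contrasts = [(1, -1, 0, 0), (1, 1, -2, 0), (1, 1, 1, -3)]

def contrast_ss(coefs, means, n):
    """Balanced-data contrast sum of squares: n * l_hat^2 / sum(a_i^2)."""
    l_hat = sum(a * m for a, m in zip(coefs, means))
    return n * l_hat ** 2 / sum(a ** 2 for a in coefs)

# Balanced-data orthogonality check: sum(a_i * b_i) == 0 for every pair.
orthogonal = all(
    sum(a * b for a, b in zip(c1, c2)) == 0 for c1, c2 in combinations(contrasts, 2)
)
ssc = [contrast_ss(c, means, n) for c in contrasts]
print(orthogonal, [round(s, 6) for s in ssc], round(sum(ssc), 6))
```

The three contrast sums of squares should add up to SSB = 0.197387 from the ANOVA table, up to rounding in the reported means.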
Simultaneous Tests of Multiple Contrasts
• Using m contrasts for comparisons among t treatments
• Each contrast is tested at significance level $\alpha$, which we label $\alpha_I$, the individual comparison Type I error rate
• The probability of making at least one false rejection among the m null hypotheses is the experimentwise Type I error rate, which we label $\alpha_E$
• Tests are not independent unless the error (Within Group) degrees of freedom are infinite; however, the Bonferroni inequality implies that $\alpha_E \le m\alpha_I$, so choose $\alpha_I = \alpha_E / m$
Scheffe’s Method for All Contrasts
• Can be used for any number of contrasts, even those suggested by data. Conservative (wide CIs, low power)
$$l = \sum_{i=1}^{t} a_i \mu_i \qquad \hat{l} = \sum_{i=1}^{t} a_i \bar{y}_{i.} \qquad \text{s.t. } \sum_{i=1}^{t} a_i = 0$$
$$SE(\hat{l}) = \sqrt{MSW \sum_{i=1}^{t} \frac{a_i^2}{n_i}}$$
$$H_0: l = \sum_{i=1}^{t} a_i \mu_i = 0 \qquad H_A: l = \sum_{i=1}^{t} a_i \mu_i \ne 0$$
$$\text{Reject } H_0 \text{ if } |\hat{l}| \ge SE(\hat{l}) \sqrt{(t-1) F_{\alpha_E,\, df_1,\, df_2}} \qquad df_1 = t - 1,\; df_2 = df_{Error} = N - t$$
$$\text{Simultaneous } (1 - \alpha_E)100\% \text{ Confidence Intervals: } \hat{l} \pm SE(\hat{l}) \sqrt{(t-1) F_{\alpha_E,\, df_1,\, df_2}}$$
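A sketch of the Scheffé margin, with the F critical value supplied by the caller (it is not computed here). The function name `scheffe_margin` and the Winter-minus-Summer contrast are hypothetical; MSW = 0.03013 and the critical value 4.06618 for 3 and 8 degrees of freedom are taken from the ravens ANOVA output.

```python
import math

def scheffe_margin(coefs, sizes, msw, f_crit):
    """SE(l_hat) * sqrt((t - 1) * F), the half-width of Scheffe's interval."""
    t = len(coefs)
    se = math.sqrt(msw * sum(a ** 2 / n for a, n in zip(coefs, sizes)))
    return se * math.sqrt((t - 1) * f_crit)

means = [1.243203, 1.217935, 0.919375, 1.161773]
coefs = [1, 0, -1, 0]  # hypothetical contrast: Winter minus Summer
l_hat = sum(a * m for a, m in zip(coefs, means))
margin = scheffe_margin(coefs, [3, 3, 3, 3], msw=0.03013, f_crit=4.06618)

reject = abs(l_hat) >= margin
print(round(l_hat, 4), round(margin, 4), reject)
print("interval:", (round(l_hat - margin, 4), round(l_hat + margin, 4)))
```

Because the multiplier covers every possible contrast at once, the margin is wider than the one a single planned contrast would get.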
Post-hoc Comparisons of Treatments
• If differences in group means are determined from the F-test, researchers want to compare pairs of groups. Three
popular methods include:
– Fisher’s LSD - Upon rejecting the null hypothesis of no
differences in group means, LSD method is equivalent to doing
pairwise comparisons among all pairs of groups as in Chapter 6.
– Tukey’s Method - Specifically compares all t(t-1)/2 pairs of
groups. Utilizes a special table (Table 10, pp. 1110-1111).
– Bonferroni’s Method - Adjusts individual comparison error rates
so that all conclusions will be correct at desired
confidence/significance level. Any number of comparisons can be
made. Very general approach can be applied to any inferential
problem
Fisher’s Least Significant Difference Procedure
• Protected Version is to only apply method after
significant result in overall F-test
• For each pair of groups, compute the least significant
difference (LSD) that the sample means need to differ
by to conclude the population means are not equal
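$$LSD_{ij} = t_{\alpha/2} \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_j} \right)} \quad \text{with } df = N - t$$
$$\text{Conclude } \mu_i \ne \mu_j \text{ if } |\bar{y}_{i.} - \bar{y}_{j.}| \ge LSD_{ij}$$
$$\text{Fisher's Confidence Interval: } (\bar{y}_{i.} - \bar{y}_{j.}) \pm LSD_{ij}$$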
Tukey’s W Procedure
• More conservative than Fisher’s LSD (minimum
significant difference and confidence interval width are
higher).
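• Derived so that the probability that at least one false
difference is detected is $\alpha$ (experimentwise error rate)
$$W_{ij} = q_{\alpha}(t, \nu) \sqrt{\frac{MSW}{n}}$$
q given in Table 10, pp. 1110-1111 with $\nu = N - t$
$$\text{Conclude } \mu_i \ne \mu_j \text{ if } |\bar{y}_{i.} - \bar{y}_{j.}| \ge W_{ij}$$
$$\text{Tukey's Confidence Interval: } (\bar{y}_{i.} - \bar{y}_{j.}) \pm W_{ij}$$
$$\text{When the sample sizes are unequal, use } W_{ij} = \frac{q_{\alpha}(t, \nu)}{\sqrt{2}} \sqrt{MSW \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$$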
Bonferroni’s Method (Most General)
• Wish to make C comparisons of pairs of groups with simultaneous
confidence intervals or 2-sided tests
• When all pairs of treatments are to be compared, C = t(t-1)/2
• Want the overall confidence level that all intervals are “correct” to
be 95%, or the overall Type I error rate for all tests to be 0.05
• For confidence intervals, construct (1-(0.05/C))100% CIs for the
difference in each pair of group means (wider than 95% CIs)
• Conduct each test at the $\alpha$ = 0.05/C significance level (rejection region
cut-offs more extreme than when $\alpha$ = 0.05)
• Critical t-values are given in a table on the class website; we will use the
notation $t_{\alpha/2, C, \nu}$, where C = #Comparisons, $\nu$ = df
Bonferroni’s Method (Most General)
1 1
Bij  t / 2,C ,v MSE   
n n 
j 
 i
(t given on class website with v  N-t )
Conclude  i   j if y i.  y j .  Bij


Bonferroni ' s Confidence Interval : y i.  y j .  Bij
Example - Seasonal Diet Patterns in Ravens
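Note: No differences were found; these calculations are only
for demonstration purposes
$MSE = 0.03013 \quad n_i = 3 \quad t_{.025,8} = 2.306 \quad q_{.05,\, t=4,\, df_E=8} = 4.53 \quad t_{.025,\, C=6,\, df_E=8} = 3.479$
$$LSD_{ij} = 2.306 \sqrt{(0.03013) \left( \frac{1}{3} + \frac{1}{3} \right)} = 0.3268$$
$$W_{ij} = 4.53 \sqrt{(0.03013) \left( \frac{1}{3} \right)} = 0.4540$$
$$B_{ij} = 3.479 \sqrt{(0.03013) \left( \frac{1}{3} + \frac{1}{3} \right)} = 0.4931$$

Comparison (i vs j)    Group i Mean    Group j Mean    Difference
1 vs 2                 1.243203        1.217935        0.025267
1 vs 3                 1.243203        0.919375        0.323828
1 vs 4                 1.243203        1.161773        0.081430
2 vs 3                 1.217935        0.919375        0.298560
2 vs 4                 1.217935        1.161773        0.056162
3 vs 4                 0.919375        1.161773        -0.242398

Every absolute difference is below all three critical differences (the largest, 0.3238 for 1 vs 3, is just under LSD = 0.3268), so no pair is declared different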
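The three critical differences and the pairwise checks can be reproduced with a short sketch. The critical values are the table values quoted in the example (they are not computed here), and the variable names are illustrative.

```python
import math
from itertools import combinations

MSE, n = 0.03013, 3
T_CRIT, Q_CRIT, T_BONF = 2.306, 4.53, 3.479  # table values quoted in the example
means = {1: 1.243203, 2: 1.217935, 3: 0.919375, 4: 1.161773}

se_diff = math.sqrt(MSE * (1 / n + 1 / n))
cutoffs = {
    "LSD": T_CRIT * se_diff,
    "Tukey": Q_CRIT * math.sqrt(MSE / n),
    "Bonferroni": T_BONF * se_diff,
}

# Record, for each method, the pairs whose absolute difference reaches the cutoff.
flagged = {name: [] for name in cutoffs}
for i, j in combinations(means, 2):
    diff = abs(means[i] - means[j])
    for name, cut in cutoffs.items():
        if diff >= cut:
            flagged[name].append((i, j))

for name, cut in cutoffs.items():
    print(f"{name}: cutoff={cut:.4f} pairs declared different={flagged[name]}")
```

The cutoffs are ordered LSD < Tukey < Bonferroni here, reflecting the increasing conservatism of the three methods for all six pairwise comparisons.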
Dunnett’s Method to Compare Trts to Control
• Want to compare t-1 “active” treatments with a “control”
condition simultaneously
• Based on equal sample sizes of n subjects per treatment
• Makes use of Dunnett’s d-table (Table 11, pp. 1112-15),
indexed by k = t-1 comparisons, error degrees of
freedom $\nu$ = N - t, $\alpha_E$, and whether the individual tests
are 1-sided or 2-sided
$$D = d_{\alpha_E}(k, \nu) \sqrt{\frac{2\, MSW}{n}}$$
$$H_{0i}: \mu_i = \mu_c$$
$$H_{Ai}: \mu_i \ne \mu_c \qquad \text{Reject } H_{0i} \text{ if } |\bar{y}_i - \bar{y}_c| \ge D_{2\text{-sided}}$$
$$H_{Ai}: \mu_i > \mu_c \qquad \text{Reject } H_{0i} \text{ if } \bar{y}_i - \bar{y}_c \ge D_{1\text{-sided}}$$
$$H_{Ai}: \mu_i < \mu_c \qquad \text{Reject } H_{0i} \text{ if } \bar{y}_i - \bar{y}_c \le -D_{1\text{-sided}}$$
Nonparametric Comparisons
• Based on results from Kruskal-Wallis Test
• Makes use of the Rank-Sums for the t treatments
• If K-W test is not significant, do not compare any pairs
of treatments
• Compute the t(t-1)/2 absolute differences in mean ranks, $|\bar{R}_i - \bar{R}_j|$ with $\bar{R}_i = T_i / n_i$
• Makes use of the Studentized Range Table (Table 10)
• For large samples, conclude treatments are different if:
$$|\bar{R}_i - \bar{R}_j| \ge KW_{ij} = \frac{q_{\alpha_E}(t, \infty)}{\sqrt{2}} \sqrt{\frac{N(N+1)}{12} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$$
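A sketch of the large-sample rank-based critical difference, with the studentized range value passed in by the caller. The function name `kw_critical_difference` is hypothetical, and the value q = 3.63 for t = 4 with infinite degrees of freedom at $\alpha_E$ = 0.05 is an assumption to be checked against Table 10. The rank sums are those of the ravens example, used for demonstration only, since the K-W test there was not significant and the samples are small.

```python
import math
from itertools import combinations

def kw_critical_difference(q_crit, n_total, n_i, n_j):
    """(q / sqrt(2)) * sqrt(N (N + 1) / 12 * (1/n_i + 1/n_j))."""
    return q_crit / math.sqrt(2) * math.sqrt(
        n_total * (n_total + 1) / 12 * (1 / n_i + 1 / n_j)
    )

rank_sums = {1: 26.0, 2: 24.5, 3: 8.0, 4: 19.5}
sizes = {1: 3, 2: 3, 3: 3, 4: 3}
N = sum(sizes.values())
Q_ASSUMED = 3.63  # assumed studentized range value; verify against Table 10

mean_ranks = {i: rank_sums[i] / sizes[i] for i in rank_sums}
different = []
for i, j in combinations(mean_ranks, 2):
    cutoff = kw_critical_difference(Q_ASSUMED, N, sizes[i], sizes[j])
    if abs(mean_ranks[i] - mean_ranks[j]) >= cutoff:
        different.append((i, j))

print(round(kw_critical_difference(Q_ASSUMED, N, 3, 3), 3), different)
```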