Single-Factor Studies

KNNL – Chapter 16
Single-Factor Models
• Independent Variable can be qualitative or
quantitative
• If Quantitative, we typically assume a linear,
polynomial, or no “structural” relation
• If Qualitative, we typically have no “structural”
relation
• Balanced designs have equal numbers of replicates at
each level of the independent variable
• When no structure is assumed, we refer to the models as
“Analysis of Variance” models, and use indicator
variables for treatments in the regression model
Single-Factor ANOVA Model
• Model Assumptions for Model Testing
  - All probability distributions are normal
  - All probability distributions have equal variance
  - Responses are random samples from their probability distributions, and are independent
• Analysis Procedure
  - Test for differences among factor level means
  - Follow-up (post-hoc) comparisons among pairs or groups of factor level means
Cell Means Model
$r$ = number of levels of the study factor
$n_i$ = number of replicates (cases, units) for the $i$th level of the study factor
$n_1 + \dots + n_r = \sum_{i=1}^{r} n_i = n_T$ = overall sample size (number of cases)

$$Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, \dots, r; \quad j = 1, \dots, n_i$$

$Y_{ij}$ = response for the $j$th case within the $i$th level of the study factor
$\mu_i$ = population mean for the $i$th level of the study factor
$\varepsilon_{ij} \sim NID(0, \sigma^2)$, where NID = Normally and Independently Distributed

$$\Rightarrow \quad E\{Y_{ij}\} = \mu_i, \qquad \sigma^2\{Y_{ij}\} = \sigma^2$$

The $Y_{ij}$ are independent $N(\mu_i, \sigma^2)$.
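One way to get a feel for the cell means model is to generate data from it. The sketch below is only an illustration: the function name `simulate_cell_means`, the means, the replicate counts, and the error standard deviation are all hypothetical choices, not values from the text.

```python
import random

def simulate_cell_means(mu, n, sigma, seed=0):
    """Draw Y_ij = mu_i + eps_ij with independent eps_ij ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    return [[m + rng.gauss(0.0, sigma) for _ in range(n_i)]
            for m, n_i in zip(mu, n)]

# Hypothetical setup: r = 3 factor levels with unequal replicate counts.
mu = [10.0, 15.0, 20.0]
n = [4, 3, 5]
y = simulate_cell_means(mu, n, sigma=2.0)

# y[i][j] holds the response for case j at level i; n_T is the total count.
n_T = sum(len(level) for level in y)
print(n_T)
```

Every level shares the same `sigma`, which is the equal-variance assumption; the levels differ only through `mu`.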

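Cell Means Model – Regression Form
Suppose $r = 3$ and $n_1 = n_2 = n_3 = 2$:

$$
\mathbf{Y} = \begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{22} \\ Y_{31} \\ Y_{32} \end{bmatrix}
\qquad
\mathbf{X} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
\boldsymbol{\beta} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}
\qquad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{31} \\ \varepsilon_{32} \end{bmatrix}
$$

$$
\sigma^2\{\boldsymbol{\varepsilon}\} =
\begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix}_{6 \times 6}
= \sigma^2 \mathbf{I}
$$

$$
E\{\mathbf{Y}\} = \begin{bmatrix} E\{Y_{11}\} \\ E\{Y_{12}\} \\ E\{Y_{21}\} \\ E\{Y_{22}\} \\ E\{Y_{31}\} \\ E\{Y_{32}\} \end{bmatrix}
= \mathbf{X}\boldsymbol{\beta}
= \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}
= \begin{bmatrix} \mu_1 \\ \mu_1 \\ \mu_2 \\ \mu_2 \\ \mu_3 \\ \mu_3 \end{bmatrix}
$$

$$
\mathbf{X}'\mathbf{X} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}
\qquad
\mathbf{X}'\mathbf{Y} = \begin{bmatrix} Y_{11} + Y_{12} \\ Y_{21} + Y_{22} \\ Y_{31} + Y_{32} \end{bmatrix}
$$

$$
\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}
= \begin{bmatrix} 0.5 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.5 \end{bmatrix}
\begin{bmatrix} Y_{11} + Y_{12} \\ Y_{21} + Y_{22} \\ Y_{31} + Y_{32} \end{bmatrix}
= \begin{bmatrix} \bar{Y}_{1\cdot} \\ \bar{Y}_{2\cdot} \\ \bar{Y}_{3\cdot} \end{bmatrix}
= \begin{bmatrix} \hat{\mu}_1 \\ \hat{\mu}_2 \\ \hat{\mu}_3 \end{bmatrix}
$$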
 
Model Interpretations
• Factor Level Means
  - Observational Studies – The $\mu_i$ represent the population means among units from the populations of factor levels
  - Experimental Studies – The $\mu_i$ represent the means of the various factor levels, had they been assigned to a population of experimental units
• Fixed and Random Factors
  - Fixed Factors – All levels of interest are observed in the study
  - Random Factors – Factor levels included in the study represent a sample from a population of factor levels
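Fitting ANOVA Models
Notation:

$$Y_{i\cdot} = \sum_{j=1}^{n_i} Y_{ij}, \qquad \bar{Y}_{i\cdot} = \frac{Y_{i\cdot}}{n_i} = \frac{\sum_{j=1}^{n_i} Y_{ij}}{n_i}$$

$$Y_{\cdot\cdot} = \sum_{i=1}^{r}\sum_{j=1}^{n_i} Y_{ij}, \qquad \bar{Y}_{\cdot\cdot} = \frac{Y_{\cdot\cdot}}{n_T} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} Y_{ij}}{n_T} = \sum_{i=1}^{r} \frac{n_i}{n_T}\,\bar{Y}_{i\cdot}$$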
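Least Squares and Maximum Likelihood Estimation
Error Sum of Squares:

$$Q = \sum_{i=1}^{r}\sum_{j=1}^{n_i} \varepsilon_{ij}^2 = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2$$

$$\frac{\partial Q}{\partial \mu_k} = -2\sum_{j=1}^{n_k} (Y_{kj} - \mu_k), \qquad k = 1, \dots, r$$

Setting $\partial Q / \partial \mu_k = 0$:

$$\sum_{j=1}^{n_k} Y_{kj} = n_k \hat{\mu}_k \quad \Rightarrow \quad \hat{\mu}_k = \frac{\sum_{j=1}^{n_k} Y_{kj}}{n_k} = \bar{Y}_{k\cdot}, \qquad k = 1, \dots, r$$

Likelihood:

$$L(\mu_1, \dots, \mu_r, \sigma^2 \mid Y_{11}, \dots, Y_{r n_r}) = \frac{1}{(2\pi\sigma^2)^{n_T/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2\right]$$

Maximizing the likelihood with respect to $\mu_1, \dots, \mu_r$ is equivalent to minimizing $\sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2$, so the two methods give the same estimates.

Fitted values: $\hat{Y}_{ij} = \bar{Y}_{i\cdot}$
Residuals: $e_{ij} = Y_{ij} - \hat{Y}_{ij} = Y_{ij} - \bar{Y}_{i\cdot}$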
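The least squares result can be illustrated numerically: the level sample means give a smaller error sum of squares than any other candidate, and the residuals within each level sum to zero. This is a small sketch on made-up data; the helper name `q` and the perturbed candidate means are illustrative only.

```python
def q(y, mu):
    """Error sum of squares Q for candidate level means mu."""
    return sum((y_ij - mu_i) ** 2 for level, mu_i in zip(y, mu) for y_ij in level)

y = [[10.0, 12.0], [15.0, 17.0], [20.0, 22.0]]   # hypothetical data, y[i][j]

# Least squares (and maximum likelihood) estimates: the level sample means.
mu_hat = [sum(level) / len(level) for level in y]

# Fitted values repeat the level mean; residuals are deviations from it.
fitted = [[m] * len(level) for level, m in zip(y, mu_hat)]
resid = [[y_ij - m for y_ij in level] for level, m in zip(y, mu_hat)]

q_min = q(y, mu_hat)                 # Q at the estimates
q_other = q(y, [11.5, 16.0, 21.0])   # Q at a nearby candidate, for comparison
print(mu_hat, q_min, q_other)
```

Moving any candidate mean away from its level average increases `q`, which is what the first-order condition implies.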
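Analysis of Variance

$$\underbrace{Y_{ij} - \bar{Y}_{\cdot\cdot}}_{\text{Total deviation}} = \underbrace{(Y_{ij} - \bar{Y}_{i\cdot})}_{\text{Deviation from trt mean (residual)}} + \underbrace{(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})}_{\text{Deviation of trt mean from overall mean}}$$

Squaring and summing over all cases:

$$\sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 + \sum_{i=1}^{r}\sum_{j=1}^{n_i} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$$

since the cross-product term vanishes:

$$\sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) = \sum_{i=1}^{r} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot}) = 0$$

Total (Corrected) Sum of Squares: $SSTO = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$, with $df_{TO} = n_T - 1$
Treatment Sum of Squares: $SSTR = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 = \sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$, with $df_{TR} = r - 1$
Error Sum of Squares: $SSE = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2$, with $df_E = n_T - r$
Note: $SSTO = SSTR + SSE$ and $df_{TO} = df_{TR} + df_E$

Mean Squares: $MSTR = \dfrac{SSTR}{r - 1}$, $\quad MSE = \dfrac{SSE}{n_T - r}$

Useful result:

$$s_i^2 = \frac{\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2}{n_i - 1} \quad \Rightarrow \quad \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 = (n_i - 1)\,s_i^2$$

$$\Rightarrow \quad SSE = \sum_{i=1}^{r} (n_i - 1)\,s_i^2, \qquad df_E = n_T - r = \sum_{i=1}^{r} (n_i - 1)$$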
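The decomposition and the "useful result" for SSE can be verified on a small unbalanced data set. The sketch below uses only the standard library; the data and the function name `anova_sums` are hypothetical.

```python
from statistics import mean, variance

def anova_sums(y):
    """Return (SSTO, SSTR, SSE) for responses grouped by factor level."""
    all_y = [y_ij for level in y for y_ij in level]
    grand = mean(all_y)
    ssto = sum((y_ij - grand) ** 2 for y_ij in all_y)
    sstr = sum(len(level) * (mean(level) - grand) ** 2 for level in y)
    sse = sum((y_ij - mean(level)) ** 2 for level in y for y_ij in level)
    return ssto, sstr, sse

# Hypothetical unbalanced data: n_1 = 3, n_2 = 2, n_3 = 4.
y = [[10.0, 12.0, 14.0], [15.0, 17.0], [20.0, 22.0, 21.0, 25.0]]
ssto, sstr, sse = anova_sums(y)
r, n_T = len(y), sum(len(level) for level in y)

# SSE rebuilt from the within-level sample variances s_i^2.
sse_from_s2 = sum((len(level) - 1) * variance(level) for level in y)

mstr, mse = sstr / (r - 1), sse / (n_T - r)
print(ssto, sstr, sse, sse_from_s2, mstr, mse)
```

Up to floating-point rounding, `ssto` equals `sstr + sse`, and `sse_from_s2` matches `sse`. The second identity is convenient when only summary statistics per level are available.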
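ANOVA Table

Source     | df          | SS                                                                      | MS                            | E{MS}
Treatments | $r - 1$     | $SSTR = \sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$ | $MSTR = \dfrac{SSTR}{r - 1}$  | $\sigma^2 + \dfrac{\sum_{i=1}^{r} n_i (\mu_i - \mu_\cdot)^2}{r - 1}$
Error      | $n_T - r$   | $SSE = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2$    | $MSE = \dfrac{SSE}{n_T - r}$  | $\sigma^2$
Total      | $n_T - 1$   | $SSTO = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$ |                             |

Note:

$$SSTR = \sum_{i=1}^{r} n_i \bar{Y}_{i\cdot}^2 - n_T \bar{Y}_{\cdot\cdot}^2, \qquad SSE = \sum_{i=1}^{r}\sum_{j=1}^{n_i} Y_{ij}^2 - \sum_{i=1}^{r} n_i \bar{Y}_{i\cdot}^2$$

Expected values of each piece, with $\mu_\cdot = \sum_{i=1}^{r} n_i \mu_i / n_T$:

$$E\{Y_{ij}\} = \mu_i, \quad \sigma^2\{Y_{ij}\} = \sigma^2 \;\Rightarrow\; E\{Y_{ij}^2\} = \sigma^2 + \mu_i^2 \;\Rightarrow\; E\left\{\sum_{i=1}^{r}\sum_{j=1}^{n_i} Y_{ij}^2\right\} = \sum_{i=1}^{r} n_i \mu_i^2 + n_T \sigma^2$$

$$E\{\bar{Y}_{i\cdot}\} = \mu_i, \quad \sigma^2\{\bar{Y}_{i\cdot}\} = \frac{\sigma^2}{n_i} \;\Rightarrow\; E\{\bar{Y}_{i\cdot}^2\} = \frac{\sigma^2}{n_i} + \mu_i^2 \;\Rightarrow\; E\left\{\sum_{i=1}^{r} n_i \bar{Y}_{i\cdot}^2\right\} = \sum_{i=1}^{r} n_i \mu_i^2 + r\sigma^2$$

$$E\{\bar{Y}_{\cdot\cdot}\} = \frac{\sum_{i=1}^{r} n_i \mu_i}{n_T} = \mu_\cdot, \quad \sigma^2\{\bar{Y}_{\cdot\cdot}\} = \frac{\sigma^2}{n_T} \;\Rightarrow\; E\{\bar{Y}_{\cdot\cdot}^2\} = \frac{\sigma^2}{n_T} + \mu_\cdot^2 \;\Rightarrow\; E\{n_T \bar{Y}_{\cdot\cdot}^2\} = n_T \mu_\cdot^2 + \sigma^2$$

Combining these with the computational forms of SSTR and SSE:

$$E\{SSTR\} = \sum_{i=1}^{r} n_i \mu_i^2 + r\sigma^2 - n_T \mu_\cdot^2 - \sigma^2 = (r - 1)\sigma^2 + \sum_{i=1}^{r} n_i (\mu_i - \mu_\cdot)^2 \;\Rightarrow\; E\{MSTR\} = \sigma^2 + \frac{\sum_{i=1}^{r} n_i (\mu_i - \mu_\cdot)^2}{r - 1}$$

$$E\{SSE\} = \sum_{i=1}^{r} n_i \mu_i^2 + n_T \sigma^2 - \sum_{i=1}^{r} n_i \mu_i^2 - r\sigma^2 = (n_T - r)\sigma^2 \;\Rightarrow\; E\{MSE\} = \sigma^2$$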
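F-Test for H0: $\mu_1 = \dots = \mu_r$
$H_0: \mu_1 = \dots = \mu_r$
$H_A$: Not all $\mu_i$ are equal

Test Statistic: $F^* = \dfrac{MSTR}{MSE}$

Under the null hypothesis (and independence and normality of errors):

$$\frac{SSTR}{\sigma^2} \sim \chi^2_{r-1}, \qquad \frac{SSE}{\sigma^2} \sim \chi^2_{n_T - r}$$

and the two are independent (independent even if $H_0$ is false).

$$\Rightarrow \quad F^* = \frac{\left(\dfrac{SSTR}{\sigma^2}\right) \Big/ (r - 1)}{\left(\dfrac{SSE}{\sigma^2}\right) \Big/ (n_T - r)} = \frac{MSTR}{MSE} \sim F(r - 1,\, n_T - r)$$

Decision Rule: Reject $H_0$ if $F^* = \dfrac{MSTR}{MSE} \ge F(1 - \alpha;\, r - 1,\, n_T - r)$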
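The test statistic is easy to compute directly from grouped data. The sketch below also returns a p-value, but only for the special case of three factor levels: with numerator df equal to 2, the upper tail of the F distribution has the closed form $(\nu / (\nu + 2f))^{\nu/2}$ for denominator df $\nu$. That shortcut is used here purely to stay within the standard library; for other values of $r$ an F-distribution routine from a statistics package would be needed. The data and function names are hypothetical.

```python
from statistics import mean

def f_statistic(y):
    """Return (F*, numerator df, denominator df) for a one-way layout."""
    r = len(y)
    n_T = sum(len(level) for level in y)
    grand = mean(y_ij for level in y for y_ij in level)
    sstr = sum(len(level) * (mean(level) - grand) ** 2 for level in y)
    sse = sum((y_ij - mean(level)) ** 2 for level in y for y_ij in level)
    return (sstr / (r - 1)) / (sse / (n_T - r)), r - 1, n_T - r

def p_value_three_levels(f_star, df_error):
    """Upper-tail area of F(2, df_error); valid only when numerator df = 2."""
    return (df_error / (df_error + 2.0 * f_star)) ** (df_error / 2.0)

y = [[10.0, 12.0], [15.0, 17.0], [20.0, 22.0]]   # hypothetical data, r = 3
f_star, df1, df2 = f_statistic(y)

alpha = 0.05
p = p_value_three_levels(f_star, df2)
reject = p <= alpha   # same decision as comparing F* with F(1 - alpha; 2, df2)
print(f_star, df1, df2, p, reject)
```

Rejecting when the p-value is at most $\alpha$ is equivalent to the decision rule stated in terms of the critical value.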
General Linear Test of Equal Means
$H_0: \mu_1 = \dots = \mu_r = \mu_c$, where $\mu_c$ = Common Mean (Reduced Model)
$H_A$: Not all $\mu_i$ are equal (Complete Model)

Reduced Model: $\hat{\mu}_c = \bar{Y}_{\cdot\cdot} = \hat{Y}_{ij}$

$$\Rightarrow \quad SSE(R) = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \hat{Y}_{ij})^2 = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = SSTO, \qquad df_R = n_T - 1$$

Complete (Full) Model: $\hat{\mu}_i = \bar{Y}_{i\cdot} = \hat{Y}_{ij}$

$$\Rightarrow \quad SSE(F) = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \hat{Y}_{ij})^2 = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 = SSE, \qquad df_F = n_T - r$$

Test Statistic:

$$F^* = \frac{\dfrac{SSE(R) - SSE(F)}{df_R - df_F}}{\dfrac{SSE(F)}{df_F}} = \frac{\dfrac{SSTO - SSE}{(n_T - 1) - (n_T - r)}}{\dfrac{SSE}{n_T - r}} = \frac{\dfrac{SSTR}{r - 1}}{\dfrac{SSE}{n_T - r}} = \frac{MSTR}{MSE}$$
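The general linear test can be carried out by fitting the two models separately and comparing their error sums of squares. Here is a minimal sketch on made-up unbalanced data; `sse_of_fit` is a hypothetical helper that takes one fitted value per level.

```python
from statistics import mean

def sse_of_fit(y, fitted):
    """Error sum of squares when every case at level i is fitted by fitted[i]."""
    return sum((y_ij - f_i) ** 2 for level, f_i in zip(y, fitted) for y_ij in level)

# Hypothetical unbalanced data: n_1 = 3, n_2 = 2, n_3 = 4.
y = [[10.0, 12.0, 14.0], [15.0, 17.0], [20.0, 22.0, 21.0, 25.0]]
r = len(y)
n_T = sum(len(level) for level in y)

grand = mean(y_ij for level in y for y_ij in level)
sse_r = sse_of_fit(y, [grand] * r)                    # reduced model: common mean
sse_f = sse_of_fit(y, [mean(level) for level in y])   # full model: level means
df_r, df_f = n_T - 1, n_T - r

f_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
print(sse_r, sse_f, f_star)
```

Here `sse_r` plays the role of SSTO and `sse_f` the role of SSE, so `f_star` agrees with MSTR/MSE computed from the ANOVA table.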

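Factor Effects Model
Alternative Form of Model (Necessary for interactions in multi-factor models):

$$\mu_i \equiv \mu_\cdot + (\mu_i - \mu_\cdot) = \mu_\cdot + \tau_i \quad \Rightarrow \quad Y_{ij} = \mu_\cdot + \tau_i + \varepsilon_{ij}$$

$\tau_i = \mu_i - \mu_\cdot$ = "Effect" of the $i$th factor level
$\varepsilon_{ij} \sim NID(0, \sigma^2)$

Defining $\mu_\cdot$:

Unweighted Mean: $\mu_\cdot = \dfrac{\sum_{i=1}^{r} \mu_i}{r} \quad \Rightarrow \quad \sum_{i=1}^{r} \tau_i = 0$

Weighted Mean: $\mu_\cdot = \sum_{i=1}^{r} w_i \mu_i$ s.t. $\sum_{i=1}^{r} w_i = 1 \quad \Rightarrow \quad \sum_{i=1}^{r} w_i \tau_i = 0$

Weights may represent the population sizes in observational studies
Note: $\mu_1 = \dots = \mu_r \iff \tau_1 = \dots = \tau_r = 0$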
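Regression Approach – Factor Effects Model
Suppose $r = 3$ and $n_1 = n_2 = n_3 = 2$, with the Unweighted Mean Model: $\tau_1 + \tau_2 + \tau_3 = 0 \Rightarrow \tau_3 = -\tau_1 - \tau_2$

$$
\mathbf{Y} = \begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{22} \\ Y_{31} \\ Y_{32} \end{bmatrix}
\qquad
\mathbf{X} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & -1 & -1 \\ 1 & -1 & -1 \end{bmatrix}
\qquad
\boldsymbol{\beta} = \begin{bmatrix} \mu_\cdot \\ \tau_1 \\ \tau_2 \end{bmatrix}
\qquad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{31} \\ \varepsilon_{32} \end{bmatrix}
$$

$$
E\{\mathbf{Y}\} = \begin{bmatrix} E\{Y_{11}\} \\ E\{Y_{12}\} \\ E\{Y_{21}\} \\ E\{Y_{22}\} \\ E\{Y_{31}\} \\ E\{Y_{32}\} \end{bmatrix}
= \mathbf{X}\boldsymbol{\beta}
= \begin{bmatrix} \mu_\cdot + \tau_1 \\ \mu_\cdot + \tau_1 \\ \mu_\cdot + \tau_2 \\ \mu_\cdot + \tau_2 \\ \mu_\cdot - \tau_1 - \tau_2 \\ \mu_\cdot - \tau_1 - \tau_2 \end{bmatrix}
= \begin{bmatrix} \mu_\cdot + \tau_1 \\ \mu_\cdot + \tau_1 \\ \mu_\cdot + \tau_2 \\ \mu_\cdot + \tau_2 \\ \mu_\cdot + \tau_3 \\ \mu_\cdot + \tau_3 \end{bmatrix}
$$

$$
\mathbf{X}'\mathbf{X} = \begin{bmatrix} 6 & 0 & 0 \\ 0 & 4 & 2 \\ 0 & 2 & 4 \end{bmatrix}
\qquad
\mathbf{X}'\mathbf{Y} = \begin{bmatrix} Y_{11} + Y_{12} + Y_{21} + Y_{22} + Y_{31} + Y_{32} \\ (Y_{11} + Y_{12}) - (Y_{31} + Y_{32}) \\ (Y_{21} + Y_{22}) - (Y_{31} + Y_{32}) \end{bmatrix}
$$

$$
\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}
= \begin{bmatrix} 1/6 & 0 & 0 \\ 0 & 1/3 & -1/6 \\ 0 & -1/6 & 1/3 \end{bmatrix}
\begin{bmatrix} Y_{11} + Y_{12} + Y_{21} + Y_{22} + Y_{31} + Y_{32} \\ (Y_{11} + Y_{12}) - (Y_{31} + Y_{32}) \\ (Y_{21} + Y_{22}) - (Y_{31} + Y_{32}) \end{bmatrix}
= \begin{bmatrix} \bar{Y}_{\cdot\cdot} \\ \bar{Y}_{1\cdot} - \bar{Y}_{\cdot\cdot} \\ \bar{Y}_{2\cdot} - \bar{Y}_{\cdot\cdot} \end{bmatrix}
= \begin{bmatrix} \hat{\mu}_\cdot \\ \hat{\tau}_1 \\ \hat{\tau}_2 \end{bmatrix}
$$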
 
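Factor Effects Model with Weighted Mean
Weights are relative sample sizes: $w_i = \dfrac{n_i}{n_T}$

$$\sum_{i=1}^{r} w_i \tau_i = 0 \;\Rightarrow\; \sum_{i=1}^{r} \frac{n_i}{n_T}\tau_i = 0 \;\Rightarrow\; \sum_{i=1}^{r} n_i \tau_i = 0$$

$$\Rightarrow \; n_r \tau_r = -\sum_{i=1}^{r-1} n_i \tau_i \;\Rightarrow\; \tau_r = -\sum_{i=1}^{r-1} \frac{n_i}{n_r}\tau_i$$

$$\Rightarrow \; Y_{ij} = \mu_\cdot + \tau_1 X_{ij1} + \dots + \tau_{r-1} X_{ij,r-1} + \varepsilon_{ij}$$

$$
X_{ij1} = \begin{cases} 1 & \text{if } i = 1 \\ -\dfrac{n_1}{n_r} & \text{if } i = r \\ 0 & \text{otherwise} \end{cases}
\qquad \dots \qquad
X_{ij,r-1} = \begin{cases} 1 & \text{if } i = r - 1 \\ -\dfrac{n_{r-1}}{n_r} & \text{if } i = r \\ 0 & \text{otherwise} \end{cases}
$$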
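The weighted coding can be tried on unbalanced data. The sketch below builds the coded predictors, solves the normal equations exactly, and recovers the overall mean and the level effects. The small Gauss-Jordan routine `solve` and the data are illustrative assumptions; any linear solver would do. With equal sample sizes the entries $-n_k/n_r$ all become $-1$, which is the unweighted coding.

```python
from fractions import Fraction as Fr

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination on exact fractions."""
    n = len(b)
    m = [[Fr(v) for v in row] + [Fr(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if m[i][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for i in range(n):
            if i != col:
                m[i] = [vi - m[i][col] * vc for vi, vc in zip(m[i], m[col])]
    return [row[-1] for row in m]

y = [[10, 12, 14], [15, 17], [20, 22, 21, 25]]   # hypothetical, unbalanced
n = [len(level) for level in y]
r = len(y)

rows, resp = [], []
for i, level in enumerate(y):
    for y_ij in level:
        if i < r - 1:
            x = [1 if k == i else 0 for k in range(r - 1)]
        else:   # last level: -n_k / n_r in every effect column
            x = [Fr(-n[k], n[r - 1]) for k in range(r - 1)]
        rows.append([1] + x)
        resp.append(y_ij)

XtX = [[sum(row[a] * row[b] for row in rows) for b in range(r)] for a in range(r)]
XtY = [sum(row[a] * v for row, v in zip(rows, resp)) for a in range(r)]

mu_hat, *tau_hat = solve(XtX, XtY)
tau_last = -sum(Fr(n[k], n[r - 1]) * tau_hat[k] for k in range(r - 1))
print(mu_hat, tau_hat, tau_last)
```

The intercept estimate equals the overall sample mean of all cases, and the recovered effects satisfy $\sum n_i \hat{\tau}_i = 0$.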
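Regression for Cell Means Model

$$Y_{ij} = \mu_i + \varepsilon_{ij} = \mu_1 X_{ij1} + \dots + \mu_r X_{ijr} + \varepsilon_{ij}$$

$$
X_1 = \begin{cases} 1 & \text{if } i = 1 \\ 0 & \text{if } i \ne 1 \end{cases}
\qquad \dots \qquad
X_r = \begin{cases} 1 & \text{if } i = r \\ 0 & \text{if } i \ne r \end{cases}
$$

$$
\boldsymbol{\beta} = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_r \end{bmatrix}
\qquad
\hat{\boldsymbol{\beta}} = \begin{bmatrix} \bar{Y}_{1\cdot} \\ \vdots \\ \bar{Y}_{r\cdot} \end{bmatrix}
$$

When fitting with a regression package, no intercept is used, because the $r$ indicator columns already span the intercept.

Under $H_0: \mu_1 = \dots = \mu_r = \mu_c$:

$$
\mathbf{X} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}
\qquad
\boldsymbol{\beta} = [\mu_c]
\qquad
\hat{\boldsymbol{\beta}} = \bar{Y}_{\cdot\cdot}
$$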
Randomization (aka Permutation) Tests
• Treats the units in the study as a finite population of
units, each with a fixed error term $\varepsilon_{ij}$
• When the randomization procedure assigns the unit to
treatment $i$, we observe $Y_{ij} = \mu_\cdot + \tau_i + \varepsilon_{ij}$
• When there are no treatment effects (all $\tau_i = 0$),
$Y_{ij} = \mu_\cdot + \varepsilon_{ij}$
• We can compute a test statistic, such as $F^*$, under all (or
in practice, many) potential treatment arrangements of
the observed units (responses)
• The p-value is the proportion of these arrangements whose test
statistic is as or more extreme than the one originally observed.
• Total number of potential permutations = $n_T! / (n_1! \cdots n_r!)$
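For a very small study the randomization distribution can be enumerated completely. The sketch below runs through every ordering of the pooled responses, which visits each distinct arrangement the same number of times and so yields the same proportion as enumerating the distinct arrangements only. The data and function names are hypothetical, and the code assumes no arrangement has zero within-level variation (which would make $F^*$ undefined).

```python
from itertools import permutations
from math import factorial
from statistics import mean

def f_statistic(y):
    """F* = MSTR / MSE for responses grouped by factor level."""
    r, n_T = len(y), sum(len(level) for level in y)
    grand = mean(y_ij for level in y for y_ij in level)
    sstr = sum(len(level) * (mean(level) - grand) ** 2 for level in y)
    sse = sum((y_ij - mean(level)) ** 2 for level in y for y_ij in level)
    return (sstr / (r - 1)) / (sse / (n_T - r))

def permutation_p_value(y):
    """Share of treatment arrangements with F at least as large as observed."""
    sizes = [len(level) for level in y]
    pooled = [y_ij for level in y for y_ij in level]
    f_obs = f_statistic(y)
    hits = total = 0
    for perm in permutations(pooled):
        groups, start = [], 0
        for size in sizes:
            groups.append(list(perm[start:start + size]))
            start += size
        total += 1
        hits += f_statistic(groups) >= f_obs - 1e-9   # tolerance for rounding
    return hits / total

y = [[10.0, 12.0], [15.0, 17.0], [20.0, 22.0]]   # hypothetical data

# Number of distinct arrangements: n_T! / (n_1! ... n_r!).
n_arrangements = factorial(6) // (factorial(2) ** 3)
p_perm = permutation_p_value(y)
print(n_arrangements, p_perm)
```

Full enumeration grows factorially with $n_T$, so for realistic sample sizes a random sample of arrangements would replace the `permutations` loop.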
Power Approach to Sample Size Choice - Tables
When the means are not all equal, the F-statistic is non-central F:

$$F^* \sim F(r - 1,\, n_T - r,\, \phi) \quad \text{where} \quad \phi = \frac{1}{\sigma}\sqrt{\frac{\sum_{i=1}^{r} n_i (\mu_i - \mu_\cdot)^2}{r}}, \qquad \mu_\cdot = \frac{\sum_{i=1}^{r} n_i \mu_i}{n_T}$$

When all sample sizes are equal ($n_i = n$):

$$\phi = \frac{1}{\sigma}\sqrt{\frac{n}{r}\sum_{i=1}^{r} (\mu_i - \mu_\cdot)^2} \quad \text{where} \quad \mu_\cdot = \frac{\sum_{i=1}^{r} \mu_i}{r}$$

The power of the test, when conducted at the significance level $\alpha$:

$$\text{Power} = \Pr\{F^* \ge F(1 - \alpha;\, r - 1,\, n_T - r) \mid \phi\} \qquad \text{See Table B.11}$$

Choose sample sizes so that the power is sufficiently high for specific
means $(\mu_1, \dots, \mu_r)$ or effect levels of interest $(\tau_1, \dots, \tau_r)$.
Table B.12 is simple to use for equal sample sizes and the range
$\Delta = \max(\mu_i) - \min(\mu_i)$ of the mean levels of interest.


Power Approach to Sample Size Choice – R Code
When the means are not all equal, the F-statistic is non-central F:

$$F^* \sim F(r - 1,\, n_T - r,\, \lambda) \quad \text{where} \quad \lambda = \frac{\sum_{i=1}^{r} n_i (\mu_i - \mu_\cdot)^2}{\sigma^2}, \qquad \mu_\cdot = \frac{\sum_{i=1}^{r} n_i \mu_i}{n_T}$$

When all sample sizes are equal ($n_i = n$):

$$\lambda = \frac{n \sum_{i=1}^{r} (\mu_i - \mu_\cdot)^2}{\sigma^2} \quad \text{where} \quad \mu_\cdot = \frac{\sum_{i=1}^{r} \mu_i}{r}$$

The power of the test, when conducted at the significance level $\alpha$:

$$\Pr\{F^* \ge F(1 - \alpha;\, r - 1,\, n_T - r) \mid F^* \sim F(r - 1,\, n_T - r,\, \lambda)\}$$

In R:
$F(1 - \alpha;\, r - 1,\, n_T - r)$ = qf(1 − α, r − 1, nT − r)
Power = 1 − β = 1 − pf(qf(1 − α, r − 1, nT − r), r − 1, nT − r, λ)
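Both power slides start from the same planning inputs: the level means, the sample sizes, and the error standard deviation. The sketch below computes the non-centrality parameter $\lambda$ (the value passed as the fourth argument, `ncp`, to R's `pf`) and the table parameter $\phi$, which are linked by $\phi = \sqrt{\lambda / r}$. The function name and the planning values are hypothetical.

```python
from math import sqrt

def noncentrality(mu, n, sigma):
    """Return (lambda, phi) for level means mu, sample sizes n, error SD sigma."""
    n_T = sum(n)
    mu_dot = sum(n_i * m for n_i, m in zip(n, mu)) / n_T   # weighted mean of the mu_i
    lam = sum(n_i * (m - mu_dot) ** 2 for n_i, m in zip(n, mu)) / sigma ** 2
    phi = sqrt(lam / len(mu))
    return lam, phi

# Hypothetical planning values: three means, n = 5 per level, sigma = 2.
lam, phi = noncentrality([10.0, 12.0, 14.0], [5, 5, 5], 2.0)
print(lam, phi)
```

Since $\lambda$ scales linearly with the sample sizes, doubling every $n_i$ doubles $\lambda$, which is a convenient way to scan candidate designs before evaluating the power.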
Power Approach to Finding “Best” Treatment
Goal: Determining the best treatment (one with highest or lowest mean):
$1 - \alpha$ = Probability the treatment with highest (lowest) sample mean
has highest (lowest) population mean
$\lambda$ = Difference between highest (lowest) mean and 2nd highest (lowest) mean
$r$ = Number of treatments
Table B.13 gives $\dfrac{\lambda\sqrt{n}}{\sigma}$ for various $r$, $1 - \alpha$
Solve for $n$ for given $\lambda$, $\sigma$