COMPLETELY RANDOMIZED DESIGNS (CRD)

advertisement
STAT 512
Completely Randomized Designs
COMPLETELY RANDOMIZED DESIGNS (CRD)
• For now, t unstructured treatments (e.g. no factorial structure)
• “Completely randomized” means no restrictions on the
randomization of units to treatments (except for sample sizes
ordinarily fixed in advance), so with N available units:
– Randomly select n1 units for assignment to trt 1
– Randomly select n2 remaining units for assignment to trt 2
– ...
– Last nt units are assigned to trt t
– Or, any other procedure s.t. all N !/(n1 !n2 !...nt !) possible
assignment patterns have equal probability
• Ordinarily reflects model and assumptions for 1-way ANOVA
1
STAT 512
Completely Randomized Designs
“Cell Means” model:
• yi,j = µi + i,j
• i = 1...t ,
j = 1...ni ,
Pt
i=1
ni = N
• µi represents E[yi,j ], a response corresponding to treatment i,
– in the laboratory in which the experiment was performed
– on the day the experiment was run ...
• E[i,j ] = 0,
V ar[i,j ] = σ 2 , independent, perhaps normal
• represents (in part) unit-to-unit variation; “indep” is justified (in
part) by complete randomization
• Matrix representation:
– y = Xµ + – [N × 1] = [N × t][t × 1] + [N × 1]
2
STAT 512
Completely Randomized Designs
























y1,1
...
y1,n1
y2,1
...
y2,n2
...
yt,1
...
yt,nt


 
 
 
 
 
 
 
 
 
 
 
=
 
 
 
 
 
 
 
 
 
 
 
1
0
...
0
























...
...
...
...
1
0
...
0
0
1
...
0
...
...
...
...
0
1
...
0
...
...
...
...
0
0
...
1
...
...
...
...
0
0
...
1
3

µ1
µ2
...
µt





 


 
 
 
+
 
 









1,1

...























1,n1
2,1
...
2,n2
...
t,1
...
t,nt
STAT 512
Completely Randomized Designs
“Effects” model:
• yi,j = α + τi + i,j
• This says the same thing mathematically ...
– Let α = anything, then τi = µi − α
• α now represents everything in common to all experimental runs,
τi is the deviation from this due only to treatment i
• Matrix representation:
– y = Xθ + , θ = (α, τ1 , τ2 , ..., τt )0
– [N × 1] = [N × (t + 1)][(t + 1) × 1] + [N × 1]
– X is a different matrix now ... will often use “X” generically
– Less-than-full-rank
4
STAT 512
Completely Randomized Designs
























y1,1
...
y1,n1
y2,1
...
y2,n2
...
yt,1
...
yt,nt


 
 
 
 
 
 
 
 
 
 
 
=
 
 
 
 
 
 
 
 
 
 
 
1
1
0
...
0
























...
...
...
...
...
1
1
0
...
0
1
0
1
...
0
...
...
...
...
...
1
0
1
...
0
...
...
...
...
...
1
0
0
...
1
...
...
...
...
...
1
0
0
...
1
5

α
τ1
τ2
...
τt




 


 
 
 
 
+
 
 
 
 







1,1

...























1,n1
2,1
...
2,n2
...
t,1
...
t,nt
STAT 512
Completely Randomized Designs
ESTIMATION: review
• y = Xθ + ... (general X , iid errors)
(first piece is expectation of y ... the “noiseless model”)
• MLE for normal errors, LS in any case, is any sol’n to:
X0 Xθ̂ = X0 y
– “Normal equations”, as in “norm”, not “normal dist’n”
– X of full column rank: θ̂ = (X0 X)−1 X0 y
– Otherwise: θ̂ = (X0 X)− X0 y
∗ generalized inverse, non-unique
∗ AA− A = A
• Need to review statistical properties of θ̂
6
STAT 512
Completely Randomized Designs
EXPECTATIONS:
• Recall that for random y with E[y] = m,
V ar[y] = V
– m vector of means, V matrix of variances and covariances
– E[Ay] = Am
– V ar[Ay] = AVA0
(generalization of “constant-squared” in the scalar case)
• So,
– E[θ̂] = (X0 X)− X0 (Xθ) ... generally
– E[θ̂] = θ ... full rank
7
STAT 512
Completely Randomized Designs
• In the less-than-full-rank case, θ̂ is not uniquely determined, but
consider estimable functions:
– c0 θ, where c0 = l0 X for some N -vector l
0 θ = c0 θ̂ = (l0 X)(X0 X)− X0 y
– cc
• X(X0 X)− X0 = H is invariant to choice of g-inverse
– “projection matrix”, into space spanned by columns of X
– symmetric N × N , idempotent, H for “hat” because
ŷ = Xθ̂ = Hy
– c0 θ̂ = l0 Xθ̂ = l0 Hy
– E[c0 θ̂] = l0 HXθ = l0 Xθ = c0 θ


c01



– Likewise for C =  ... 
, E[Cθ̂] = Cθ if C = LX
c0m
8
STAT 512
Completely Randomized Designs
• CRD Examples:
– τi isn’t estimable
– µi = α + τi is estimable, but is a function of “experiment” as
well as “treatment”
– µi − µj = τi − τj is estimable ... comparative
(see X, slide 5)
9
STAT 512
Completely Randomized Designs
VARIANCES:
• Recall for estimable Cθ,
Cθ̂ = (LX)(X0 X)− X0 y = LHy
• V ar[Cθ̂] = LH(σ 2 I)H0 L0 = σ 2 LHL0
= σ 2 (LX)(X0 X)− (X0 L0 ) = σ 2 C(X0 X)− C0 ... generally
• V ar[Cθ̂] = σ 2 C(X0 X)−1 C0 ... full rank
• in particular, V ar[θ̂] = σ 2 (X0 X)−1 ... full rank
10
STAT 512
Completely Randomized Designs
11
ALLOCATION:
• Interest in estimating Cθ well, i.e. with minimum variance
• Allocate N units to treatments to minimize the average contrast
variance
Pt
2 0
0
−1
• minni ,i=1...t avej σ cj (X X) cj , subject to i=1 ni = N
– or g-inv if X less than full rank and Cθ estimable
– c0j , the jth row in C
– σ 2 is a positive constant, doesn’t matter
0
−1
• minni ,i=1...t trace C(X X)
0
0
C , subject to
Pt
ni = N
−1
Pt
ni = N
• minni ,i=1...t trace C C(X X)
0
, subject to
i=1
i=1
– linear optimality, generalization of A-optimality
• If X is not of full rank but Cθ is estimable, same argument
applies to (X0 X)−
STAT 512
Completely Randomized Designs
EXAMPLE
• Full-rank cell means model, θ = µ
(X0 X)−1 = diag(1/n1 , 1/n2 , ...1/nt )
• Interest in estimating differences between the “control”
(treatment 1) and each of the other treatments


1 −1
0 ...
0


 1
0 −1 ...
0 


C=

 ... ... ... ... ... 


1
0
0 ... −1
– t−1 × t
12
STAT 512
Completely Randomized Designs





0
CC=




t−1
−1
−1
... −1
−1
1
0 ...
−1
0
1 ...
...
...
−1
0
...
...
0 ...
0
−1
• trace C C(X X)
= (t − 1)/n1 +
Pt
1/ni
Pt
1/ni
i=2
• want to:
– minimize, w.r.t. n1 , n2 , ...nt ,
(t − 1)/n1 +
i=2
– subject to:
Pt
i=1
ni = N


0 


0 


... 

1
– t × t
0
13
STAT 512
Completely Randomized Designs
• Constrained Minimization by Method of Lagrangian Multipliers:
Pt
Pt
φ = [(t − 1)/n1 + i=2 1/ni ] + λ( i=1 ni − N )
(first term is what you want to minimize, second is what is
constrained to be zero)
∂φ/∂n1 = −(t − 1)/n21 + λ
∂φ/∂ni = −1/n2i + λ i = 2...t
Pt
∂φ/∂λ = i=1 ni − N
– Set each to zero and solve:
p
n1 = (t − 1)/λ
p
ni = 1/λ i = 2...t
n1 =
√
√ t−1N
t−1+(t−1)
ni =
√
N
t−1+(t−1)
14
STAT 512
Completely Randomized Designs
• Note: This is really a continuous optimization technique applied
to a discrete problem. In practice, need to round solutions to
integer values, or use as starting point in a discrete search
• Note also that Cov[µ1d
− µi , µ1d
− µj ] = σ 2 /n1
– relatively larger n1 reduces these ...
• Postscript:
– intuition might say you need less info about control, since you
have more experience with it ...
– but this means you intend to USE that information ... e.g. via
a Bayesian prior
15
STAT 512
Completely Randomized Designs
16
EXAMPLE: Less-Than-Full-Rank version
• “Effects” model, θ = (α τ1 ... τt )0

n n1

 n n
1
 1

X0 X = 
 n2 0

 ... ...

nt

0

 0


0
−
(X X) = 
 0

 ...

0
0
0
n2
...
0

... 0 


... 0 


... ... 

... nt
n2
...
0
0
nt

...
0










1/n1
0
...
0
0
1/n2
...
0
...
...
...
...
0
0
... 1/nt
STAT 512
Completely Randomized Designs
17
• General algorithm for g-inverse of symmetric matrices (e.g.
“Linear Models”, Searle, Wiley)
– find a full-rank, principal minor of maximum dimension
– invert it
– fill in zeros everywhere

0

 0

C=
 ...

0
else
1
−1
1
0
...
...
1
0
0 ...
0


−1 ...
0 


... ... ... 

0 ... −1
• C0 C and this (X0 X)− are as before, but with leading row/column
of zero ... so problem and solution are the same
STAT 512
Completely Randomized Designs
HYPOTHESIS TESTING: review
• Alternative (full) and Null (restricted) hypotheses:
– HypA : y = XA θ A + – Hyp0 : y = X0 θ 0 + = XA Pθ 0 + • Columns of X0 are all linear combinations of columns of XA
• XA is N -by-pA , P is pA -by-p0 , p0 < pA
18
STAT 512
Completely Randomized Designs
• Test for treatment effects: cell means model


 
1


XA = 



1
1
1
1
• Test for treatment effects: effects model
 


1
 
1 1
 0 


 
 1

1

, P = 
XA = 

0  or
 1

 
1


 0 
 
1
1


 



 1 
 1 
 , P =   , X0 = 


 1 
 1 

 


1
1
19
0
1

0


 
1
 1 

 
 1
 
 1  , X0 = 
 1
 

 1 
 
1
1






STAT 512
Completely Randomized Designs
• Test for “additional terms”: partitioned model
– Hyp0 : y = X1 θ 1 + – HypA : y = X1 θ 1 + X2 θ 2 + • This is a special case of the general form:


I

 , X0 = XA P = X1
XA = (X1 , X2 ), P =
0
• But X0 doesn’t have to be a subset of columns from XA
20
STAT 512
Completely Randomized Designs
• For either model:
P P
SSE = i j (yi,j − ŷi,j )2 ... sometimes “resid”
P P
= i j (yi,j − x0i θ̂)2 , where x0i is row from X
P P
= i j (yi,j − x0i (X0 X)− X0 y)2
= squared length of y − X(X0 X)− X0 y
= y0 (I − H)(I − H)y = y0 (I − H − H + H2 )y
= y0 (I − H)y (key result for what’s coming up)
= y0 y − (Hy)0 (Hy)
= length2 of y - length2 of Hy
21
STAT 512
Completely Randomized Designs
• Given truth of HypA (which means Hyp0 MAY also be true), the
validity of Hyp0 is judged by how much worse the data fit the
corresponding model:
SST = SSE(Hyp0 ) - SSE(HypA )
= y0 (I − H0 )y − y0 (I − HA )y
= y0 HA y − y0 H0 y = y0 (HA − H0 )y
• Notation: Here and later, the subscript on H is the subscript on
the corresponding X. Here it is 0 or A, later also 1 or 2.
22
STAT 512
Completely Randomized Designs
23
EXAMPLE
• Full-rank (cell means) model
• Hyp0 : µ1 = µ2 = ... = µt , or yi,j = µ + i,j
• HypA : at least two differ, or yi,j = µi + i,j

1n1

 1n2
X0 = 
 ...




 ,


H0 = 1(10 1)−1 10 =
1
N
JN ×N
1nt


1n1
...
0n1

 0n2
XA = 
 ...

...
0n2 
0nt
...
...




 , HA = 


... 

1nt
1
J
n1 n1 ×n1
...
0n1 ×nt

0n2 ×n1
...
0n2 ×nt
...
...
...





0nt ×n1
...
1
J
nt nt ×nt
STAT 512
Completely Randomized Designs
Letting yi represent the ni -element vector of observations associated
with treatment i,
SST = y0 HA y − y0 H0 y
0 1
0 1
y
J)y
−
y
( N J)y
(
i
i i ni
=
P
=
P
=
P
ni (ȳi. )2 − N (ȳ.. )2
=
P
ni (ȳi. − ȳ.. )2
1 2
i ni yi.
i
i
−
1 2
N y..
EXAMPLE
• Reduced-rank (effects) model, same problem
• Due to the invariance of the hat matrices, H0 and HA are the
same as before
• So, the resultant sums of squares are also the same
24
STAT 512
Completely Randomized Designs
EXPECTATIONS
• Recall that for random y with E[y] = m, V ar[y] = V
– E[y0 Ay] = m0 Am + trace(AV)
• Suppose HypA really is true (perhaps also Hyp0 ), so:
– E[y] = XA θ A
– E[SSE(HypA )]
= θ 0A X0A (I − HA )XA θ A + trace((I − HA )σ 2 I))
= θ 0A X0A (XA − XA )θ A + σ 2 (trace(I) − trace(HA ))
= 0 + σ 2 (N − rank(XA ))
(because for idempotent matrices, rank=trace,
and rank(H) = rank(X))
25
STAT 512
Completely Randomized Designs
• E[ SSE( Hyp0 ) ]
= θ 0A X0A (I − H0 )XA θ A + trace((I − H0 )σ 2 I))
= θ 0A X0A (I − H0 )XA θ A + σ 2 (N − rank(X0 ))
= QA|0 (θ A ) + σ 2 (N − rank(X0 ))
(subscript A|0 comes from space(XA ) -space(X0 ))
• For the partitioned model:
= (θ 01 X01 + θ 02 X02 )(I − H1 )(X1 θ 1 + X2 θ 2 )
+ σ 2 (N − rank(X1 ))
= θ 02 X02 (I − H1 )X2 θ 2 + σ 2 (N − rank(X1 ))
(X1 terms vanish ... why?)
= Q2|1 (θ 2 ) + σ 2 (N − rank(X1 ))
(subscript 2|1 comes from space(X2 ) -space(X1 ))
• E[ SST ] = Q2|1 (θ 2 ) + (rank(XA ) − rank(X0 ))σ 2
26
STAT 512
Completely Randomized Designs
EXAMPLE: All-Treatments-Equal Hypothesis
• In either parameterization:
– rank(XA ) − rank(X0 ) = t − 1
– H0 = 1(10 1)−1 10 =
1
NJ
• Full rank:
– X0A (I − H0 )XA = X0A XA −
1
0
N XA JXA
= diag(n) − N1 nn0 , where n = (n1 , n2 , ...nt )0
Pt
P
QA|0 (µ) = i=1 ni µ2i − N1 ( ni µi )2
Pt
Pt
1
2
= i=1 ni (µi − µ̄) , µ̄ = N i=1 ni µi
• Reduced rank:
– X02 (I − H1 )X2 = X0A (I − H0 )XA in full-rank case
Pt
P
1
2
Q2|1 (τ ) = i=1 ni (τi − τ̄ ) , τ̄ = N
ni τi
27
STAT 512
Completely Randomized Designs
DISTRIBUTION THEORY: review
1. y ∼ M V N (m, σ 2 I),
A and B p.s.d. symmetric,
Any two of:
• A and B idempotent
• A + B idempotent
• AB = 0
Implies all of:
•
•
1 0
σ 2 y Ay
1 0
σ 2 y By
0
∼ χ2 (rank(A), σ12 m0 Am)
0
∼ χ2 (rank(B), σ12 m0 Bm)
• the two quadratic forms are independent
2. For idempotent A, rank(A) = trace(A)
28
STAT 512
Completely Randomized Designs
IMPLICATIONS for CRD
1
σ2
•
SSE(HypA ) =
i.e. “central”
•
1
σ2
SST=
1 0
σ 2 y (I
1 0
σ 2 y (HA
SST
t−1
/
0
− HA )y ∼ χ2 (N − t, 0)
20
− H0 )y ∼ χ (t − 1, Q(τ )/σ 2 )
SSE (HypA )
N −t
∼ F 0 (t − 1, N − t, Q(τ )/σ 2 )
• or, replace τ with µ in Q ... same thing
• Other things being equal, larger Q → greater power
29
STAT 512
Completely Randomized Designs
Example
• CRD
N = 20
• numerator df = 2
t=3
n1 = n2 = 8, n3 = 4
denominator df = 17
• Suppose µ1 = 1 µ2 = 2 µ3 = 3 σ = 1.5
• Then
– µ̄ = 1.8
– Q( µ ) = 8(1 − 1.8)2 + 8(2 − 1.8)2 + 4(3 − 1.8)2 = 11.2
– Q( µ )/σ 2 = 4.9777
30
STAT 512
Completely Randomized Designs
• R:
– qf(.95, 2, 17) → 3.5915
95th quantile of central F, i.e. critical value
– 1-pf(3.5915, 2, 17, 4.9777) → 0.4308
probability of value in critical region for non-central F
• SAS:
– proc iml;
– finv(.95, 2, 17); → 3.5915
– 1-probf(3.5915, 2, 17, 4.9777); → 0.4308
31
STAT 512
Completely Randomized Designs
PRACTICAL POINT
• May not know enough about problem to speculate about the
(complete) true value of µ, but willing to guess:
D = µmax − µmin = τmax − τmin
• Bounds for Q (assume equal ni )
– Most favorable situation (Q greatest)
∗ greatest “variance” of µ’s
∗ t/2 µ’s = µmax t/2 µ’s = µmin
∗ Q = ni × t × ( 21 D)2 = N × D2 /4
– Least favorable situation (Q smallest)
∗
∗
∗
∗
smallest “variance” of µ’s
1 µ = µmax 1 µ = µmin
others = (µmax + µmin )/2
Q = 2 × ni × ( 12 D)2 = ni × D2 /2
32
STAT 512
Completely Randomized Designs
33
REDUCED NORMAL EQUATIONS FOR τ
• y = Xθ + = X1 β + X2 τ + – β are nuisance parameters
– for our CRD effects model, X1 β = 1α
• “Full” normal equations:


X01 X1
X02 X1
0
θ̂
=
X
y
(X0 X)


X01 X2
X02 X2
β̂

τ̂

=
X01 X1 β̂ + X01 X2 τ̂ = X01 y
X02 X1 β̂ + X02 X2 τ̂ = X02 y
X01 y

X02 y

STAT 512
Completely Randomized Designs
X01 X1 β̂ + X01 X2 τ̂ = X01 y
X02 X1 β̂ + X02 X2 τ̂ = X02 y
• Pre-multiply first equation by X02 X1 (X01 X1 )− , to match leading
1st term of second equation
– will need g-inverse if X1 is of less than full rank
X02 X1 β̂ + X02 H1 X2 τ̂ = X02 H1 y
• Subtract this from the second equation to get “reduced normal
equations”:
X02 (I − H1 )X2 τ̂ = X02 (I − H1 )y
34
STAT 512
Completely Randomized Designs
• Pattern of “reduced” normal equations:
– Same form found in weighted regression ... there I − H1 is
replaced with a “weight” matrix, any matrix proportional to
V ar(y)
– X2 -model projected into null space of X1
– Same as regression of resid(y : X1 ) on resid(X2 : X1 )
– Note reliance of form on H1 X2 and (I − H1 )X2 ...
35
STAT 512
Completely Randomized Designs
• In fact, suppose we transform the independent variables in this
problem, pretending the model isn’t really partitioned, and that
the single model matrix is:
X2|1 = (I − H1 )X2
– then the usual normal equations are:
X02|1 X2|1 τ̂ = X02|1 y
– i.e. the correct result ... we’ve “corrected for β” by “projecting
X2 out of” the space spanned by X1
– so, for example,
∗ c0 τ is estimable iff c0 = l0 X2|1
−
0 τ ) = σ 2 c0 (X0 X
d
∗ V ar(c
2|1 2|1 ) c
36
STAT 512
Completely Randomized Designs
37
X02|1 X2|1 is called the “design information matrix” in the book, and
sometimes denoted I2|1 . For CRD’s:
• H1 = 1(10 1)−1 10 =
1
n Jn×n
• X2|1 = (I − H1 )X2 =

(1 − nN1 )1n1

 − n1 1

N n2


...

− nN1 1nt
− nN2 1n1
(1 − nN2 )1n2
...
...
− nNt 1n1
− nNt 1n2
...
...
...
... (1 −
nt
N )1nt
− nN2 1nt
• I2|1 = X02|1 X2|1 = diag(n) −
1
0
N nn ,







where n0 = (n1 , n2 , ..., nt )
• Note that X2|1 and I2|1 are each of rank t − 1, reflecting the fact
that τ isn’t estimable
STAT 512
Completely Randomized Designs
So, for CRD’s, given interest in τ and accounting for α,
0 τ is determined by σ 2 c0 I − c
• the precision of cd
2|1
– one g-inverse of I2|1 = diag(n) −
check it
1
0
nn
N
−
is I2|1
= diag−1 (n) ...
• the power of the test for equality of τ ’s is determined by
τ 0 I2|1 τ /σ 2
38
Download