STAT 512: Completely Randomized Designs

COMPLETELY RANDOMIZED DESIGNS (CRD)
• For now, t unstructured treatments (e.g. no factorial structure)
• "Completely randomized" means no restrictions on the randomization of units to treatments (except for the sample sizes, ordinarily fixed in advance), so with N available units:
  – Randomly select n1 units for assignment to trt 1
  – Randomly select n2 of the remaining units for assignment to trt 2
  – ...
  – The last nt units are assigned to trt t
  – Or, any other procedure s.t. all N!/(n1! n2! ... nt!) possible assignment patterns have equal probability
• Ordinarily reflects the model and assumptions of 1-way ANOVA

"Cell Means" model:
• y_{i,j} = μ_i + ε_{i,j}
• i = 1...t, j = 1...n_i, Σ_{i=1}^t n_i = N
• μ_i represents E[y_{i,j}], the expected response under treatment i,
  – in the laboratory in which the experiment was performed
  – on the day the experiment was run ...
• E[ε_{i,j}] = 0, Var[ε_{i,j}] = σ², independent, perhaps normal
• ε represents (in part) unit-to-unit variation; "indep" is justified (in part) by complete randomization
• Matrix representation:
  – y = Xμ + ε
  – [N × 1] = [N × t][t × 1] + [N × 1]
  – with responses stacked by treatment, X is the matrix of treatment indicator columns:

    (y_{1,1}, ..., y_{1,n1}, y_{2,1}, ..., y_{2,n2}, ..., y_{t,1}, ..., y_{t,nt})'
      = [ 1_{n1}  0_{n1}  ...  0_{n1}
          0_{n2}  1_{n2}  ...  0_{n2}
          ...     ...     ...  ...
          0_{nt}  0_{nt}  ...  1_{nt} ] (μ1, μ2, ..., μt)' + (ε_{1,1}, ..., ε_{t,nt})'

"Effects" model:
• y_{i,j} = α + τ_i + ε_{i,j}
• This says the same thing mathematically ...
  – Let α = anything; then τ_i = μ_i − α
• α now represents everything common to all experimental runs; τ_i is the deviation from this due only to treatment i
• Matrix representation:
  – y = Xθ + ε, θ = (α, τ1, τ2, ..., τt)'
  – [N × 1] = [N × (t+1)][(t+1) × 1] + [N × 1]
  – X is a different matrix now ... we will often use "X" generically
  – Less than full rank: X is the cell-means X with a leading column 1_N prepended (for α), so the columns are linearly dependent — the last t columns sum to the first
ESTIMATION: review
• y = Xθ + ε ... (general X, iid errors)
  (the first piece is the expectation of y ... the "noiseless model")
• The MLE for normal errors, and the LS estimate in any case, is any solution to:
    X'X θ̂ = X'y
  – "Normal equations", as in "norm", not "normal dist'n"
  – X of full column rank: θ̂ = (X'X)⁻¹X'y
  – Otherwise: θ̂ = (X'X)⁻X'y
    ∗ generalized inverse, non-unique
    ∗ A A⁻ A = A
• Need to review the statistical properties of θ̂

EXPECTATIONS:
• Recall that for random y with E[y] = m, Var[y] = V
  – m a vector of means, V a matrix of variances and covariances
  – E[Ay] = Am
  – Var[Ay] = AVA' (generalization of the "constant-squared" rule in the scalar case)
• So,
  – E[θ̂] = (X'X)⁻X'(Xθ) ... generally
  – E[θ̂] = θ ... full rank

• In the less-than-full-rank case, θ̂ is not uniquely determined, but consider estimable functions:
  – c'θ, where c' = l'X for some N-vector l
  – estimate c'θ by c'θ̂ = (l'X)(X'X)⁻X'y
• X(X'X)⁻X' = H is invariant to the choice of g-inverse
  – "projection matrix", onto the space spanned by the columns of X
  – symmetric N × N, idempotent; H is for "hat" because ŷ = Xθ̂ = Hy
  – c'θ̂ = l'Xθ̂ = l'Hy
  – E[c'θ̂] = l'HXθ = l'Xθ = c'θ
  – Likewise, for C with rows c'_1, ..., c'_m, E[Cθ̂] = Cθ if C = LX

• CRD examples:
  – τ_i isn't estimable
  – μ_i = α + τ_i is estimable, but is a function of the "experiment" as well as the "treatment"
  – μ_i − μ_j = τ_i − τ_j is estimable ... comparative (see the effects-model X)

VARIANCES:
• Recall that for estimable Cθ, Cθ̂ = (LX)(X'X)⁻X'y = LHy
• Var[Cθ̂] = LH(σ²I)H'L' = σ²LHL' = σ²(LX)(X'X)⁻(X'L') = σ²C(X'X)⁻C' ... generally
• Var[Cθ̂] = σ²C(X'X)⁻¹C' ...
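The projection-matrix facts above — H = X(X'X)⁻X' is invariant to the choice of g-inverse and to the parameterization — can be checked numerically. A minimal NumPy sketch (not from the notes), using illustrative sizes t = 3, n = (2, 3, 2) and the Moore–Penrose pseudoinverse as one valid g-inverse:

```python
import numpy as np

# illustrative CRD: t = 3 groups with n = (2, 3, 2)
n = [2, 3, 2]
N, t = sum(n), len(n)

# cell-means X: treatment indicator columns (full column rank)
Xc = np.zeros((N, t))
start = 0
for i, ni in enumerate(n):
    Xc[start:start + ni, i] = 1.0
    start += ni

# effects-model X: leading column of ones, less than full rank
Xe = np.hstack([np.ones((N, 1)), Xc])

def hat(X):
    # H = X (X'X)^- X'; the Moore-Penrose pinv is one g-inverse
    return X @ np.linalg.pinv(X.T @ X) @ X.T

Hc, He = hat(Xc), hat(Xe)
assert np.allclose(Hc, He)        # same H under either parameterization
assert np.allclose(Hc @ Hc, Hc)   # idempotent projection
assert np.allclose(Hc, Hc.T)      # symmetric
```

The two model matrices span the same column space, so they share the same H, and hence the same fitted values ŷ = Hy.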
full rank
• In particular, Var[θ̂] = σ²(X'X)⁻¹ ... full rank

ALLOCATION:
• Interest in estimating Cθ well, i.e. with minimum variance
• Allocate the N units to treatments to minimize the average contrast variance
• min over n_i, i = 1...t, of ave_j σ²c'_j(X'X)⁻¹c_j, subject to Σ_{i=1}^t n_i = N
  – or a g-inverse, if X is less than full rank and Cθ is estimable
  – c'_j is the jth row of C
  – σ² is a positive constant, so it doesn't matter
• Equivalently, min over n_i of trace[C(X'X)⁻¹C'] = trace[C'C(X'X)⁻¹], subject to Σ_{i=1}^t n_i = N
  – "linear optimality", a generalization of A-optimality
• If X is not of full rank but Cθ is estimable, the same argument applies with (X'X)⁻

EXAMPLE
• Full-rank cell means model, θ = μ, so (X'X)⁻¹ = diag(1/n1, 1/n2, ..., 1/nt)
• Interest in estimating the differences between the "control" (treatment 1) and each of the other treatments:

    C = [ 1 −1  0 ...  0          C'C = [ t−1 −1 −1 ... −1
          1  0 −1 ...  0                  −1   1  0 ...  0
          ... ... ... ...                 −1   0  1 ...  0
          1  0  0 ... −1 ]                ...  ... ... ...
    ... (t−1) × t                         −1   0  0 ...  1 ]   ... t × t

• trace[C'C(X'X)⁻¹] = (t−1)/n1 + Σ_{i=2}^t 1/n_i
• We want to:
  – minimize, w.r.t. n1, n2, ..., nt:  (t−1)/n1 + Σ_{i=2}^t 1/n_i
  – subject to:  Σ_{i=1}^t n_i = N

• Constrained minimization by the method of Lagrange multipliers:
    φ = [(t−1)/n1 + Σ_{i=2}^t 1/n_i] + λ(Σ_{i=1}^t n_i − N)
  (the first term is what you want to minimize, the second is what is constrained to be zero)
    ∂φ/∂n1 = −(t−1)/n1² + λ
    ∂φ/∂n_i = −1/n_i² + λ,  i = 2...t
    ∂φ/∂λ = Σ_{i=1}^t n_i − N
  – Set each to zero and solve:
      n1 = √((t−1)/λ),  n_i = √(1/λ), i = 2...t
      n1 = √(t−1) N / (√(t−1) + (t−1)),  n_i = N / (√(t−1) + (t−1)), i = 2...t

• Note: this is really a continuous optimization technique applied to a discrete problem.
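The Lagrangian solution can be sanity-checked numerically. A short Python sketch (illustrative values t = 4, N = 100, not from the notes) evaluates the trace criterion at the continuous optimum and at equal allocation:

```python
import math

def contrast_trace(n):
    # trace[C'C (X'X)^{-1}] = (t-1)/n1 + sum_{i>=2} 1/ni
    t = len(n)
    return (t - 1) / n[0] + sum(1.0 / ni for ni in n[1:])

def optimal_allocation(t, N):
    # continuous Lagrange-multiplier solution from the derivation above
    r = math.sqrt(t - 1)
    n1 = r * N / (r + (t - 1))
    ni = N / (r + (t - 1))
    return [n1] + [ni] * (t - 1)

t, N = 4, 100                        # illustrative, not from the notes
opt = optimal_allocation(t, N)
eq = [N / t] * t

assert abs(sum(opt) - N) < 1e-9                  # uses all N units
assert contrast_trace(opt) < contrast_trace(eq)  # beats equal allocation
assert opt[0] > opt[1]                           # control gets more units
```

The control arm receives √(t−1) times as many units as each other treatment, since treatment 1 appears in all t−1 contrasts.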
• In practice, we need to round the solutions to integer values, or use them as the starting point of a discrete search
• Note also that Cov[μ̂1 − μ̂i, μ̂1 − μ̂j] = σ²/n1 for i ≠ j
  – a relatively larger n1 reduces these covariances ...
• Postscript:
  – intuition might say you need less information about the control, since you have more experience with it ...
  – but this means you intend to USE that information ... e.g. via a Bayesian prior

EXAMPLE: Less-Than-Full-Rank version
• "Effects" model, θ = (α, τ1, ..., τt)'

    X'X = [ N   n1  n2 ... nt            one (X'X)⁻ = [ 0  0     0    ... 0
            n1  n1  0  ... 0                            0  1/n1  0    ... 0
            n2  0   n2 ... 0                            0  0     1/n2 ... 0
            ... ... ... ... ...                         ... ...  ...  ... ...
            nt  0   0  ... nt ]                         0  0     0    ... 1/nt ]

• General algorithm for a g-inverse of a symmetric matrix (e.g. "Linear Models", Searle, Wiley):
  – find a full-rank principal minor of maximum dimension
  – invert it
  – fill in zeros everywhere else

    C = [ 0  1 −1  0 ...  0
          0  1  0 −1 ...  0
          ... ... ... ... ...
          0  1  0  0 ... −1 ]

• C'C and this (X'X)⁻ are as before, but with a leading row/column of zeros ...
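Searle's recipe — invert a maximal full-rank principal minor, fill in zeros — can be verified against the defining property A A⁻ A = A. A NumPy sketch for the effects-model X'X above, with illustrative group sizes n = (2, 3, 2):

```python
import numpy as np

n = np.array([2, 3, 2])              # illustrative group sizes
N, t = n.sum(), len(n)

# X'X for the effects model: border of N and n around diag(n)
A = np.zeros((t + 1, t + 1))
A[0, 0] = N
A[0, 1:] = n
A[1:, 0] = n
A[1:, 1:] = np.diag(n)

# g-inverse by the Searle recipe: invert the full-rank principal
# minor diag(n), fill in zeros everywhere else
G = np.zeros_like(A, dtype=float)
G[1:, 1:] = np.diag(1.0 / n)

assert np.allclose(A @ G @ A, A)     # defining property A A⁻ A = A
assert np.linalg.matrix_rank(A) == t # rank t, not t+1: less than full rank
```

Note G is not unique; any matrix satisfying A G A = A serves as a g-inverse, and estimable functions come out the same for all of them.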
so the problem and its solution are the same as before

HYPOTHESIS TESTING: review
• Alternative (full) and null (restricted) hypotheses:
  – Hyp_A: y = X_A θ_A + ε
  – Hyp_0: y = X_0 θ_0 + ε = X_A P θ_0 + ε
• The columns of X_0 are all linear combinations of the columns of X_A
• X_A is N × p_A, P is p_A × p_0, p_0 < p_A

• Test for treatment effects, cell means model (illustrated with t = 2, n1 = n2 = 2):

    X_A = [ 1 0        P = [ 1        X_0 = [ 1
            1 0              1 ]              1
            0 1                               1
            0 1 ],                            1 ]

• Test for treatment effects, effects model:

    X_A = [ 1 1 0      P = [ 1 ]  or  [ 0 ]      X_0 = [ 1
            1 1 0            [ 0 ]     [ 1 ]             1
            1 0 1            [ 0 ]     [ 1 ]             1
            1 0 1 ],                                     1 ]

• Test for "additional terms": partitioned model
  – Hyp_0: y = X1θ1 + ε
  – Hyp_A: y = X1θ1 + X2θ2 + ε
• This is a special case of the general form:
    X_A = (X1, X2),  P = [ I
                           0 ],  X_0 = X_A P = X1
• But X_0 doesn't have to be a subset of the columns of X_A

• For either model:
    SSE = Σ_i Σ_j (y_{i,j} − ŷ_{i,j})²  ... sometimes "resid"
        = Σ_i Σ_j (y_{i,j} − x'_{i,j}θ̂)², where x'_{i,j} is the corresponding row of X
        = Σ_i Σ_j (y_{i,j} − x'_{i,j}(X'X)⁻X'y)²
        = squared length of y − X(X'X)⁻X'y
        = y'(I − H)'(I − H)y = y'(I − H − H + H²)y = y'(I − H)y
          (key result for what's coming up)
        = y'y − (Hy)'(Hy) = length² of y − length² of Hy

• Given the truth of Hyp_A (under which Hyp_0 MAY also be true), the validity of Hyp_0 is judged by how much worse the data fit the corresponding model:
    SST = SSE(Hyp_0) − SSE(Hyp_A)
        = y'(I − H_0)y − y'(I − H_A)y
        = y'H_A y − y'H_0 y = y'(H_A − H_0)y
• Notation: here and later, the subscript on H is the subscript on the corresponding X; here it is 0 or A, later also 1 or 2.

EXAMPLE
• Full-rank (cell means) model
• Hyp_0: μ1 = μ2 = ... = μt, or y_{i,j} = μ + ε_{i,j}
• Hyp_A: at least two differ, or y_{i,j} = μ_i + ε_{i,j}

    X_0 = (1_{n1}', 1_{n2}', ..., 1_{nt}')' = 1_N,   H_0 = 1(1'1)⁻¹1' = (1/N) J_{N×N}

    X_A = [ 1_{n1}  ...  0_{n1}          H_A = [ (1/n1) J_{n1×n1}  ...  0_{n1×nt}
            ...     ...  ...                     ...               ...  ...
            0_{nt}  ...  1_{nt} ],               0_{nt×n1}         ...  (1/nt) J_{nt×nt} ]
Letting y_i represent the n_i-element vector of observations associated with treatment i,
    SST = y'H_A y − y'H_0 y
        = Σ_i y_i'((1/n_i)J)y_i − y'((1/N)J)y
        = Σ_i (1/n_i) y_{i.}² − (1/N) y_{..}²
        = Σ_i n_i (ȳ_{i.})² − N(ȳ_{..})²
        = Σ_i n_i (ȳ_{i.} − ȳ_{..})²

EXAMPLE
• Reduced-rank (effects) model, same problem
• Due to the invariance of the hat matrices, H_0 and H_A are the same as before
• So, the resulting sums of squares are also the same

EXPECTATIONS
• Recall that for random y with E[y] = m, Var[y] = V:
  – E[y'Ay] = m'Am + trace(AV)
• Suppose Hyp_A really is true (perhaps also Hyp_0), so E[y] = X_A θ_A:
    E[SSE(Hyp_A)] = θ_A'X_A'(I − H_A)X_A θ_A + trace((I − H_A)σ²I)
                  = θ_A'(X_A'X_A − X_A'X_A)θ_A + σ²(trace(I) − trace(H_A))
                  = 0 + σ²(N − rank(X_A))
  (because for idempotent matrices, rank = trace, and rank(H) = rank(X))

• E[SSE(Hyp_0)] = θ_A'X_A'(I − H_0)X_A θ_A + trace((I − H_0)σ²I)
               = θ_A'X_A'(I − H_0)X_A θ_A + σ²(N − rank(X_0))
               = Q_{A|0}(θ_A) + σ²(N − rank(X_0))
  (the subscript A|0 comes from space(X_A) − space(X_0))
• For the partitioned model:
    E[SSE(Hyp_0)] = (θ1'X1' + θ2'X2')(I − H1)(X1θ1 + X2θ2) + σ²(N − rank(X1))
                  = θ2'X2'(I − H1)X2 θ2 + σ²(N − rank(X1))   (the X1 terms vanish ... why?)
                  = Q_{2|1}(θ2) + σ²(N − rank(X1))
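The SST identity above, y'(H_A − H_0)y = Σ n_i(ȳ_{i.} − ȳ_{..})², is easy to verify numerically. A NumPy sketch with made-up data (t = 3, n = (2, 3, 2)):

```python
import numpy as np

# illustrative data: t = 3 groups with n = (2, 3, 2)
n = [2, 3, 2]
y = np.array([4.1, 3.9, 6.0, 5.8, 6.2, 7.9, 8.1])
N = len(y)

# hat matrices: H0 projects onto 1_N, HA onto the group indicators
H0 = np.full((N, N), 1.0 / N)
XA = np.zeros((N, len(n)))
start = 0
for i, ni in enumerate(n):
    XA[start:start + ni, i] = 1.0
    start += ni
HA = XA @ np.linalg.inv(XA.T @ XA) @ XA.T

sst_quadratic = y @ (HA - H0) @ y

# group-mean form: sum_i n_i (ybar_i. - ybar_..)^2
ybar = y.mean()
groups = np.split(y, np.cumsum(n)[:-1])
sst_means = sum(ni * (g.mean() - ybar) ** 2 for ni, g in zip(n, groups))

assert np.isclose(sst_quadratic, sst_means)
```

Swapping in the effects-model XA (with a leading column of ones and `pinv` in place of `inv`) leaves both hat matrices, and hence SST, unchanged.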
  (the subscript 2|1 comes from space(X2) − space(X1))
• E[SST] = Q_{2|1}(θ2) + (rank(X_A) − rank(X_0))σ²

EXAMPLE: All-Treatments-Equal Hypothesis
• In either parameterization:
  – rank(X_A) − rank(X_0) = t − 1
  – H_0 = 1(1'1)⁻¹1' = (1/N)J
• Full rank:
  – X_A'(I − H_0)X_A = X_A'X_A − (1/N)X_A'JX_A = diag(n) − (1/N)nn', where n = (n1, n2, ..., nt)'
    Q_{A|0}(μ) = Σ_{i=1}^t n_i μ_i² − (1/N)(Σ n_i μ_i)²
               = Σ_{i=1}^t n_i (μ_i − μ̄)²,  μ̄ = (1/N) Σ_{i=1}^t n_i μ_i
• Reduced rank:
  – X2'(I − H1)X2 = X_A'(I − H_0)X_A from the full-rank case
    Q_{2|1}(τ) = Σ_{i=1}^t n_i (τ_i − τ̄)²,  τ̄ = (1/N) Σ n_i τ_i

DISTRIBUTION THEORY: review
1. y ∼ MVN(m, σ²I), A and B p.s.d. symmetric. Any two of:
   • A and B idempotent
   • A + B idempotent
   • AB = 0
   implies all of:
   • (1/σ²) y'Ay ∼ χ²(rank(A), (1/σ²)m'Am)
   • (1/σ²) y'By ∼ χ²(rank(B), (1/σ²)m'Bm)
   • the two quadratic forms are independent
2. For idempotent A, rank(A) = trace(A)

IMPLICATIONS for the CRD
• (1/σ²) SSE(Hyp_A) = (1/σ²) y'(I − H_A)y ∼ χ²(N − t, 0), i.e. "central"
• (1/σ²) SST = (1/σ²) y'(H_A − H_0)y ∼ χ²(t − 1, Q(τ)/σ²)
• [SST/(t−1)] / [SSE(Hyp_A)/(N−t)] ∼ F′(t − 1, N − t, Q(τ)/σ²)
• or, replace τ with μ in Q ... same thing
• Other things being equal, larger Q → greater power

Example
• CRD: N = 20, t = 3, n1 = n2 = 8, n3 = 4
• numerator df = 2, denominator df = 17
• Suppose μ1 = 1, μ2 = 2, μ3 = 3, σ = 1.5
• Then
  – μ̄ = 1.8
  – Q(μ) = 8(1 − 1.8)² + 8(2 − 1.8)² + 4(3 − 1.8)² = 11.2
  – Q(μ)/σ² = 4.9777
• R:
  – qf(.95, 2, 17) → 3.5915, the 95th quantile of the central F, i.e. the
critical value
  – 1-pf(3.5915, 2, 17, 4.9777) → 0.4308, the probability of a value in the critical region under the non-central F, i.e. the power
• SAS:
  – proc iml;
  – finv(.95, 2, 17); → 3.5915
  – 1-probf(3.5915, 2, 17, 4.9777); → 0.4308

PRACTICAL POINT
• We may not know enough about the problem to speculate about the (complete) true value of μ, but may be willing to guess:
    D = μ_max − μ_min = τ_max − τ_min
• Bounds for Q (assume equal n_i):
  – Most favorable situation (Q greatest):
    ∗ greatest "variance" of the μ's
    ∗ t/2 of the μ's = μ_max, t/2 of the μ's = μ_min
    ∗ Q = n_i × t × (D/2)² = N × D²/4
  – Least favorable situation (Q smallest):
    ∗ smallest "variance" of the μ's
    ∗ one μ = μ_max, one μ = μ_min, the others = (μ_max + μ_min)/2
    ∗ Q = 2 × n_i × (D/2)² = n_i × D²/2

REDUCED NORMAL EQUATIONS FOR τ
• y = Xθ + ε = X1β + X2τ + ε
  – β are nuisance parameters
  – for our CRD effects model, X1β = 1α
• "Full" normal equations (X'X)θ̂ = X'y, partitioned:
    [ X1'X1  X1'X2 ] [ β̂ ]   [ X1'y ]
    [ X2'X1  X2'X2 ] [ τ̂ ] = [ X2'y ]
  i.e.
    X1'X1 β̂ + X1'X2 τ̂ = X1'y
    X2'X1 β̂ + X2'X2 τ̂ = X2'y
• Pre-multiply the first equation by X2'X1(X1'X1)⁻, to match the leading term of the second equation
  – a g-inverse is needed if X1 is of less than full rank
    X2'X1 β̂ + X2'H1X2 τ̂ = X2'H1 y
• Subtract this from the second equation to get the "reduced normal equations":
    X2'(I − H1)X2 τ̂ = X2'(I − H1)y

• Pattern of the "reduced" normal equations:
  – the same form is found in weighted regression ... there I − H1 is replaced with a "weight" matrix, proportional to Var(y)⁻¹
  – the X2 model is projected onto the orthogonal complement of the column space of X1
  – same as the regression of resid(y : X1) on resid(X2 : X1)
  – note the reliance of this form on H1X2 and (I − H1)X2 ...
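The reduced normal equations can be checked directly: solving X2'(I − H1)X2 τ̂ = X2'(I − H1)y should reproduce the estimable treatment contrasts from the full model. A NumPy sketch with simulated data (illustrative sizes, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n = [3, 4, 3]                      # illustrative CRD group sizes
N, t = sum(n), len(n)

X2 = np.zeros((N, t))              # treatment indicator columns
start = 0
for i, ni in enumerate(n):
    X2[start:start + ni, i] = 1.0
    start += ni
X1 = np.ones((N, 1))               # intercept column (nuisance alpha)
y = X2 @ np.array([1.0, 2.0, 3.0]) + rng.normal(0, 0.5, N)

# reduced normal equations: X2'(I - H1) X2 tau = X2'(I - H1) y
H1 = X1 @ np.linalg.pinv(X1.T @ X1) @ X1.T
M = np.eye(N) - H1
tau = np.linalg.pinv(X2.T @ M @ X2) @ (X2.T @ M @ y)

# tau itself isn't estimable, but contrasts tau_i - tau_j are, and
# for the CRD they must equal the observed group-mean differences
groups = np.split(np.arange(N), np.cumsum(n)[:-1])
ybar = [y[g].mean() for g in groups]
assert np.isclose(tau[0] - tau[1], ybar[0] - ybar[1])
assert np.isclose(tau[1] - tau[2], ybar[1] - ybar[2])
```

Any solution of the (rank-deficient) reduced system gives the same values for estimable contrasts; the pseudoinverse just picks the minimum-norm one.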
• In fact, suppose we transform the independent variables in this problem, pretending the model isn't really partitioned, and that the single model matrix is:
    X_{2|1} = (I − H1)X2
  – then the usual normal equations are:
      X'_{2|1}X_{2|1} τ̂ = X'_{2|1} y
  – i.e. the correct result ... we've "corrected for β" by "projecting X2 out of" the space spanned by X1
  – so, for example:
    ∗ c'τ is estimable iff c' = l'X_{2|1} for some l
    ∗ Var(c'τ̂) = σ²c'(X'_{2|1}X_{2|1})⁻c

X'_{2|1}X_{2|1} is called the "design information matrix" in the book, and is sometimes denoted I_{2|1}. For CRDs:
• H1 = 1(1'1)⁻¹1' = (1/N)J_{N×N}
• X_{2|1} = (I − H1)X2, with block rows (one block per treatment group):

    X_{2|1} = [ (1 − n1/N)1_{n1}   −(n2/N)1_{n1}     ...  −(nt/N)1_{n1}
                −(n1/N)1_{n2}      (1 − n2/N)1_{n2}  ...  −(nt/N)1_{n2}
                ...                ...               ...  ...
                −(n1/N)1_{nt}      −(n2/N)1_{nt}     ...  (1 − nt/N)1_{nt} ]

• I_{2|1} = X'_{2|1}X_{2|1} = diag(n) − (1/N)nn', where n' = (n1, n2, ..., nt)
• Note that X_{2|1} and I_{2|1} are each of rank t − 1, reflecting the fact that τ itself isn't estimable

So, for CRDs, given interest in τ and accounting for α:
• the precision of c'τ̂ is determined by σ²c'I⁻_{2|1}c
  – one g-inverse of I_{2|1} = diag(n) − (1/N)nn' is I⁻_{2|1} = diag⁻¹(n) ... check it
• the power of the test for equality of the τ's is determined by τ'I_{2|1}τ/σ²
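Both closing claims can be checked numerically on the earlier power example (n = (8, 8, 4), μ = (1, 2, 3), σ = 1.5). A sketch assuming SciPy is available, using `scipy.stats.ncf` in place of the slides' R calls (SciPy's noncentrality parameter matches R's `ncp`):

```python
import numpy as np
from scipy.stats import f, ncf   # assumes SciPy is installed

# the slides' example: t = 3, n = (8, 8, 4), mu = (1, 2, 3), sigma = 1.5
n = np.array([8.0, 8.0, 4.0])
mu = np.array([1.0, 2.0, 3.0])
sigma, N, t = 1.5, n.sum(), len(n)

# design information matrix I_{2|1} = diag(n) - nn'/N
I21 = np.diag(n) - np.outer(n, n) / N
G = np.diag(1.0 / n)                       # claimed g-inverse diag^{-1}(n)
assert np.allclose(I21 @ G @ I21, I21)     # "check it": A A⁻ A = A
assert np.linalg.matrix_rank(I21) == t - 1 # rank t - 1

# noncentrality Q(mu)/sigma^2 = mu' I_{2|1} mu / sigma^2
lam = mu @ I21 @ mu / sigma**2             # 11.2 / 2.25, about 4.978
crit = f.ppf(0.95, t - 1, N - t)           # central-F critical value
power = ncf.sf(crit, t - 1, N - t, lam)    # P(reject) under noncentral F
```

This reproduces the deck's R numbers: `crit` ≈ 3.5915 and `power` ≈ 0.4308, and confirms that Q(μ) is exactly the quadratic form μ'I_{2|1}μ.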