IV. Randomized Complete Block Design (RCBD)

IV.A Design of an RCBD
IV.B Indicator-variable models and estimation for an RCBD
IV.C Hypothesis testing using the ANOVA method for an RCBD
IV.D Diagnostic checking
IV.E Treatment differences
IV.F Fixed versus random effects
IV.G Generalized randomized complete block design

Statistical Modelling Chapter IV

IV.A Design of an RCBD

Definition II.6: A randomized complete block design is one in which the number of experimental units per block is equal to the number of treatments and every treatment occurs once and only once in each block, the order of treatments within a block being randomized.
  – b denotes the number of blocks.
  – t denotes both the number of units in each block and the number of treatments.
  – n = bt denotes the total number of observations.

In an RCBD, group the units into blocks such that the units in a block are as similar as possible.

Forming blocks in field experiments
• Place the plots parallel to the trend and the blocks perpendicular to it.

[Figure: field divided into Blocks I to IV, running from the less stony end of the field to the stonier end.]

• Suppose the trend was not as I thought, but instead went across the field.

[Figure: the same Blocks I to IV when the trend runs across the field, from the less stony side to the stony side.]

• Clearly, the blocks would then be similar to one another and the plots within each block different.
• In fact, such an experiment can be less sensitive than a CRD: getting the blocking wrong can be costly.

a) Obtaining a layout for an RCBD in R
• A general set of expressions for obtaining an RCBD layout is given in Appendix B, Randomized layouts and sample size computations in R.
• To generate a layout for a particular case, substitute
  – the actual values for b, t and n;
  – the actual names for Blocks, Units, Treats and the data frame to contain them.

Example IV.1 Penicillin yield
• In this example the effects of four treatments (A, B, C and D) on the yield of penicillin are to be investigated.
• Corn steep liquor, an important raw material in producing penicillin, is highly variable from one blending to another.
• To ensure that the results of the experiment apply to more than one blend, several blends are to be used in the experiment.
• The trial was conducted by using the same blend in 4 flasks and randomizing the treatments to these 4 flasks.
• Altogether, five blends were utilized.
• The crucial feature that makes an RCBD different from a CRD is that
  – there are 2 unrandomized factors indexing the units: Blends and Flasks;
  – there is nesting between these factors: Flasks are nested within Blends, because the treatments are randomized to Flasks within Blends.
• The names to be used for the blocks, units and treatments in this example are Blend, Flask and Treat, respectively.
• Also, b = 5 and t = 4, so that n = 20.
• Assigning these values and substituting these names into the general expressions yields the following output for this case.

```r
> b <- 5
> t <- 4
> n <- b*t
> RCBDPen.unit <- list(Blend=b, Flask=t)
> RCBDPen.nest <- list(Flask = "Blend")
> Treat <- factor(rep(1:t, times=b), labels=c("A","B","C","D"))
> data.frame(fac.gen(RCBDPen.unit), Treat) #basic systematic arrangement
   Blend Flask Treat
1      1     1     A
2      1     2     B
3      1     3     C
4      1     4     D
5      2     1     A
6      2     2     B
7      2     3     C
8      2     4     D
9      3     1     A
10     3     2     B
11     3     3     C
12     3     4     D
13     4     1     A
14     4     2     B
15     4     3     C
16     4     4     D
17     5     1     A
18     5     2     B
19     5     3     C
20     5     4     D
> RCBDPen.lay <- fac.layout(unrandomized = RCBDPen.unit,
+                           nested.factors = RCBDPen.nest,
+                           randomized = Treat, seed = 311)
```

• Flask is a nested factor: it is nested within Blend.
• The order of Blend and Flask is determined by their order in RCBDPen.unit.
• The systematic arrangement is the one on which the randomization is based.

Layout

```r
> RCBDPen.lay
   Units Permutation Blend Flask Treat
1      1          11     1     1     C
2      2          12     1     2     B
3      3          10     1     3     D
4      4           9     1     4     A
5      5          13     2     1     C
6      6          15     2     2     D
7      7          16     2     3     B
8      8          14     2     4     A
9      9           8     3     1     D
10    10           7     3     2     C
11    11           5     3     3     A
12    12           6     3     4     B
13    13          17     4     1     A
14    14          19     4     2     D
15    15          20     4     3     B
16    16          18     4     4     C
17    17           4     5     1     A
18    18           2     5     2     D
19    19           1     5     3     B
20    20           3     5     4     C
```

This layout is said to be in standard order for Blend then Flask: in general, the first factor changes slowest and the last fastest.
• So, with the first blend, the treatments are to be done in the order C, B, D, A.

IV.B Indicator-variable models and estimation for an RCBD

a) Maximal model
• The maximal model used for an RCBD is:
$$\boldsymbol{\psi}_{B+T} = E[\mathbf{Y}] = \mathbf{X}_B\boldsymbol{\beta} + \mathbf{X}_T\boldsymbol{\tau} \quad\text{and}\quad \operatorname{var}[\mathbf{Y}] = \sigma^2\mathbf{I}_n$$
where
  – $\mathbf{Y}$ is the n-vector of random variables for the response variable observations,
  – $\boldsymbol{\beta}$ is the b-vector of parameters specifying a different mean response for each block,
  – $\mathbf{X}_B$ is the $n \times b$ matrix indicating the block from which an observation came,
  – $\boldsymbol{\tau}$ is the t-vector of parameters specifying a different mean response for each treatment,
  – $\mathbf{X}_T$ is the $n \times t$ matrix indicating the observations that received each of the treatments.

Example IV.1 Penicillin yield (continued)
• The yields of penicillin, in nonrandom order, are:

| Blend | A | B | C | D |
|---|---|---|---|---|
| 1 | 89 | 88 | 97 | 94 |
| 2 | 84 | 77 | 92 | 79 |
| 3 | 81 | 87 | 87 | 85 |
| 4 | 87 | 92 | 89 | 84 |
| 5 | 79 | 81 | 80 | 88 |

[Figure: plots of Yield against Blend (1 to 5) and against Treatment (A to D).]
• Initial exploration of the data: are there differences?

Yields in a vector in standard order for Blend then Treatment
• This is the same order as the systematic layout, i.e. the prerandomization layout.
$$\mathbf{y} = (89, 88, 97, 94,\; 84, 77, 92, 79,\; 81, 87, 87, 85,\; 87, 92, 89, 84,\; 79, 81, 80, 88)'$$
$$\mathbf{X}_B = \begin{bmatrix}
\mathbf{1}_4 & \mathbf{0}_4 & \mathbf{0}_4 & \mathbf{0}_4 & \mathbf{0}_4\\
\mathbf{0}_4 & \mathbf{1}_4 & \mathbf{0}_4 & \mathbf{0}_4 & \mathbf{0}_4\\
\mathbf{0}_4 & \mathbf{0}_4 & \mathbf{1}_4 & \mathbf{0}_4 & \mathbf{0}_4\\
\mathbf{0}_4 & \mathbf{0}_4 & \mathbf{0}_4 & \mathbf{1}_4 & \mathbf{0}_4\\
\mathbf{0}_4 & \mathbf{0}_4 & \mathbf{0}_4 & \mathbf{0}_4 & \mathbf{1}_4
\end{bmatrix},\quad
\boldsymbol{\beta} = \begin{bmatrix}\beta_1\\ \beta_2\\ \beta_3\\ \beta_4\\ \beta_5\end{bmatrix},\quad
\mathbf{X}_T = \begin{bmatrix}\mathbf{I}_4\\ \mathbf{I}_4\\ \mathbf{I}_4\\ \mathbf{I}_4\\ \mathbf{I}_4\end{bmatrix},\quad
\boldsymbol{\tau} = \begin{bmatrix}\tau_1\\ \tau_2\\ \tau_3\\ \tau_4\end{bmatrix}$$
where $\mathbf{1}_4$ and $\mathbf{0}_4$ are 4-vectors of 1s and 0s, so that $\mathbf{X}_B$ is $20\times 5$ and $\mathbf{X}_T$ is $20\times 4$.

Estimator of expected values
• Our model also assumes $\mathbf{Y} \sim N(\boldsymbol{\psi}_{B+T}, \mathbf{V})$ with $\mathbf{V} = \sigma^2\mathbf{I}_n$.
• The model for the expectation is still of the form $E[\mathbf{Y}] = \mathbf{X}\boldsymbol{\theta}$ with $\mathbf{X} = [\mathbf{X}_B\;\mathbf{X}_T]$ and $\boldsymbol{\theta} = [\boldsymbol{\beta}'\;\boldsymbol{\tau}']'$.
• It can be shown that
$$\hat{\boldsymbol{\psi}}_{B+T} = \bar{\mathbf{B}} + \bar{\mathbf{T}} - \bar{\mathbf{G}}$$
where $\bar{\mathbf{B}}$, $\bar{\mathbf{T}}$ and $\bar{\mathbf{G}}$ are the n-vectors of block, treatment and grand means, respectively. Note that $\bar{\mathbf{B}} = \mathbf{M}_B\mathbf{Y}$, $\bar{\mathbf{T}} = \mathbf{M}_T\mathbf{Y}$ and $\bar{\mathbf{G}} = \mathbf{M}_G\mathbf{Y}$, where $\mathbf{M}_B$, $\mathbf{M}_T$ and $\mathbf{M}_G$ are the block, treatment and grand mean operators, respectively.
• So, once again, the estimators of the expected values are functions of means.

Mean operators
• Suppose the data are arranged in the vector $\mathbf{Y}$ in nonrandomized order, with all the observations for a block placed together.
  – That is, standard order for blocks then treatments.
• Then the mean operators are:
$$\mathbf{M}_G = n^{-1}\mathbf{J}_n = n^{-1}\,\mathbf{J}_b\otimes\mathbf{J}_t,\qquad \mathbf{M}_B = t^{-1}\,\mathbf{I}_b\otimes\mathbf{J}_t,\qquad \mathbf{M}_T = b^{-1}\,\mathbf{J}_b\otimes\mathbf{I}_t$$
where $\otimes$ is called the direct product operator and, if $\mathbf{A}_r$ and $\mathbf{B}_c$ are square matrices of order r and c,
$$\mathbf{A}_r\otimes\mathbf{B}_c = \begin{bmatrix} a_{11}\mathbf{B}_c & \cdots & a_{1r}\mathbf{B}_c\\ \vdots & \ddots & \vdots\\ a_{r1}\mathbf{B}_c & \cdots & a_{rr}\mathbf{B}_c\end{bmatrix}$$
• The mean operators are simpler than for the CRD: the divisors are factored out, leaving matrices of 0s and 1s.
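As a numerical check on the direct-product forms, the following base-R sketch builds $\mathbf{M}_G$, $\mathbf{M}_B$ and $\mathbf{M}_T$ with `kronecker()` for the penicillin data in standard order, confirms that each is symmetric and idempotent, and forms the fitted values $\bar{\mathbf{B}} + \bar{\mathbf{T}} - \bar{\mathbf{G}}$. It is an illustration only, not the Appendix C code; the helper names `Jmat`, `is_proj`, `nb` and `nt` are made up for this example (`nb`, `nt` avoid masking R's `t()`).

```r
# Penicillin yields in standard order: Blend 1 to 5, and within each blend Treat A to D
y <- c(89, 88, 97, 94,
       84, 77, 92, 79,
       81, 87, 87, 85,
       87, 92, 89, 84,
       79, 81, 80, 88)
nb <- 5          # number of blocks, b
nt <- 4          # number of treatments, t
n  <- nb * nt

Jmat <- function(k) matrix(1, k, k)   # k x k matrix of 1s

# Mean operators written as direct (Kronecker) products
MG <- kronecker(Jmat(nb), Jmat(nt)) / n
MB <- kronecker(diag(nb), Jmat(nt)) / nt
MT <- kronecker(Jmat(nb), diag(nt)) / nb

# A mean operator should be symmetric and idempotent
is_proj <- function(M) isTRUE(all.equal(M, t(M))) && isTRUE(all.equal(M %*% M, M))
sapply(list(MG = MG, MB = MB, MT = MT), is_proj)

# Fitted values under the maximal model: B-bar + T-bar - G-bar
fit_BT <- drop(MB %*% y + MT %*% y - MG %*% y)
matrix(fit_BT, nrow = nb, byrow = TRUE,
       dimnames = list(Blend = 1:nb, Treat = c("A", "B", "C", "D")))
```

Displayed as a Blend-by-Treatment table, the fitted values show the additive pattern discussed in the next part of the notes.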
Grand mean operator for standard order
$$\mathbf{M}_G = \tfrac{1}{20}\,\mathbf{J}_5\otimes\mathbf{J}_4 = \tfrac{1}{20}\begin{bmatrix}
\mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4\\
\mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4\\
\mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4\\
\mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4\\
\mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4 & \mathbf{J}_4
\end{bmatrix} = \tfrac{1}{20}\,\mathbf{J}_{20}$$
that is, a $20\times 20$ matrix every element of which is $1/20$.

Block mean operator for standard order
$$\mathbf{M}_B = \tfrac{1}{4}\,\mathbf{I}_5\otimes\mathbf{J}_4 = \tfrac{1}{4}\begin{bmatrix}
\mathbf{J}_4 & \mathbf{0}_{4\times4} & \mathbf{0}_{4\times4} & \mathbf{0}_{4\times4} & \mathbf{0}_{4\times4}\\
\mathbf{0}_{4\times4} & \mathbf{J}_4 & \mathbf{0}_{4\times4} & \mathbf{0}_{4\times4} & \mathbf{0}_{4\times4}\\
\mathbf{0}_{4\times4} & \mathbf{0}_{4\times4} & \mathbf{J}_4 & \mathbf{0}_{4\times4} & \mathbf{0}_{4\times4}\\
\mathbf{0}_{4\times4} & \mathbf{0}_{4\times4} & \mathbf{0}_{4\times4} & \mathbf{J}_4 & \mathbf{0}_{4\times4}\\
\mathbf{0}_{4\times4} & \mathbf{0}_{4\times4} & \mathbf{0}_{4\times4} & \mathbf{0}_{4\times4} & \mathbf{J}_4
\end{bmatrix}$$

Treatment mean operator for standard order
$$\mathbf{M}_T = \tfrac{1}{5}\,\mathbf{J}_5\otimes\mathbf{I}_4 = \tfrac{1}{5}\begin{bmatrix}
\mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4\\
\mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4\\
\mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4\\
\mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4\\
\mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4 & \mathbf{I}_4
\end{bmatrix}$$

Estimators for the example
$$\bar{\mathbf{B}} = (\bar B_1, \bar B_1, \bar B_1, \bar B_1,\; \bar B_2, \ldots, \bar B_2,\; \ldots,\; \bar B_5, \bar B_5, \bar B_5, \bar B_5)'$$
$$\bar{\mathbf{T}} = (\bar T_A, \bar T_B, \bar T_C, \bar T_D,\; \bar T_A, \bar T_B, \bar T_C, \bar T_D,\; \ldots,\; \bar T_A, \bar T_B, \bar T_C, \bar T_D)', \qquad \bar{\mathbf{G}} = \bar G\,\mathbf{1}_{20}$$
that is, each block mean is repeated 4 times, the set of 4 treatment means is repeated 5 times, and the grand mean is repeated 20 times.

Estimates for the example
• The means are in the following table:

| Blend | A | B | C | D | Means |
|---|---|---|---|---|---|
| 1 | 89 | 88 | 97 | 94 | 92 |
| 2 | 84 | 77 | 92 | 79 | 83 |
| 3 | 81 | 87 | 87 | 85 | 85 |
| 4 | 87 | 92 | 89 | 84 | 88 |
| 5 | 79 | 81 | 80 | 88 | 82 |
| Means | 84 | 85 | 89 | 86 | 86 |

• So $\mathbf{b}$ contains each Blend mean (92, 83, 85, 88, 82) repeated 4 times, $\mathbf{t}$ contains the treatment means (84, 85, 89, 86) repeated 5 times, and $\mathbf{g} = 86\,\mathbf{1}_{20}$.
• The estimated expected values, $\hat{\boldsymbol{\psi}}_{B+T} = \mathbf{b} + \mathbf{t} - \mathbf{g}$, arranged by Blend and Treatment, are:

| Blend | A | B | C | D |
|---|---|---|---|---|
| 1 | 90 | 91 | 95 | 92 |
| 2 | 81 | 82 | 86 | 83 |
| 3 | 83 | 84 | 88 | 85 |
| 4 | 86 | 87 | 91 | 88 |
| 5 | 80 | 81 | 85 | 82 |

• These fitted values vary over the block-treatment combinations but display an additive pattern.

Additivity
• The fitted values are those for a model that is additive in the Block and Treatment parameters:
  – $\boldsymbol{\psi}_{B+T} = E[\mathbf{Y}] = \mathbf{X}_B\boldsymbol{\beta} + \mathbf{X}_T\boldsymbol{\tau}$.
• So its fitted values display an additive pattern: $\hat{\boldsymbol{\psi}}_{B+T} = \bar{\mathbf{B}} + \bar{\mathbf{T}} - \bar{\mathbf{G}}$.
• We hope that this is an adequate description of the data.
• In each direction the fitted values follow the same trend as the corresponding means: within a blend they rise and fall with the treatment means, and within a treatment they rise and fall with the Blend means.

b) Alternative expectation models
• There are 4 possible different models for the expectation that we consider:

| Model | Description |
|---|---|
| $\boldsymbol{\psi}_G = \mathbf{X}_G\mu$ | no treatment or block differences |
| $\boldsymbol{\psi}_B = \mathbf{X}_B\boldsymbol{\beta}$ | block differences only |
| $\boldsymbol{\psi}_T = \mathbf{X}_T\boldsymbol{\tau}$ | treatment differences only |
| $\boldsymbol{\psi}_{B+T} = \mathbf{X}_B\boldsymbol{\beta} + \mathbf{X}_T\boldsymbol{\tau}$ | block and treatment differences |

• Note that
$$\mathcal{C}(\mathbf{X}_G) \subset \mathcal{C}(\mathbf{X}_B) \subset \mathcal{C}([\mathbf{X}_B\;\mathbf{X}_T]) \quad\text{and}\quad \mathcal{C}(\mathbf{X}_G) \subset \mathcal{C}(\mathbf{X}_T) \subset \mathcal{C}([\mathbf{X}_B\;\mathbf{X}_T]).$$
• Consequently, $\boldsymbol{\psi}_G \subset \boldsymbol{\psi}_B, \boldsymbol{\psi}_T, \boldsymbol{\psi}_{B+T}$ and $\boldsymbol{\psi}_B, \boldsymbol{\psi}_T \subset \boldsymbol{\psi}_{B+T}$.
• Also note that, as for the CRD, the models $\boldsymbol{\psi}_B$ and $\boldsymbol{\psi}_T$ can be obtained from $\boldsymbol{\psi}_{B+T}$ by setting $\boldsymbol{\tau}$ or $\boldsymbol{\beta}$, respectively, equal to zero, and $\boldsymbol{\psi}_G$ can be obtained from $\boldsymbol{\psi}_B$ and $\boldsymbol{\psi}_T$ by setting $\boldsymbol{\beta} = \mu\mathbf{1}_b$ and $\boldsymbol{\tau} = \mu\mathbf{1}_t$, respectively.

Estimators of expected values
• The estimators of the expected values under the different models are:

| Model | Estimator | Description |
|---|---|---|
| $\boldsymbol{\psi}_G$ | $\hat{\boldsymbol{\psi}}_G = \bar{\mathbf{G}}$ | no treatment or block differences |
| $\boldsymbol{\psi}_B$ | $\hat{\boldsymbol{\psi}}_B = \bar{\mathbf{B}}$ | block differences only |
| $\boldsymbol{\psi}_T$ | $\hat{\boldsymbol{\psi}}_T = \bar{\mathbf{T}}$ | treatment differences only |
| $\boldsymbol{\psi}_{B+T}$ | $\hat{\boldsymbol{\psi}}_{B+T} = \bar{\mathbf{B}} + \bar{\mathbf{T}} - \bar{\mathbf{G}}$ | block and treatment differences |

IV.C Hypothesis testing using the ANOVA method for an RCBD
• An ANOVA will be used to choose between the 4 alternative expectation models for an RCBD.
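Before the formal ANOVA, it may help to see the four fitted-value vectors side by side. The following base-R sketch (an illustration, not the notes' own code) uses `ave()` to replace each observation by its Blend or Treatment mean, forms the fitted values under $\boldsymbol{\psi}_G$, $\boldsymbol{\psi}_B$, $\boldsymbol{\psi}_T$ and $\boldsymbol{\psi}_{B+T}$, and computes the residual SSq for each model. The object names `Gbar`, `Bbar`, `Tbar`, `fits` and `resid_ss` are made up for this example.

```r
y <- c(89, 88, 97, 94, 84, 77, 92, 79, 81, 87,
       87, 85, 87, 92, 89, 84, 79, 81, 80, 88)   # standard order: Blend, then Treat
Blend <- factor(rep(1:5, each = 4))
Treat <- factor(rep(c("A", "B", "C", "D"), times = 5))

Gbar <- rep(mean(y), length(y))  # grand-mean vector
Bbar <- ave(y, Blend)            # each observation replaced by its Blend mean
Tbar <- ave(y, Treat)            # each observation replaced by its Treatment mean

# Fitted values under the four alternative expectation models
fits <- cbind(G = Gbar, B = Bbar, T = Tbar, BT = Bbar + Tbar - Gbar)
head(fits, 4)                    # the four flasks of the first blend

# Residual SSq under each model (y is recycled down each column)
resid_ss <- colSums((y - fits)^2)
resid_ss
```

The reductions in residual SSq are 560 - 296 = 490 - 226 = 264 when blends are added and 560 - 490 = 296 - 226 = 70 when treatments are added, whichever model is the starting point; these are the Blends and Treatments SSqs used in the ANOVA.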
a) Analysis of the penicillin example

Example IV.1 Penicillin yield (continued)
• The hypothesis test for the example RCBD is as follows.

Step 1: Set up hypotheses
a) H0: $\tau_1 = \tau_2 = \tau_3 = \tau_4$ (or $\mathbf{X}_T\boldsymbol{\tau}$ not required in the model)
   H1: not all population Treatment means are equal
b) H0: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5$ (or $\mathbf{X}_B\boldsymbol{\beta}$ not required in the model)
   H1: not all population Blend means are equal
Set α = 0.05.

Step 2: Calculate test statistics
• The analysis of variance table for the RCBD is:

| Source | df | SSq | MSq | F | Prob |
|---|---|---|---|---|---|
| Blends | 4 | 264 | 66.0 | 3.50 | 0.041 |
| Flasks[Blends] | 15 | 296 | | | |
| – Treatments | 3 | 70 | 23.3 | 1.24 | 0.339 |
| – Residual | 12 | 226 | 18.8 | | |
| Total | 19 | 560 | | | |

• Note that Flasks[Blends] in this table means "Flasks within Blends".

Step 3: Decide between hypotheses
• There appear to be significant differences between the blends (p = 0.041 < 0.05) but not between the treatments (p = 0.339), so the expectation model that best describes the response appears to be $\boldsymbol{\psi}_B = \mathbf{X}_B\boldsymbol{\beta}$.

Blocking effectiveness
• In our RCBD example there were significant differences between the blends, so the blocking based on blends has been effective.
• This is to be expected: if the units within a block are as similar as possible, the differences between units will show up as block differences.
• If a CRD had been used
  – that is, the 4 treatments randomized to the 20 flasks irrespective of blends, then
  – the Residual SSq would have been approximately Blend SSq + RCBD Residual SSq,
  – viz. 264 + 226 = 490 on 4 + 12 = 16 df, giving a mean square of 490/16 = 30.625.
  – That is, the Residual MSq would have been about 1.6 times as large (30.6 vs 18.8) and the experiment much less sensitive.

b) Sums of squares for the analysis of variance
• In this section we will use the generic names Blocks, Units and Treatments for the factors in an RCBD.
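• The estimators of the SSqs for the RCBD ANOVA are the SSqs of the following vectors:

| SSq | Vector |
|---|---|
| Total or Units SSq | $\mathbf{D}_G = \mathbf{Y} - \bar{\mathbf{G}}$ |
| Blocks SSq | $\mathbf{B}_e = \bar{\mathbf{B}} - \bar{\mathbf{G}}$ |
| Units[Blocks] SSq | $\mathbf{D}_B = \mathbf{Y} - \bar{\mathbf{B}}$ |
| Treatments SSq | $\mathbf{T}_e = \bar{\mathbf{T}} - \bar{\mathbf{G}}$ |
| Residual SSq | $\mathbf{D}_{B+T} = \mathbf{Y} - \bar{\mathbf{B}} - \bar{\mathbf{T}} + \bar{\mathbf{G}} = \mathbf{Y} - \mathbf{B}_e - \mathbf{T}_e - \bar{\mathbf{G}}$ |

SSq (continued)
• From section IV.B, Indicator-variable models and estimation for an RCBD, we have that
$$\bar{\mathbf{G}} = \mathbf{M}_G\mathbf{Y} = n^{-1}(\mathbf{J}_b\otimes\mathbf{J}_t)\mathbf{Y},\qquad \bar{\mathbf{B}} = \mathbf{M}_B\mathbf{Y} = t^{-1}(\mathbf{I}_b\otimes\mathbf{J}_t)\mathbf{Y},\qquad \bar{\mathbf{T}} = \mathbf{M}_T\mathbf{Y} = b^{-1}(\mathbf{J}_b\otimes\mathbf{I}_t)\mathbf{Y}$$
and let $\mathbf{Y} = \mathbf{M}_U\mathbf{Y} = (\mathbf{I}_b\otimes\mathbf{I}_t)\mathbf{Y}$.

• It can be shown that the SSqs for the ANOVA are given by
$$\begin{aligned}
\mathbf{D}_G'\mathbf{D}_G &= (\mathbf{Y}-\bar{\mathbf{G}})'(\mathbf{Y}-\bar{\mathbf{G}}) = \mathbf{Y}'\mathbf{Q}_U\mathbf{Y} &&\text{with } \mathbf{Q}_U = \mathbf{M}_U - \mathbf{M}_G\\
\mathbf{B}_e'\mathbf{B}_e &= (\bar{\mathbf{B}}-\bar{\mathbf{G}})'(\bar{\mathbf{B}}-\bar{\mathbf{G}}) = \mathbf{Y}'\mathbf{Q}_B\mathbf{Y} &&\text{with } \mathbf{Q}_B = \mathbf{M}_B - \mathbf{M}_G\\
\mathbf{D}_B'\mathbf{D}_B &= (\mathbf{Y}-\bar{\mathbf{B}})'(\mathbf{Y}-\bar{\mathbf{B}}) = \mathbf{Y}'\mathbf{Q}_{BU}\mathbf{Y} &&\text{with } \mathbf{Q}_{BU} = \mathbf{M}_U - \mathbf{M}_B\\
\mathbf{T}_e'\mathbf{T}_e &= (\bar{\mathbf{T}}-\bar{\mathbf{G}})'(\bar{\mathbf{T}}-\bar{\mathbf{G}}) = \mathbf{Y}'\mathbf{Q}_T\mathbf{Y} &&\text{with } \mathbf{Q}_T = \mathbf{M}_T - \mathbf{M}_G\\
\mathbf{D}_{B+T}'\mathbf{D}_{B+T} &= (\mathbf{Y}-\bar{\mathbf{B}}-\bar{\mathbf{T}}+\bar{\mathbf{G}})'(\mathbf{Y}-\bar{\mathbf{B}}-\bar{\mathbf{T}}+\bar{\mathbf{G}}) = \mathbf{Y}'\mathbf{Q}_{BU_{Res}}\mathbf{Y} &&\text{with } \mathbf{Q}_{BU_{Res}} = \mathbf{M}_U - \mathbf{M}_B - \mathbf{M}_T + \mathbf{M}_G
\end{aligned}$$
• All the Ms and Qs are symmetric and idempotent.

The ANOVA table is constructed as follows:

| Source | df | SSq | MSq | F |
|---|---|---|---|---|
| Blocks | $b-1$ | $\mathbf{Y}'\mathbf{Q}_B\mathbf{Y}$ | $s_B^2 = \dfrac{\mathbf{Y}'\mathbf{Q}_B\mathbf{Y}}{b-1}$ | $s_B^2/s_{Res}^2$ ($p_B$) |
| Units[Blocks] | $b(t-1)$ | $\mathbf{Y}'\mathbf{Q}_{BU}\mathbf{Y}$ | | |
| – Treatments | $t-1$ | $\mathbf{Y}'\mathbf{Q}_T\mathbf{Y}$ | $s_T^2 = \dfrac{\mathbf{Y}'\mathbf{Q}_T\mathbf{Y}}{t-1}$ | $s_T^2/s_{Res}^2$ ($p_T$) |
| – Residual | $(b-1)(t-1)$ | $\mathbf{Y}'\mathbf{Q}_{BU_{Res}}\mathbf{Y}$ | $s_{Res}^2 = \dfrac{\mathbf{Y}'\mathbf{Q}_{BU_{Res}}\mathbf{Y}}{(b-1)(t-1)}$ | |
| Total | $bt-1$ | $\mathbf{Y}'\mathbf{Q}_U\mathbf{Y}$ | | |

Geometrical interpretation
• The matrix $\mathbf{Q}_U$ orthogonally projects the data vector into the $(bt-1)$-dimensional part of the bt-dimensional data space that is orthogonal to the equiangular line.
• This is partitioned, by $\mathbf{Q}_B$ and $\mathbf{Q}_{BU}$, into two subspaces:
  a) the $(b-1)$-dimensional part of the b-dimensional Block space that is orthogonal to the equiangular line, and
  b) the $b(t-1)$-dimensional Units[Blocks] space.
• The latter space is then partitioned, by $\mathbf{Q}_T$ and $\mathbf{Q}_{BU_{Res}}$, into two subspaces:
  a) the $(t-1)$-dimensional part of the t-dimensional Treatment space that is orthogonal to the equiangular line, and
  b) the $(b-1)(t-1)$-dimensional Residual subspace.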
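The projector algebra can be checked numerically. The base-R sketch below (illustrative; the names `Jmat`, `Q` and `anova_tab` are invented here) forms the Q matrices from the mean operators for the penicillin data, takes each trace as the df (the trace of a symmetric idempotent matrix equals its rank), evaluates each quadratic form $\mathbf{Y}'\mathbf{Q}\mathbf{Y}$, and confirms that the Blocks, Treatments and Residual projectors are mutually orthogonal.

```r
y <- c(89, 88, 97, 94, 84, 77, 92, 79, 81, 87,
       87, 85, 87, 92, 89, 84, 79, 81, 80, 88)   # standard order: Blend, then Treat
nb <- 5; nt <- 4; n <- nb * nt
Jmat <- function(k) matrix(1, k, k)               # k x k matrix of 1s

MG <- kronecker(Jmat(nb), Jmat(nt)) / n
MB <- kronecker(diag(nb), Jmat(nt)) / nt
MT <- kronecker(Jmat(nb), diag(nt)) / nb
MU <- diag(n)

Q <- list(Blocks          = MB - MG,
          `Units[Blocks]` = MU - MB,
          Treatments      = MT - MG,
          Residual        = MU - MB - MT + MG,
          Total           = MU - MG)

# df = trace (= rank) of each projector; SSq = Y'QY
anova_tab <- data.frame(
  df  = sapply(Q, function(Qm) round(sum(diag(Qm)))),
  SSq = sapply(Q, function(Qm) drop(t(y) %*% Qm %*% y)))
anova_tab

# Blocks, Treatments and Residual subspaces are mutually orthogonal
max(abs(Q$Blocks %*% Q$Treatments), abs(Q$Blocks %*% Q$Residual),
    abs(Q$Treatments %*% Q$Residual))
```

The df and SSq columns reproduce the penicillin ANOVA (4, 15, 3, 12, 19 and 264, 296, 70, 226, 560), and the maximum cross-product is zero up to rounding error.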
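| Source | df | SSq |
|---|---|---|
| Blocks | $b-1$ | $\mathbf{Y}'\mathbf{Q}_B\mathbf{Y}$ |
| Units[Blocks] | $b(t-1)$ | $\mathbf{Y}'\mathbf{Q}_{BU}\mathbf{Y}$ |
| – Treatments | $t-1$ | $\mathbf{Y}'\mathbf{Q}_T\mathbf{Y}$ |
| – Residual | $(b-1)(t-1)$ | $\mathbf{Y}'\mathbf{Q}_{BU_{Res}}\mathbf{Y}$ |
| Total | $bt-1$ | $\mathbf{Y}'\mathbf{Q}_U\mathbf{Y}$ |

• That is, the part of the Units space orthogonal to the equiangular line is divided into three orthogonal subspaces:
  – the Blocks subspace,
  – the Treatments subspace, and
  – the Residual subspace.
• Here the Block and Treatment spaces are the column spaces of the matrices $\mathbf{X}_B$ and $\mathbf{X}_T$, respectively.

Example IV.1 Penicillin yield (continued)
• The effects needed for the analysis have been added to the means in the following table:

| Blend | A | B | C | D | Means | Effects |
|---|---|---|---|---|---|---|
| 1 | 89 | 88 | 97 | 94 | 92 | 6 |
| 2 | 84 | 77 | 92 | 79 | 83 | -3 |
| 3 | 81 | 87 | 87 | 85 | 85 | -1 |
| 4 | 87 | 92 | 89 | 84 | 88 | 2 |
| 5 | 79 | 81 | 80 | 88 | 82 | -4 |
| Means | 84 | 85 | 89 | 86 | 86 | |
| Effects | -2 | -1 | 3 | 0 | | |

Vectors for SSq

| Blend | Treat | Yield $\mathbf{y}$ | Total Flask deviations $\mathbf{d}_G = \mathbf{Q}_U\mathbf{y} = \mathbf{y}-\mathbf{g}$ | Blend effects $\mathbf{b}_e = \mathbf{Q}_B\mathbf{y} = \mathbf{b}-\mathbf{g}$ | Flask[Blend] deviations $\mathbf{d}_B = \mathbf{Q}_{BF}\mathbf{y} = \mathbf{y}-\mathbf{b}$ | Treat effects $\mathbf{t}_e = \mathbf{Q}_T\mathbf{y} = \mathbf{t}-\mathbf{g}$ | Residual Flask[Blend] deviations $\mathbf{d}_{B+T} = \mathbf{Q}_{BF_{Res}}\mathbf{y} = \mathbf{y}-\mathbf{t}-\mathbf{b}+\mathbf{g}$ |
|---|---|---|---|---|---|---|---|
| 1 | A | 89 | 3 | 6 | -3 | -2 | -1 |
| 1 | B | 88 | 2 | 6 | -4 | -1 | -3 |
| 1 | C | 97 | 11 | 6 | 5 | 3 | 2 |
| 1 | D | 94 | 8 | 6 | 2 | 0 | 2 |
| 2 | A | 84 | -2 | -3 | 1 | -2 | 3 |
| 2 | B | 77 | -9 | -3 | -6 | -1 | -5 |
| 2 | C | 92 | 6 | -3 | 9 | 3 | 6 |
| 2 | D | 79 | -7 | -3 | -4 | 0 | -4 |
| 3 | A | 81 | -5 | -1 | -4 | -2 | -2 |
| 3 | B | 87 | 1 | -1 | 2 | -1 | 3 |
| 3 | C | 87 | 1 | -1 | 2 | 3 | -1 |
| 3 | D | 85 | -1 | -1 | 0 | 0 | 0 |
| 4 | A | 87 | 1 | 2 | -1 | -2 | 1 |
| 4 | B | 92 | 6 | 2 | 4 | -1 | 5 |
| 4 | C | 89 | 3 | 2 | 1 | 3 | -2 |
| 4 | D | 84 | -2 | 2 | -4 | 0 | -4 |
| 5 | A | 79 | -7 | -4 | -3 | -2 | -1 |
| 5 | B | 81 | -5 | -4 | -1 | -1 | 0 |
| 5 | C | 80 | -6 | -4 | -2 | 3 | -5 |
| 5 | D | 88 | 2 | -4 | 6 | 0 | 6 |
| SSq | | | 560 | 264 | 296 | 70 | 226 |

• The Units SSq is $\mathbf{Y}'\mathbf{Q}_U\mathbf{Y} = 560$, the Blend SSq is $\mathbf{Y}'\mathbf{Q}_B\mathbf{Y} = 264$, the Flask[Blend] SSq is $\mathbf{Y}'\mathbf{Q}_{BF}\mathbf{Y} = 296$, the Treatments SSq is $\mathbf{Y}'\mathbf{Q}_T\mathbf{Y} = 70$ and the Residual SSq is 226.
• Note the orthogonal decomposition $\mathbf{y} = \mathbf{g} + \mathbf{b}_e + \mathbf{t}_e + \mathbf{d}_{B+T}$.

c) Expected mean squares
• To justify the choice of test statistic, we want to work out the E[MSq]s under the 4 alternative expectation models.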
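• E[MSq]s under the maximal model, $\boldsymbol{\psi}_{B+T}$, and under the other 3 alternative expectation models:

| Source | df | MSq | E[MSq] under $\boldsymbol{\psi}_{B+T}$ | under $\boldsymbol{\psi}_T$ | under $\boldsymbol{\psi}_B$ | under $\boldsymbol{\psi}_G$ |
|---|---|---|---|---|---|---|
| Blocks | $b-1$ | $\dfrac{\mathbf{Y}'\mathbf{Q}_B\mathbf{Y}}{b-1}$ | $\sigma^2 + q_B(\boldsymbol{\psi})$ | $\sigma^2$ | $\sigma^2 + q_B(\boldsymbol{\psi})$ | $\sigma^2$ |
| Units[Blocks] | $b(t-1)$ | | | | | |
| – Treatments | $t-1$ | $\dfrac{\mathbf{Y}'\mathbf{Q}_T\mathbf{Y}}{t-1}$ | $\sigma^2 + q_T(\boldsymbol{\psi})$ | $\sigma^2 + q_T(\boldsymbol{\psi})$ | $\sigma^2$ | $\sigma^2$ |
| – Residual | $(b-1)(t-1)$ | $\dfrac{\mathbf{Y}'\mathbf{Q}_{BU_{Res}}\mathbf{Y}}{(b-1)(t-1)}$ | $\sigma^2$ | $\sigma^2$ | $\sigma^2$ | $\sigma^2$ |
| Total | $bt-1$ | | | | | |

where, under the maximal model,
$$q_B(\boldsymbol{\psi}) = \frac{\boldsymbol{\psi}'\mathbf{Q}_B\boldsymbol{\psi}}{b-1} = \frac{t\sum_{i=1}^{b}(\beta_i - \bar\beta_{.})^2}{b-1} \quad\text{and}\quad q_T(\boldsymbol{\psi}) = \frac{\boldsymbol{\psi}'\mathbf{Q}_T\boldsymbol{\psi}}{t-1} = \frac{b\sum_{j=1}^{t}(\tau_j - \bar\tau_{.})^2}{t-1}.$$

• The Residual MSq estimates the uncontrolled variation,
  – that is, the variation arising from uncontrolled differences between units within the same block, both treatment and block differences having been eliminated.
• Once again, the numerator of
  – $q_B(\boldsymbol{\psi})$ is the SSq of $\mathbf{Q}_B\boldsymbol{\psi} = (\mathbf{M}_B - \mathbf{M}_G)\boldsymbol{\psi}$;
  – $q_T(\boldsymbol{\psi})$ is the SSq of $\mathbf{Q}_T\boldsymbol{\psi} = (\mathbf{M}_T - \mathbf{M}_G)\boldsymbol{\psi}$;
  – where $\boldsymbol{\psi}$ depends on the model.
• The expressions for $q_B(\boldsymbol{\psi})$ and $q_T(\boldsymbol{\psi})$ above are those under the maximal model.
• To obtain those for the reduced models, set the $\beta_i$s and $\tau_j$s to 0 or to $\mu$.
• We could compute the population means of the MSqs if we knew the $\beta_i$s, the $\tau_j$s and $\sigma^2$.

Justifying the F ratios
• It is clear from these E[MSq]s that, if the Treatments F is not significant, then a model not involving $\mathbf{X}_T\boldsymbol{\tau}$ is required,
  – as those models are the ones for which $q_T(\boldsymbol{\psi}) = 0$.
• Similarly, if the Blocks F is not significant, then a model not involving $\mathbf{X}_B\boldsymbol{\beta}$ is required.
• In the case where both are not significant, the minimal model adequately describes the data.
• Generally, we will only present the E[MSq]s under the maximal model, realizing that $q(\boldsymbol{\psi}) = 0$ under the H0 that removes the term from the model.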
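The E[MSq] formulas can be illustrated by simulation. The base-R sketch below repeatedly generates RCBD data from the maximal model with *hypothetical, known* parameter values (the β and τ values merely borrow the sizes of the penicillin effects, and σ = 4 is assumed; none of these are estimates), computes the three mean squares from block, treatment and grand means, and compares their averages with $\sigma^2 + q_B(\boldsymbol{\psi})$, $\sigma^2 + q_T(\boldsymbol{\psi})$ and $\sigma^2$. All object names are invented for the example.

```r
set.seed(2024)                     # arbitrary seed for reproducibility
nb <- 5; nt <- 4
beta  <- c(6, -3, -1, 2, -4)       # hypothetical block parameters
tau   <- c(-2, -1, 3, 0)           # hypothetical treatment parameters
sigma <- 4                         # hypothetical uncontrolled-variation SD
psi   <- outer(beta, tau, "+")     # nb x nt matrix of expected values

one_run <- function() {
  Y   <- psi + matrix(rnorm(nb * nt, sd = sigma), nb, nt)  # rows = blocks
  g   <- mean(Y)
  bm  <- rowMeans(Y)               # block means
  tm  <- colMeans(Y)               # treatment means
  res <- Y - outer(bm, tm, "+") + g
  c(Blocks     = nt * sum((bm - g)^2) / (nb - 1),
    Treatments = nb * sum((tm - g)^2) / (nt - 1),
    Residual   = sum(res^2) / ((nb - 1) * (nt - 1)))
}
sim_msq <- rowMeans(replicate(10000, one_run()))

theory <- c(Blocks     = sigma^2 + nt * sum((beta - mean(beta))^2) / (nb - 1),
            Treatments = sigma^2 + nb * sum((tau - mean(tau))^2) / (nt - 1),
            Residual   = sigma^2)
round(rbind(simulated = sim_msq, theory = theory), 2)
```

With these values the theoretical E[MSq]s are 82, 39.33 and 16; the simulated averages should lie close to them. Setting `tau` to all zeros makes the Treatments average settle near $\sigma^2$, which is the basis of the F test.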
Potential contributors to block and treatment mean differences
• (The yields and means are as in the table of estimates given in IV.B.)
• Two treatment means will differ because of the different treatments involved and because of the different runs (the units, here flasks) involved in the observations from which the means are calculated;
• but block differences will not contribute to treatment mean differences, as all treatments involve the same set of blocks.
• The E[MSq]s reflect this fact.
• The Treatments F again addresses the question:
  – "Is the variance of the treatment means greater than can be expected from uncontrolled differences between the runs?"

d) Summary of the hypothesis test
• See notes.

e) Comparison with traditional two-way ANOVA
• As for the analysis of the CRD, the above and the traditional two-way ANOVA tables are essentially the same: the values of the F-statistics are exactly the same.

| Source | df | Source in two-way ANOVA |
|---|---|---|
| Blocks | $b-1$ | Between Blocks |
| Units[Blocks] | $b(t-1)$ | |
| – Treatments | $t-1$ | Between Treatments |
| – Residual | $(b-1)(t-1)$ | Error |
| Total | $bt-1$ | Total |

• As the nesting of the sources shows, Treatments is confounded with Units[Blocks].
• The Residual is the inherent variability of the Units; what is "Error"?
  – The two tables have 3 sources in common, but they are labelled differently.
  – The tables differ in that our table includes the line Units[Blocks], and it is this source that is partitioned.

f) Computation of the ANOVA in R
• The expressions for analyzing a randomized complete block design are summarized in Appendix C, Analysis of designed experiments in R.
Example IV.1 Penicillin yield (continued)
• First the data are entered into a data frame so that it contains
  – the factors Blend, Flask and Treat, and
  – the numeric vector Yield.
• Here the data are in nonrandom order.

```r
> RCBDPen.dat
   Blend Flask Treat Yield
1      1     1     A    89
2      1     2     B    88
3      1     3     C    97
4      1     4     D    94
5      2     1     A    84
6      2     2     B    77
7      2     3     C    92
8      2     4     D    79
9      3     1     A    81
10     3     2     B    87
11     3     3     C    87
12     3     4     D    85
13     4     1     A    87
14     4     2     B    92
15     4     3     C    89
16     4     4     D    84
17     5     1     A    79
18     5     2     B    81
19     5     3     C    80
20     5     4     D    88
```

Model formula for the aov function
• As for the CRD, use the aov function, either with or without the Error function as part of the model.
• In this case the uncontrolled variation is made up of
  – Blend differences, and
  – differences between Flasks within Blends (which we denote Flasks[Blends]).
• The R shorthand for this is Blend/Flask, which expands to Blend + Blend:Flask.

Output

```r
> RCBDPen.aov <- aov(Yield ~ Blend + Treat +
+                        Error(Blend/Flask), RCBDPen.dat)
> summary(RCBDPen.aov)

Error: Blend
      Df Sum Sq Mean Sq
Blend  4    264      66

Error: Blend:Flask
          Df  Sum Sq Mean Sq F value Pr(>F)
Treat      3  70.000  23.333  1.2389 0.3387
Residuals 12 226.000  18.833
> #Compute Blend F and p
> Blend.F <- 66/18.833
> Blend.p <- 1-pf(Blend.F, 4, 12)
> data.frame(Blend.F,Blend.p)
   Blend.F   Blend.p
1 3.504487 0.0407441
```

• Blend occurs both outside and inside the Error function: this is necessary to get the correct fitted values for diagnostic checking.
• The Blend F and p are computed by hand from the Blend and Residual mean squares.

```r
> RCBDPen.NoError.aov <- aov(Yield ~ Blend + Treat, RCBDPen.dat)
> summary(RCBDPen.NoError.aov)
          Df  Sum Sq Mean Sq F value  Pr(>F)
Blend      4 264.000  66.000  3.5044 0.04075
Treat      3  70.000  23.333  1.2389 0.33866
Residuals 12 226.000  18.833
```

• This gives the F and p for Blend, but it is controversial.
• The ANOVA table from the expression that
  – includes Error in the model resembles our table, and is preferred;
  – omits Error is like the traditional ANOVA table.
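The notes print RCBDPen.dat but do not show how it was created. One way to build it in base R (an assumption: the original may have read the data from a file) is sketched below, followed by the same Error-model fit and a calculation of the Blend F and p taken directly from the fitted object rather than typed in. The extraction relies on `summary()` of an aovlist returning one ANOVA table per stratum, with Blend and Blend:Flask as the last two strata; the names `strata`, `blend.tab`, `within.tab` and `res.row` are made up here.

```r
RCBDPen.dat <- data.frame(
  Blend = factor(rep(1:5, each = 4)),
  Flask = factor(rep(1:4, times = 5)),
  Treat = factor(rep(c("A", "B", "C", "D"), times = 5)),
  Yield = c(89, 88, 97, 94, 84, 77, 92, 79, 81, 87,
            87, 85, 87, 92, 89, 84, 79, 81, 80, 88))

RCBDPen.aov <- aov(Yield ~ Blend + Treat + Error(Blend/Flask), RCBDPen.dat)
strata <- summary(RCBDPen.aov)

# Last two strata are Blend and Blend:Flask; each holds one ANOVA table
blend.tab  <- strata[[length(strata) - 1]][[1]]
within.tab <- strata[[length(strata)]][[1]]
res.row    <- trimws(rownames(within.tab)) == "Residuals"   # row names are padded

# Blend F = Blend MSq / Residual MSq, with an upper-tail p-value
Blend.F <- blend.tab[1, "Mean Sq"] / within.tab[res.row, "Mean Sq"]
Blend.p <- pf(Blend.F, blend.tab[1, "Df"], within.tab[res.row, "Df"],
              lower.tail = FALSE)
data.frame(Blend.F, Blend.p)
```

Because the unrounded Residual MSq (226/12) is used, F is 3.5044 and p about 0.0407, agreeing with the no-Error table rather than with the hand calculation based on 18.833.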
IV.D Diagnostic checking
• Again, we have assumed $\mathbf{Y} \sim N(\boldsymbol{\psi}, \sigma^2\mathbf{I}_n)$ where, for the maximal model, $\boldsymbol{\psi}_{B+T} = E[\mathbf{Y}] = \mathbf{X}_B\boldsymbol{\beta} + \mathbf{X}_T\boldsymbol{\tau}$.
• For this model to be appropriate requires a similar set of behaviours as for the CRD:
  a) the response is operating additively, as specified by the maximal model (see section IV.B, Indicator-variable models and estimation for an RCBD): a treatment has about the same additive effect on each unit;
  b) the variability of the units within a block is the same for each block;
  c) each observation displays the covariance implied by the model (independence for Blocks fixed, and equal correlation within blocks for Blocks random); and
  d) the response of the units is normally distributed.

Diagnostic plots
• The same set of diagnostic plots as for the CRD can be used:
  – the residuals-versus-fitted-values plot;
  – normal probability plots.
• A particular pattern to look out for in the residuals-versus-fitted-values plot for this type of design is evidence of a curvilinear relationship:
  – this indicates nonadditivity between the blocks and treatments.

[Figure: residuals-versus-fitted-values plot in which the residuals follow a systematic, curved trend.]

Nonadditivity
• Such nonadditivity may be removable by transformation: take logs, square roots or reciprocals of the data and analyze these.
• Another type of block-treatment interaction would occur where, say, a particular blend had a poison in it that affected only process B.
  – Then only the observation corresponding to that particular combination of blend and treatment would be affected.
  – It would be extremely low, leading to an extreme residual.
• It is possible to test for transformable nonadditivity using Tukey's one-degree-of-freedom-for-nonadditivity.
  – It can be used with any design with an additive expectation model (at least 2 terms), including regression (but not the CRD).
  – It involves detecting whether or not there is a curvilinear relationship between the residuals and the fitted values.
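For a single replicate of each block-treatment combination, Tukey's statistic has a closed form: with block effects $\hat\beta_i = \bar y_{i.} - \bar y_{..}$ and treatment effects $\hat\tau_j = \bar y_{.j} - \bar y_{..}$, the nonadditivity SSq is $\left(\sum_{ij}\hat\beta_i\hat\tau_j y_{ij}\right)^2 / \left(\sum_i\hat\beta_i^2 \sum_j\hat\tau_j^2\right)$ on 1 df, tested against the remaining $(b-1)(t-1)-1$ Residual df. The base-R sketch below applies this formula to the penicillin data. It is an independent illustration of the textbook formula, not the dae implementation, and the object names are made up; its results can be compared with the tukey.1df output for this example.

```r
# Penicillin yields: rows are Blends 1-5, columns are Treatments A-D
Y <- matrix(c(89, 88, 97, 94,
              84, 77, 92, 79,
              81, 87, 87, 85,
              87, 92, 89, 84,
              79, 81, 80, 88), nrow = 5, byrow = TRUE)

g    <- mean(Y)
beff <- rowMeans(Y) - g                  # estimated block effects
teff <- colMeans(Y) - g                  # estimated treatment effects
res  <- Y - g - outer(beff, teff, "+")   # residuals from the additive model
ss_res <- sum(res^2)                     # Residual SSq (226)

# Tukey's single degree of freedom for nonadditivity
ss_nonadd <- sum(outer(beff, teff) * Y)^2 / (sum(beff^2) * sum(teff^2))
df_dev    <- (nrow(Y) - 1) * (ncol(Y) - 1) - 1   # Deviation df
ss_dev    <- ss_res - ss_nonadd
F_nonadd  <- ss_nonadd / (ss_dev / df_dev)
p_nonadd  <- pf(F_nonadd, 1, df_dev, lower.tail = FALSE)
c(SS = ss_nonadd, F = F_nonadd, p = p_nonadd)
```

Here $\sum_{ij}\hat\beta_i\hat\tau_j y_{ij} = 43$, $\sum\hat\beta_i^2 = 66$ and $\sum\hat\tau_j^2 = 14$, so the nonadditivity SSq is $43^2/924 \approx 2.001$.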
For this, and subsequent designs, diagnostic checking should be based on the two plots and this one-degree-of-freedom test.

An R function from dae, tukey.1df

    tukey.1df(aov.obj, data, error.term="within")

where
  – aov.obj is an aov object or aovlist object created from a call to aov,
  – data is optional and is a data.frame containing the original response variable and factors used in the call to aov, and
  – error.term is the error term whose residuals are to be tested for nonadditivity.

Example IV.1 Penicillin yield (continued)

```r
> #
> # Diagnostic checking
> #
> res <- resid.errors(RCBDPen.aov)
> fit <- fitted.errors(RCBDPen.aov)
> with(RCBDPen.dat, data.frame(Blend, Flask, Treat, Yield, res, fit))
   Blend Flask Treat Yield           res fit
1      1     1     A    89 -1.000000e+00  90
2      1     2     B    88 -3.000000e+00  91
3      1     3     C    97  2.000000e+00  95
4      1     4     D    94  2.000000e+00  92
5      2     1     A    84  3.000000e+00  81
6      2     2     B    77 -5.000000e+00  82
7      2     3     C    92  6.000000e+00  86
8      2     4     D    79 -4.000000e+00  83
9      3     1     A    81 -2.000000e+00  83
10     3     2     B    87  3.000000e+00  84
11     3     3     C    87 -1.000000e+00  88
12     3     4     D    85 -2.392617e-15  85
13     4     1     A    87  1.000000e+00  86
14     4     2     B    92  5.000000e+00  87
15     4     3     C    89 -2.000000e+00  91
16     4     4     D    84 -4.000000e+00  88
17     5     1     A    79 -1.000000e+00  80
18     5     2     B    81 -2.614662e-15  81
19     5     3     C    80 -5.000000e+00  85
20     5     4     D    88  6.000000e+00  82
> plot(fit, res, pch=16)
> qqnorm(res, pch = 16)
> qqline(res)
```

[Figure: plot of res against fit, and normal Q-Q plot of the residuals (Sample Quantiles against Theoretical Quantiles).]

• From the plots, no serious departures from the assumptions are apparent.

Example IV.1 Penicillin yield (continued)

```r
> tukey.1df(RCBDPen.aov, RCBDPen.dat,
+           error.term="Blend:Flask")
$Tukey.SS
[1] 2.001082

$Tukey.F
[1] 0.0982679

$Tukey.p
[1] 0.7597822

$Devn.SS
[1] 223.9989
```

| Source | df | SSq | MSq | E[MSq] | F | Prob |
|---|---|---|---|---|---|---|
| Blends | 4 | 264 | 66.0 | $\sigma^2 + q_B(\boldsymbol{\psi})$ | 3.50 | 0.041 |
| Flasks[Blends] | 15 | 296 | | | | |
| – Treatments | 3 | 70 | 23.3 | $\sigma^2 + q_T(\boldsymbol{\psi})$ | 1.24 | 0.339 |
| – Residual | 12 | 226 | 18.8 | $\sigma^2$ | | |
| – – Nonadditivity | 1 | 2.0 | 2.0 | | 0.10 | 0.760 |
| – – Deviation | 11 | 224 | 20.4 | | | |
| Total | 19 | 560 | | | | |

The hypotheses for the one degree of freedom are:
H0: Blends and Treatments are additive
H1: Blends and Treatments are nonadditive
H0 cannot be rejected: there is no evidence of transformable nonadditivity.

IV.E Treatment differences
• For the purposes of the scientist, the effects of the blocks are not of primary interest.
• Rather, attention is likely to be focused on treatment differences, which can be investigated using the treatment means.
• The discussion of multiple comparisons and submodels for the analysis of a CRD applies here also.

Example IV.1 Penicillin yield (continued)
• The treatment means are:

| Treatment | A | B | C | D |
|---|---|---|---|---|
| Mean | 84 | 85 | 89 | 86 |

• As the treatment levels are qualitative, a multiple comparison procedure would be used to examine the differences.
• However, the treatments are not significantly different, so we shall not apply such a procedure.

Example IV.1 Penicillin yield (continued)
• A bar chart illustrates this:

[Figure: bar chart of the fitted values for Yield (%) for Treatments A, B, C and D.]

IV.F Fixed versus random effects

a) Another maximal model for the RCBD
• There are two alternative maximal models for an RCBD:
$$E[\mathbf{Y}] = \mathbf{X}_B\boldsymbol{\beta} + \mathbf{X}_T\boldsymbol{\tau} \quad\text{and}\quad \operatorname{var}[\mathbf{Y}] = \sigma^2\mathbf{I}_n$$
$$E[\mathbf{Y}] = \mathbf{X}_T\boldsymbol{\tau} \quad\text{and}\quad \mathbf{V} = \sigma^2\mathbf{I}_n + \sigma_B^2(\mathbf{I}_b\otimes\mathbf{J}_t)$$
• The difference is that $\boldsymbol{\beta}$ has been dropped from the second expectation model and the covariance of observations from different units in the same block is $\sigma_B^2$, rather than being zero.
Variance matrices for an RCBD with b = 3, t = 4
(Units 1 to 4 within each of Blocks I, II and III, in standard order.)

• Blocks fixed:
$$\mathbf{V} = \sigma^2\mathbf{I}_{12}$$
that is, $\sigma^2$ on the diagonal and 0 everywhere else.

• Blocks random:
$$\mathbf{V} = \sigma^2\mathbf{I}_{12} + \sigma_B^2(\mathbf{I}_3\otimes\mathbf{J}_4) = \begin{bmatrix}\mathbf{A} & \mathbf{0} & \mathbf{0}\\ \mathbf{0} & \mathbf{A} & \mathbf{0}\\ \mathbf{0} & \mathbf{0} & \mathbf{A}\end{bmatrix},\qquad
\mathbf{A} = \begin{bmatrix}
\sigma^2+\sigma_B^2 & \sigma_B^2 & \sigma_B^2 & \sigma_B^2\\
\sigma_B^2 & \sigma^2+\sigma_B^2 & \sigma_B^2 & \sigma_B^2\\
\sigma_B^2 & \sigma_B^2 & \sigma^2+\sigma_B^2 & \sigma_B^2\\
\sigma_B^2 & \sigma_B^2 & \sigma_B^2 & \sigma^2+\sigma_B^2
\end{bmatrix}$$
where $\mathbf{0}$ is a $4\times 4$ matrix of zeros.

• Notice that, for Blocks random, the covariance between units from the same block is non-zero and is equal for all blocks.

Fixed versus random factors
• Definition IV.2: A factor will be designated as random if it is considered appropriate to use a probability distribution function to describe the distribution of effects associated with the population set of levels.
• Definition IV.3: A factor will be designated as fixed if it is considered appropriate to have the effects associated with the population set of levels for the factor differ in an arbitrary manner, rather than being distributed according to a regularly-shaped probability distribution function.
• As far as the model is concerned,
  – random effects are modelled using terms in the variation model;
  – fixed effects are modelled using terms in the expectation model.
• So, when we are deciding whether a factor is random or fixed, we are choosing which mathematical model best describes the population distribution for the response variable.
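The two variance matrices for b = 3 and t = 4 can be constructed directly from their direct-product forms. In the base-R sketch below the values $\sigma^2 = 1$ and $\sigma_B^2 = 0.5$ are hypothetical, chosen only so that the pattern of entries is visible; `V_fixed` and `V_random` are invented names.

```r
nb <- 3; nt <- 4
sigma2   <- 1.0    # hypothetical uncontrolled (unit) variance
sigma2_B <- 0.5    # hypothetical block variance component

# Blocks fixed: sigma^2 I_n
V_fixed  <- sigma2 * diag(nb * nt)

# Blocks random: sigma^2 I_n + sigma_B^2 (I_b (x) J_t)
V_random <- sigma2 * diag(nb * nt) +
            sigma2_B * kronecker(diag(nb), matrix(1, nt, nt))

V_random[1:6, 1:6]   # units 1-4 are in Block I, units 5-6 in Block II
```

Units in the same block share covariance $\sigma_B^2 = 0.5$, units in different blocks have covariance 0, and each diagonal element is $\sigma^2 + \sigma_B^2 = 1.5$.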
Making the choice
• We need to consider the population set of levels and how the set of response-variable effects corresponding to this set of levels behaves.
• To be classified as
  – random, we require that
    • the set of population levels is large in number, and
    • the effects are "well-behaved", so that a regularly-shaped probability distribution function with some variance is appropriate for describing them;
  – fixed, the effects do not have the restrictions that are placed on random effects:
    • there might be a small or a large number of levels in the population, and
    • their effects do not have to conform to a regularly-shaped probability distribution function, because the model allows for arbitrary differences between them.
• For example, the effects of a factor are modelled in the expectation model
  – if they display a systematic trend (perhaps involving polynomial submodels), or
  – if the factor is for a small set of treatments that are to be compared.
• In both cases it seems inappropriate to model the effects as being, say, normally distributed with some variance.
  – The pattern in the treatment effects may well be quite irregular, and there is no interest in the form of this distribution.

Summary
In practice, a factor is
  – random if (i) it has a large number of population levels and (ii) its effects display random behaviour;
  – fixed if (i) it has a small or large number of population levels and (ii) its effects display systematic behaviour.

Units & Blocks: fixed or random?
• Effects from individual units treated alike (for example, animals, plots of land, runs of a chemical reactor) are anticipated to arise randomly, and the effects could well follow a probability distribution, say a normal distribution.
  – Hence it is appropriate to model them via a term in the variation model.
• Terms to which other terms have been randomized must always be modelled as random effects:
  – because Treatments are randomized to Units[Blocks] in an RCBD, Units[Blocks] must be random.
• What about Block effects in the RCBD?
  – They could be either fixed or random, depending on the anticipated effects of the blocks.
• Suppose the blocks are groups of contiguous plots and a systematic trend is anticipated:
  – the block effects cannot be regarded as a random sample, because they display a systematic pattern;
  – the factor Blocks should be designated as fixed.
• However, suppose each block is in a separate location from the other blocks and could be regarded as a random sample of all blocks obtained by dividing up the whole area under study.
  – It seems likely that the population block effects could be described by a probability distribution such as the normal distribution, and the factor Blocks could be designated as random.
• If there is some doubt, it is safest not to assume a probability distribution and to designate the factor as fixed.

Example IV.1 Penicillin yield (continued)
• Should Blends be designated as fixed or random?
  – It was said at the outset that a lot of variability from blend to blend was expected; that is why the RCBD was employed.
  – However, a systematic pattern in the average yields of the blends cannot be anticipated.
  – Rather, it seems reasonable that the effects of the population set of blends can be described by a probability distribution.
  – So Blends should be a random factor.
• The analysis needs to be revised, using a call to aov in which Blend appears only inside the Error function:
RCBDPen.aov <- aov(Yield ~ Treat + Error(Blend/Flask), RCBDPen.dat)
• This changes the fitted values, and Tukey's one-degree-of-freedom test for nonadditivity is no longer applicable.
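As a cross-check on what the multistratum analysis reports, here is a minimal Python sketch. It is not the R aov analysis, and it is not the penicillin data. The sketch computes the Blocks, Treatments and Residual sums of squares for an RCBD held as a b × t table. It also computes the moment estimate $\hat\sigma_B^2 = (\text{MS}_{\text{Blocks}} - \text{MS}_{\text{Residual}})/t$, obtained by equating the mean squares to their Blocks-random expectations $\sigma^2 + t\sigma_B^2$ and $\sigma^2$. The function name `rcbd_anova`, the data layout and the toy numbers are assumptions made for illustration.

```python
from statistics import fmean

def rcbd_anova(y):
    """Blocks / Treatments / Residual ANOVA for an RCBD.

    y[i][j] is the response of the unit in block i that received treatment j,
    so each row is one block and each column one treatment.
    """
    b, t = len(y), len(y[0])
    grand = fmean(v for row in y for v in row)
    block_means = [fmean(row) for row in y]
    treat_means = [fmean(y[i][j] for i in range(b)) for j in range(t)]

    ss_blocks = t * sum((m - grand) ** 2 for m in block_means)
    ss_treats = b * sum((m - grand) ** 2 for m in treat_means)
    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_res = ss_total - ss_blocks - ss_treats  # Units[Blocks] SSq minus Treatments SSq

    df_blocks, df_treats, df_res = b - 1, t - 1, (b - 1) * (t - 1)
    ms_blocks = ss_blocks / df_blocks
    ms_treats = ss_treats / df_treats
    ms_res = ss_res / df_res
    return {
        "SS_Blocks": ss_blocks,
        "SS_Treats": ss_treats,
        "SS_Res": ss_res,
        "F_Blocks": ms_blocks / ms_res,  # same statistic whether Blocks fixed or random
        "F_Treats": ms_treats / ms_res,
        # Moment estimate: MS_Blocks estimates sigma^2 + t*sigma_B^2 and
        # MS_Res estimates sigma^2; the difference over t can be negative.
        "sigma2_B_hat": (ms_blocks - ms_res) / t,
        "sigma2_hat": ms_res,
    }

# Made-up 3-block x 2-treatment numbers, purely to exercise the function
print(rcbd_anova([[1, 3], [2, 6], [3, 5]]))
```

For these toy numbers the Blocks and Treatments F-statistics are 4 and 16 and $\hat\sigma_B^2 = 1$, up to floating-point rounding. A negative $\hat\sigma_B^2$ is possible with real data and is usually reported as evidence that $\sigma_B^2$ is near zero.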
b) Estimation and analysis of variance for Blocks random
• The estimator of the expected values under the model
$$E[\mathbf{Y}] = \mathbf{X}_T\boldsymbol{\tau} \quad\text{and}\quad \mathbf{V} = \sigma^2\mathbf{I}_n + \sigma_B^2(\mathbf{I}_b \otimes \mathbf{J}_t)$$
is $\hat{\boldsymbol{\psi}}_T = \mathbf{T} = \mathbf{M}_T\mathbf{Y}$, the same as for the model $E[\mathbf{Y}] = \mathbf{X}_T\boldsymbol{\tau}$ and $\mathbf{V} = \sigma^2\mathbf{I}_n$.
• The Block hypotheses become $H_0\colon \sigma_B^2 = 0$ versus $H_1\colon \sigma_B^2 > 0$.
• That is, can $\sigma_B^2$ be dropped from $\mathbf{V}$?
• Also, because the expectation model no longer involves the sum of two terms, Tukey's one-degree-of-freedom test for nonadditivity is no longer applicable.

ANOVA table for the RCBD
• The form is the same irrespective of whether Blocks are fixed or random:

Source            df            E[MSq], Blocks fixed          E[MSq], Blocks random
Blocks            b-1           $\sigma^2 + q_B(\boldsymbol{\psi})$        $\sigma^2 + t\sigma_B^2$
Units[Blocks]     b(t-1)
   Treatments     t-1           $\sigma^2 + q_T(\boldsymbol{\psi})$        $\sigma^2 + q_T(\boldsymbol{\psi})$
   Residual       (b-1)(t-1)    $\sigma^2$                          $\sigma^2$
Total             bt-1

• However, the E[MSq]s differ: for Blocks random, $q_B(\boldsymbol{\psi})$ becomes $t\sigma_B^2$.
• The F-statistic for testing this hypothesis is again the ratio of the Blocks and Residual mean squares.
• Thus the tests for fixed and random block effects are the same, although this is not always the case.

IV.G Generalized randomized complete block design
• The difference between generalized and ordinary RCBDs is that, in a GRCBD, each treatment occurs more than once in each block.
• As before, let b be the number of blocks and t the number of treatments.
• In addition, let
  – k denote the number of units per block and
  – g the number of times a treatment occurs in a block,
  so that k = tg and n = bk.
• The R expressions for obtaining a layout for this design are given in Appendix B, Randomized layouts and sample size computations in R.
• Advantages of this design:
  – for the same total number of units, there are more Residual df than in the standard RCBD;
  – you can also test for a Block:Treatment interaction, as is discussed in chapter VI, Determining the analysis of variance table.
• Disadvantage of the design:
  – it has larger blocks, so the units within a block are likely to be less homogeneous than they would be if a standard RCBD with smaller blocks were employed.
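The following Python sketch shows the GRCBD randomization and its Residual df. It is not the Appendix B R code. It assumes the simplest scheme: an independent random permutation of the g copies of each treatment within every block. The names `grcbd_layout` and `residual_df` are hypothetical. The df formula subtracts the Blocks (b − 1) and Treatments (t − 1) df from the total bk − 1, which reduces to (b − 1)(t − 1) when g = 1.

```python
import random

def grcbd_layout(b, t, g, labels, seed=None):
    """Return one list of treatment labels per block for a GRCBD.

    Each of the t treatments appears g times in every block (k = t*g units
    per block); the order within each block is an independent random permutation.
    """
    if len(labels) != t:
        raise ValueError("need exactly one label per treatment")
    rng = random.Random(seed)
    layout = []
    for _ in range(b):
        block = [lab for lab in labels for _ in range(g)]  # systematic arrangement
        rng.shuffle(block)                                 # randomize within the block
        layout.append(block)
    return layout

def residual_df(b, t, g=1):
    """Residual df for a (generalized) RCBD: (bk - 1) - (b - 1) - (t - 1), k = t*g."""
    k = t * g
    return b * (k - 1) - (t - 1)

# Wheat-style design: b = 2 blocks, t = 4 treatments, g = 3 copies per block
layout = grcbd_layout(2, 4, 3, "ABCD", seed=2024)
for plot, treats in enumerate(zip(*layout), start=1):
    print(plot, *treats)       # plot number, then treatment in Block I, Block II

print(residual_df(2, 4, 3))    # 19
print(residual_df(6, 4))       # 15, an ordinary RCBD with the same n = 24
```

Any seed yields a valid GRCBD. The sketch will not reproduce the particular layout given in Example IV.2.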
Analysis of GRCBD
• The model for the generalized RCBD, without the Block:Treatment interaction, is virtually the same as that for the RCBD, so that, in this case, the analyses of variance are similar.
• Thus, depending on whether Blocks are fixed or random, the maximal model would be chosen from the two given for the RCBD.
• For Blocks and Units random, the ANOVA table is

Source            df              SSq                                            E[MSq]
Blocks            b-1             $\mathbf{Y}'\mathbf{Q}_B\mathbf{Y}$                       $\sigma_{BU}^2 + k\sigma_B^2$
Units[Blocks]     b(k-1)          $\mathbf{Y}'\mathbf{Q}_{BU}\mathbf{Y}$
   Treatments     t-1             $\mathbf{Y}'\mathbf{Q}_T\mathbf{Y}$                       $\sigma_{BU}^2 + q_T(\boldsymbol{\psi})$
   Residual       b(k-1)-(t-1)    $\mathbf{Y}'\mathbf{Q}_{BU\mathrm{Res}}\mathbf{Y}$        $\sigma_{BU}^2$
Total             bk-1

• The R expressions are the same as for the standard RCBD.

Example IV.2 Design for a wheat experiment
• Suppose 4 treatments are to be compared when applied to a new variety of wheat.
• The researcher wants to employ a generalized RCBD with 12 plots in each of 2 blocks, so that each treatment is replicated 3 times in each block.
• Hence b = 2, t = 4 and g = 3, so that k = 4 × 3 = 12 and n = 2 × 12 = 24.

Layout for a generalized randomized complete block experiment
             Blocks
Plots        I     II
  1          C     D
  2          D     A
  3          D     D
  4          C     C
  5          B     A
  6          B     D
  7          A     B
  8          A     A
  9          D     B
 10          A     B
 11          B     C
 12          C     C

• The yield of wheat from each plot was measured.

Analysis with Blocks and Plots random
• The model for the example is
$$E[\mathbf{Y}] = \mathbf{X}_T\boldsymbol{\tau} \quad\text{and}\quad \mathbf{V} = \sigma_{BP}^2\mathbf{I}_{24} + \sigma_B^2(\mathbf{I}_2 \otimes \mathbf{J}_{12}) = \sigma_{BP}^2\mathbf{M}_U + 12\sigma_B^2\mathbf{M}_B.$$
• The corresponding ANOVA table is

Source            df    SSq                                            E[MSq]
Blocks            1     $\mathbf{Y}'\mathbf{Q}_B\mathbf{Y}$                       $\sigma_{BP}^2 + 12\sigma_B^2$
Plots[Blocks]     22    $\mathbf{Y}'\mathbf{Q}_{BP}\mathbf{Y}$
   Treatments     3     $\mathbf{Y}'\mathbf{Q}_T\mathbf{Y}$                       $\sigma_{BP}^2 + q_T(\boldsymbol{\psi})$
   Residual       19    $\mathbf{Y}'\mathbf{Q}_{BP\mathrm{Res}}\mathbf{Y}$        $\sigma_{BP}^2$
Total             23

• Note that an RCBD with b = 6 and t = 4
  – would also have n = 6 × 4 = 24,
  – but would have only (b − 1)(t − 1) = 5 × 3 = 15 Residual df.

IV.I Exercises
• Ex. IV.1–2 look at quadratic forms for SSq.
• Ex. IV.3 requires the design of an RCBD and then the analysis of data.
• Ex. IV.4 asks for the complete analysis of an RCBD with a quantitative treatment factor.
• Ex. IV.5 asks for the complete analysis of an RCBD with a qualitative treatment factor.