Stat 512
Chapter 18: Split-Plot Designs and Repeated Measures

Split Plot

When an experiment has a factorial structure and one of the factors is hard to implement (time consuming, expensive, etc.), one often uses a split-plot design, in which the randomization is done in two steps. Such a design has two types of units: the whole units (whole plots) and the subunits (subplots). The researcher must identify the size of the experimental units, along with the associated design and treatment structures, in order to properly define the model and analyze the observed data.

Example 1: Consider a situation where we are interested in two factors: a nitrogen treatment applied to the soil (factor A, 3 levels) and the variety of wheat (factor B, 4 levels). The response is the yield of wheat. A true factorial design would require that we randomize the 12 treatment combinations over the 36 units available to us. However, there is a practical problem. The nitrogen fertilizer is applied using a tractor, and a particular setting of the tractor puts a particular level of fertilizer on the soil. Suppose the random assignment places the combination A1B1 as follows:

[Figure: field map of the 36 plots under complete randomization; the plots assigned to A1B1 are scattered across the field.]

As the map shows, it is difficult to apply A1 this way: after each plot the tractor has to be moved to the next plot that receives A1. A better, more convenient choice is to first select the tracts that get A1, A2 and A3, treat each tract in one pass, and then randomize the varieties within each tract.

A1 A1 A1 A1 | A3 A3 A3 A3 | A3 A3 A3 A3
A2 A2 A2 A2 | A1 A1 A1 A1 | A2 A2 A2 A2
A3 A3 A3 A3 | A2 A2 A2 A2 | A1 A1 A1 A1

Each boxed group of four plots (separated by bars here) is a whole plot, and the 4 plots within it are called subplots.

Example 2: Consider a CRD with a one-way treatment structure. Suppose the treatment structure consists of three (3) varieties of wheat (V1, V2 and V3), planted on twelve (12) randomly selected farms (large experimental units).
The response for this experiment might be wheat yield in bushels per acre. We randomize the varieties to the farms. This design layout might appear as follows:

V1 V2 V1 V3
V1 V3 V3 V1
V2 V2 V3 V2

However, the researcher might also be interested in the effects of two different fertilizers (F1 and F2) on yield. The CRD presented above can be modified by splitting each farm (experimental unit) in half and then randomly assigning the fertilizers, one fertilizer to each half farm (sub-unit). This modified design might appear as follows (the illustration shows four varieties, V1 to V4, on three farms each; each pair between bars is one farm split into two halves):

V1F1 V1F2 | V4F2 V4F1 | V2F1 V2F2 | V2F2 V2F1
V3F1 V3F2 | V4F2 V4F1 | V1F1 V1F2 | V3F2 V3F1
V3F1 V3F2 | V4F1 V4F2 | V1F2 V1F1 | V2F2 V2F1

In this experiment there are two different sizes of experimental units: the large units are the farms, and the small units (sub-units) are the half-farms.

This experimental design consists of two components:
1) Whole Plot Design and Treatment Structure
2) Subplot Design and Treatment Structure

Whole Plot Design and Treatment Structure: a CRD (experimental unit = farm) with a one-way treatment structure (wheat variety).

Subplot Design and Treatment Structure: an RCBD (experimental unit = half-farm, with each farm acting as a block) with a one-way treatment structure (fertilizer).
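The two-step randomization described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `split_plot_layout`, its arguments and the seed value are hypothetical, and it assumes a CRD at the whole-plot level with an equal number of farms per variety.

```python
import random

def split_plot_layout(whole_trts, sub_trts, reps, seed=None):
    """Two-stage randomization for a split plot with a CRD at the whole-plot level.

    Hypothetical helper for illustration only.
    Returns a list of (whole_plot_treatment, [subplot treatments in field order]).
    """
    rng = random.Random(seed)

    # Stage 1: each whole-plot treatment is placed on `reps` whole plots,
    # and the assignment of treatments to whole plots is randomized.
    whole_assignment = [trt for trt in whole_trts for _ in range(reps)]
    rng.shuffle(whole_assignment)

    layout = []
    for whole_trt in whole_assignment:
        # Stage 2: a fresh, independent randomization of the subplot
        # treatments inside every whole plot.
        subs = list(sub_trts)
        rng.shuffle(subs)
        layout.append((whole_trt, subs))
    return layout

# Three varieties on twelve farms (four farms each), two fertilizers per farm.
layout = split_plot_layout(["V1", "V2", "V3"], ["F1", "F2"], reps=4, seed=512)
for farm, (variety, halves) in enumerate(layout, start=1):
    print(f"farm {farm:2d}: " + "  ".join(variety + fert for fert in halves))
```

Every farm receives exactly one variety, and both fertilizers appear once within every farm, which is why the farm behaves like a block for the fertilizer comparison.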
Effects Model

\[
Y_{ijk} = \mu + \text{WholePlotTRT}_i + \text{WholePlotError}_{i(j)} + \text{SubplotTRT}_k + (\text{SubplotTRT} \times \text{WholePlotTRT})_{ik} + \text{SubplotError}_{(ij)k}
\]

i = 1, 2, ..., a;  j = 1, 2, ..., r;  k = 1, 2, ..., b

Expected Mean Squares

Here whole plot = wheat variety and subplot = fertilizer.

\[
\begin{aligned}
EMS(\text{Whole plot}) &= \sigma^2_e + b\,\sigma^2_{WholePlot} + rb\,\sigma^2_{Variety} \\
EMS(\text{Whole Plot Error}) &= \sigma^2_e + b\,\sigma^2_{WholePlot} \\
EMS(\text{Subplot}) &= \sigma^2_e + ra\,\sigma^2_{Fertilizer} \\
EMS(\text{Whole plot} \times \text{Subplot}) &= \sigma^2_e + r\,\sigma^2_{Fertilizer*Variety} \\
EMS(\text{Subplot Error}) &= \sigma^2_e
\end{aligned}
\]

Anova Table

Source     SS                   df            MS                                   F0
W          SS_Wheat             a-1           SS_Wheat / (a-1)                     MS_Wheat / MS_WholePlotError
WP Error   SS_WholePlotError    (r-1)a        SS_WholePlotError / [(r-1)a]
F          SS_Fertilizer        b-1           SS_Fertilizer / (b-1)                MS_Fertilizer / MS_Error
F*W        SS_Fertilizer*Wheat  (a-1)(b-1)    SS_Fertilizer*Wheat / [(a-1)(b-1)]   MS_Fertilizer*Wheat / MS_Error
SP Error   SS_Error             a(r-1)(b-1)   SS_Error / [a(r-1)(b-1)]

W = Wheat; F = Fertilizer; WP Error = Whole Plot Error; SP Error = Subplot Error

Example 3: In an experiment on the preparation of chocolate cakes, conducted at Iowa State College, 3 recipes for preparing the batter were compared. Recipes I and II differed in that the chocolate was added at 40°C and 60°C, respectively, while recipe III contained extra sugar. In addition, 6 different baking temperatures were tested: these ranged in 10°C steps from 175°C to 225°C. Each time that a mix was made by a recipe, enough batter was prepared for 6 cakes, each of which was baked at a different temperature. In this way, 5 replicates of each recipe were constructed. The data from this experiment are shown in the following table:

Breaking Angle for Cakes (Degrees, Cochran and Cox, 1957)

                      Temperature
Recipe  Rep    175  185  195  205  215  225
1       1       42   46   47   39   53   42
        2       47   29   35   47   57   45
        3       32   32   37   43   45   45
        4       26   32   37   43   39   26
        5       28   30   31   37   41   47
2       1       39   46   51   49   55   42
        2       35   46   47   39   52   61
        3       34   30   42   35   42   35
        4       25   26   28   46   37   37
        5       31   30   29   35   40   36
3       1       46   44   45   46   48   63
        2       43   43   43   46   47   58
        3       33   24   40   37   41   38
        4       38   41   38   30   36   35
        5       21   25   31   35   33   23
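The split-plot sums of squares for the cake data can be checked by hand from the totals. The sketch below uses plain Python and assumes the table is read as recipe by replicate (batch) by temperature, with each batch acting as a whole plot; the variable names are illustrative only. The recipe effect is tested against the batch-within-recipe mean square, while temperature and the interaction are tested against the subplot error.

```python
# cake[recipe][batch][temperature]: breaking angles as read from the table
# (recipes 1-3, 5 batches each, temperatures 175 to 225 in 10-degree steps).
cake = [
    [[42, 46, 47, 39, 53, 42], [47, 29, 35, 47, 57, 45], [32, 32, 37, 43, 45, 45],
     [26, 32, 37, 43, 39, 26], [28, 30, 31, 37, 41, 47]],
    [[39, 46, 51, 49, 55, 42], [35, 46, 47, 39, 52, 61], [34, 30, 42, 35, 42, 35],
     [25, 26, 28, 46, 37, 37], [31, 30, 29, 35, 40, 36]],
    [[46, 44, 45, 46, 48, 63], [43, 43, 43, 46, 47, 58], [33, 24, 40, 37, 41, 38],
     [38, 41, 38, 30, 36, 35], [21, 25, 31, 35, 33, 23]],
]
a, r, b = len(cake), len(cake[0]), len(cake[0][0])  # recipes, batches, temperatures

def ss_from_totals(totals, n_each, cf):
    """Sum of squares from totals that each add up n_each observations."""
    return sum(t * t for t in totals) / n_each - cf

values = [y for recipe in cake for batch in recipe for y in batch]
cf = sum(values) ** 2 / len(values)  # correction factor

recipe_totals = [sum(map(sum, recipe)) for recipe in cake]
batch_totals = [sum(batch) for recipe in cake for batch in recipe]
temp_totals = [sum(cake[i][j][k] for i in range(a) for j in range(r)) for k in range(b)]
cell_totals = [sum(cake[i][j][k] for j in range(r)) for i in range(a) for k in range(b)]

ss_total = sum(y * y for y in values) - cf
ss_recipe = ss_from_totals(recipe_totals, r * b, cf)
ss_batch = ss_from_totals(batch_totals, b, cf) - ss_recipe       # whole-plot error
ss_temp = ss_from_totals(temp_totals, a * r, cf)
ss_rxt = ss_from_totals(cell_totals, r, cf) - ss_recipe - ss_temp
ss_error = ss_total - ss_recipe - ss_batch - ss_temp - ss_rxt    # subplot error

ms_recipe = ss_recipe / (a - 1)
ms_batch = ss_batch / (a * (r - 1))
ms_temp = ss_temp / (b - 1)
ms_rxt = ss_rxt / ((a - 1) * (b - 1))
ms_error = ss_error / (a * (r - 1) * (b - 1))

f_recipe = ms_recipe / ms_batch   # whole-plot factor: tested against batch(recipe)
f_temp = ms_temp / ms_error       # subplot factor: tested against subplot error
f_rxt = ms_rxt / ms_error         # interaction: tested against subplot error

for name, ss, f in [("Recipe", ss_recipe, f_recipe), ("Batch(Recipe)", ss_batch, None),
                    ("Temperature", ss_temp, f_temp), ("Recipe*Temperature", ss_rxt, f_rxt),
                    ("Error", ss_error, None)]:
    print(f"{name:20s} SS = {ss:12.6f}" + (f"   F = {f:.2f}" if f is not None else ""))
```

Under this reading of the table, the sums of squares and F ratios agree with the SAS output for this example.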
Model:

\[
Y_{ijk} = \mu + \text{Recipe}_i + \text{Batch(Recipe)}_{i(j)} + \text{Temperature}_k + (\text{Recipe} \times \text{Temperature})_{ik} + e_{(ij)k}
\]

i = 1, 2, 3;  j = 1, 2, ..., 5;  k = 1, 2, ..., 6

SAS results:

The GLM Procedure

Source                Type III Expected Mean Square
Recipe                Var(Error) + 6 Var(Batch(Recipe)) + Q(Recipe,Recipe*Temperature)
Batch(Recipe)         Var(Error) + 6 Var(Batch(Recipe))
Temperature           Var(Error) + Q(Temperature,Recipe*Temperature)
Recipe*Temperature    Var(Error) + Q(Recipe*Temperature)

The GLM Procedure
Tests of Hypotheses for Mixed Model Analysis of Variance
Dependent Variable: Angle

Source                DF    Type III SS    Mean Square   F Value   Pr > F
* Recipe               2       1.800000       0.900000      0.00   0.9969
Error                 12    3457.466667     288.122222
Error: MS(Batch(Recipe))
* This test assumes one or more other fixed effects are zero.

Source                DF    Type III SS    Mean Square   F Value   Pr > F
Batch(Recipe)         12    3457.466667     288.122222      8.68   <.0001
* Temperature          5    1149.166667     229.833333      6.92   <.0001
Recipe*Temperature    10     183.533333      18.353333      0.55   0.8452
Error: MS(Error)      60    1992.133333      33.202222
* This test assumes one or more other fixed effects are zero.

Least Squares Means for effect Temperature
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: Angle

i/j        1        2        3        4        5        6
1                0.8996   0.0580   0.0077   <.0001   0.0007
2     0.8996              0.0759   0.0108   <.0001   0.0010
3     0.0580   0.0759              0.4133   0.0092   0.1047
4     0.0077   0.0108   0.4133              0.0664   0.4133
5     <.0001   <.0001   0.0092   0.0664              0.2999
6     0.0007   0.0010   0.1047   0.4133   0.2999

Note: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.

Split-Plot vs. Split-Block Designs

Some authors distinguish between the "split-block" and the "split-plot" designs. The distinction is at the whole-plot level of the design:

Whole-Plot Design
The whole-plot design can be a CRD, an RCBD or a Latin Square design. The whole-plot treatment structure can be one-way, two-way, etc.

Sub-Plot Design
The sub-plot design is always an RCBD. The sub-plot treatment structure can be one-way, two-way, etc.
In addition, sub-plot designs can be split multiple times to produce designs which are split-split plot, split-split-split plot, etc.

Example: A researcher is interested in comparing the yield among four (4) varieties of oats which are planted in combination with four (4) seed treatments. The design chosen consists of four (4) field strips (blocks), each of which is divided into four (4) equal-sized plots. The four oat varieties are randomly assigned to the four plots within each field strip. Following the assignment of the varieties, each plot containing a variety of oats is subdivided into four sub-units, and the four seed treatments are randomly assigned to the four sub-units. The data are shown in the following table:

                           Variety
Block  Seed Treatment   Oat 1   Oat 2   Oat 3   Oat 4
1      1                 42.9    53.3    62.3    75.4
       2                 53.8    57.6    63.4    70.3
       3                 49.5    59.8    64.5    68.8
       4                 44.4    64.1    63.6    71.6
2      1                 41.6    69.6    58.5    65.6
       2                 58.5    69.6    50.4    67.3
       3                 53.8    65.8    46.1    65.3
       4                 41.8    57.4    56.1    69.4
3      1                 28.9    45.4    44.6    54.0
       2                 43.9    42.4    45.0    57.6
       3                 40.7    41.4    62.6    45.6
       4                 28.3    44.1    52.7    56.6
4      1                 35.1    35.1    50.3    52.7
       2                 51.9    51.9    46.7    58.5
       3                 45.4    45.4    50.3    51.0
       4                 51.6    51.6    51.8    47.4

Split Block Design - Effects Model

\[
Y_{ijk} = \mu + \text{Block}_i + \text{Oat}_j + \text{WholePlotError}_{ij} + \text{Seed}_k + (\text{Seed} \times \text{Oat})_{jk} + \text{SubplotError}_{ijk}
\]
\[
\text{WholePlotError}_{ij} = (\text{Block} \times \text{Oat})_{ij}
\]
\[
\text{SubplotError}_{ijk} = (\text{Block} \times \text{Seed})_{ik} + (\text{Block} \times \text{Oat} \times \text{Seed})_{ijk}
\]

i = 1, 2, ..., r;  j = 1, 2, ..., a;  k = 1, 2, ..., b

Anova Table

Source     SS                  df            MS                                  F0
Block      SS_Block            r-1           SS_Block / (r-1)
Oat        SS_Oat              a-1           SS_Oat / (a-1)                      MS_Oat / MS_WholePlotError
WP Error   SS_WholePlotError   (r-1)(a-1)    SS_WholePlotError / [(r-1)(a-1)]
Seed       SS_Seed             b-1           SS_Seed / (b-1)                     MS_Seed / MS_Error
Seed*Oat   SS_Seed*Oat         (a-1)(b-1)    SS_Seed*Oat / [(a-1)(b-1)]          MS_Seed*Oat / MS_Error
SP Error   SS_Error            a(r-1)(b-1)   SS_Error / [a(r-1)(b-1)]

WP Error = Whole Plot Error; SP Error = Subplot Error

SAS results:

Source             DF    Sum of Squares    Mean Square   F Value   Pr > F
Model              27       7066.191875     261.710810     12.89   <.0001
Error              36        731.202500      20.311181
Corrected Total    63       7797.394375

R-Square    Coeff Var    Root MSE    yield Mean
0.906225     8.534077    4.506793      52.80938

Source       DF      Type I SS    Mean Square   F Value   Pr > F
block         3    2842.873125     947.624375     46.66   <.0001
oat           3    2848.021875     949.340625     46.74   <.0001
block*oat     9     618.294375      68.699375      3.38   0.0042
seed          3     170.536875      56.845625      2.80   0.0539
oat*seed      9     586.465625      65.162847      3.21   0.0059

Source       DF    Type III SS    Mean Square   F Value   Pr > F
block         3    2842.873125     947.624375     46.66   <.0001
oat           3    2848.021875     949.340625     46.74   <.0001
block*oat     9     618.294375      68.699375      3.38   0.0042
seed          3     170.536875      56.845625      2.80   0.0539
oat*seed      9     586.465625      65.162847      3.21   0.0059

Tests of Hypotheses Using the Type III MS for block*oat as an Error Term

Source       DF    Type III SS    Mean Square   F Value   Pr > F
oat           3    2848.021875     949.340625     13.82   0.0010

The correct test for the whole-plot factor (oat) is this last one, F = 949.340625 / 68.699375 = 13.82, which uses the whole-plot error (block*oat) in the denominator rather than the residual mean square used in the default Type I and Type III tables.

The formulas for a split plot with the main plots arranged in a CRD or a Latin square are similar to those given in the table above for the RCBD. The choice of whole-plot design does not affect the last three rows of the previous table.
The upper (whole-plot) lines of the table, together with the unchanged subplot lines, are:

CRD                            RCBD                           Latin Square
Source     df                  Source     df                  Source     df
                               Blocks     r-1                 Rows       a-1
                                                              Columns    a-1
A          a-1                 A          a-1                 A          a-1
Error A    a(r-1)              Error A    (r-1)(a-1)          Error A    (a-1)(a-2)
Total      ra-1                Total      ra-1                Total      ra-1
Factor B   b-1                 Factor B   b-1                 Factor B   b-1
A x B      (a-1)(b-1)          A x B      (a-1)(b-1)          A x B      (a-1)(b-1)
Error B    a(r-1)(b-1)         Error B    a(r-1)(b-1)         Error B    a(r-1)(b-1)
Total      rab-1               Total      rab-1               Total      rab-1

(For the Latin square, r = a.)

Error B pools B*Block and A*B*Block:

df = (b-1)(r-1) + (b-1)(r-1)(a-1) = (b-1)(r-1)[1 + (a-1)] = a(b-1)(r-1)

For the CRD the model statement is:
  Y = A Rep(A) B A*B
  Random Rep(A)

For the RCBD the model statement is:
  Y = A Block Block*A B A*B
  Random Block Block*A

For the LSD the model statement is:
  Y = A Row Column Error_A B A*B
  Random Row Column Error_A

Split-Split-Plot Design Structure

Starting from a split-block or split-plot design, a split-split-block or split-split-plot design can be constructed. The construction consists of a second split (randomization restriction) at the sub-plot level.

Example: A meat scientist wants to study the effect of temperature (T) with three levels, type of packaging (P) with two levels, and lighting intensity (I) with four levels on the color of meat stored in a meat cooler for seven days. Six coolers are available for the experiment, and each of the three temperatures (34°F, 40°F, and 46°F) is assigned at random to two coolers. Each cooler is partitioned into 4 columns. Because light intensities are regulated by distance, all partitions in a column are assigned, at random, the same light intensity (100 watts, 150 watts, 200 watts, or 300 watts). Each column is then partitioned into two areas to which the two types of packaging are randomly assigned.
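The degrees of freedom for this meat-cooler layout follow from counting units at each level of the split. The sketch below is illustrative only: the function name and its arguments (`a` temperatures, `r` coolers per temperature, `b` light intensities, `c` packaging types) are hypothetical labels, and it assumes a CRD at the cooler level as described above.

```python
def split_split_plot_df(a, r, b, c):
    """Degrees of freedom for a split-split plot with a CRD at the whole-plot level.

    a: whole-plot treatment levels, r: whole plots per level,
    b: sub-plot treatment levels,   c: sub-sub-plot treatment levels.
    Hypothetical helper for illustration only.
    """
    return {
        # Whole-plot (cooler) stratum
        "T": a - 1,
        "Error(Cooler)": a * (r - 1),
        # Sub-plot (column) stratum
        "I": b - 1,
        "I*T": (a - 1) * (b - 1),
        "Error(Column)": a * (r - 1) * (b - 1),
        # Sub-sub-plot (partition) stratum
        "P": c - 1,
        "P*T": (a - 1) * (c - 1),
        "P*I": (b - 1) * (c - 1),
        "P*T*I": (a - 1) * (b - 1) * (c - 1),
        # Pools C(T)*P with a(r-1)(c-1) df and C(T)*I*P with a(r-1)(b-1)(c-1) df
        "Error(Partition)": a * (r - 1) * b * (c - 1),
    }

# Three temperatures, two coolers each, four light intensities, two packagings.
df = split_split_plot_df(a=3, r=2, b=4, c=2)
for source, value in df.items():
    print(f"{source:18s} {value:3d}")

n_obs = 3 * 2 * 4 * 2
print("total (corrected):", sum(df.values()), "of", n_obs - 1)
```

The ten entries add up to 47, one less than the 48 packaging areas, so together with the mean they account for every observation.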
The partial ANOVA table for the above design is as follows:

Source of Variation                                         df

Cooler Analysis (CRD)
  Mean (μ)                                                   1
  T_i                                                        2
  Error(Cooler) = C(T)_(i)j                                  3
  Whole Plot Total                                           6

Intensity Analysis (RCBD)
  I_k                                                        3
  I*T_ik                                                     6
  Error(Column) = C(T)*I_(i)jk                               9
  Sub-Plot Total                                            18

Packaging Analysis (RCBD)
  P_l                                                        1
  P*T_il                                                     2
  P*I_kl                                                     3
  P*T*I_ikl                                                  6
  Error(Partition) = C(T)*P_(i)jl + C(T)*I*P_(i)jkl         12
  Sub-Sub-Plot Total                                        24

Split Plot Design - Standard Errors and LSD

CRD - Whole Plot Factor

\[
SE(\bar{Y}_{i..}) = \sqrt{\frac{MS_{WholePlotError}}{rb}}
\]
\[
SE(\bar{Y}_{i..} - \bar{Y}_{i'..}) = \sqrt{\frac{2\,MS_{WholePlotError}}{rb}}
\]
\[
LSD = t_{\alpha/2,\; df_{WholePlotError}} \sqrt{\frac{2\,MS_{WholePlotError}}{rb}}
\]

RCBD - Sub-Plot Factor

\[
SE(\bar{Y}_{..k}) = \sqrt{\frac{MS_{SubPlotError}}{ra}}
\]
\[
SE(\bar{Y}_{..k} - \bar{Y}_{..k'}) = \sqrt{\frac{2\,MS_{SubPlotError}}{ra}}
\]
\[
LSD = t_{\alpha/2,\; df_{SubPlotError}} \sqrt{\frac{2\,MS_{SubPlotError}}{ra}}
\]

Significant Interaction in a Split-Plot Experiment

Suppose that the interaction between the whole-plot treatment and the sub-plot treatment is significant. The analysis must then be based on the two-way cell means, not the marginal means. Each cell mean is an average of r observations.

Comparing two sub-plot treatments at the same level of the whole-plot treatment:

\[
LSD = t_{\alpha/2,\; df_{SubPlotError}} \sqrt{\frac{2\,MS_{SubPlotError}}{r}}
\]

Comparing two whole-plot treatments at the same level (or different levels) of the sub-plot treatment:

\[
LSD = t^{*}_{\alpha/2} \sqrt{\frac{2\left[(b-1)\,MS_{SubPlotError} + MS_{WholePlotError}\right]}{rb}}
\]

where

\[
t^{*}_{\alpha/2} = \frac{(b-1)\,MS_{SubPlotError}\; t_{\alpha/2,\,(r-1)a(b-1)} + MS_{WholePlotError}\; t_{\alpha/2,\,(r-1)a}}{(b-1)\,MS_{SubPlotError} + MS_{WholePlotError}}
\]

Strip-Plot Designs

Consider an agricultural field trial involving "s" varieties of wheat (Factor A) and "t" types of fertilizers (Factor B). Both seeding and fertilizing are most easily performed in strips. By placing the wheat varieties in rows and the fertilizers in columns, a "strip plot" experimental design is produced.
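A strip-plot layout can be generated by randomizing the row strips and the column strips separately within each replicate. The following is a minimal sketch; the function name `strip_plot_layout`, its arguments and the treatment labels are hypothetical and serve only as an illustration.

```python
import random

def strip_plot_layout(levels_a, levels_b, reps, seed=None):
    """Randomize a strip-plot design (hypothetical helper for illustration).

    Within each replicate, the levels of factor A are randomized to row strips
    and, independently, the levels of factor B are randomized to column strips.
    Returns one grid per replicate; grid[row][col] is an (A level, B level) pair.
    """
    rng = random.Random(seed)
    layout = []
    for _ in range(reps):
        rows = list(levels_a)
        cols = list(levels_b)
        rng.shuffle(rows)   # whole row strips receive one level of A
        rng.shuffle(cols)   # whole column strips receive one level of B
        layout.append([[(a_lvl, b_lvl) for b_lvl in cols] for a_lvl in rows])
    return layout

# Three wheat varieties in rows, four fertilizers in columns, two replicates.
design = strip_plot_layout(["W1", "W2", "W3"], ["F1", "F2", "F3", "F4"], reps=2, seed=18)
for rep, grid in enumerate(design, start=1):
    print(f"Rep {rep}")
    for row in grid:
        print("  " + "  ".join(a_lvl + b_lvl for a_lvl, b_lvl in row))
```

Unlike the split plot, neither factor is randomized within the units of the other: each intersection of a row strip and a column strip simply inherits its treatment combination.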
Before inferences can be made for either of the factors, the design must be replicated (Rep), say r ≥ 2 times.

The linear model for the strip-plot design is

\[
Y_{ijk} = \mu + \text{Rep}_k + A_i + \underbrace{(\text{Rep} \times A)_{ik}}_{\text{Row}} + B_j + \underbrace{(\text{Rep} \times B)_{jk}}_{\text{Column}} + \underbrace{(A \times B)_{ij} + e_{ijk}}_{\text{Interaction}}
\]

i = 1, 2, ..., s;  j = 1, 2, ..., t;  k = 1, 2, ..., r

Note: e_ijk = (Rep*A*B)_ijk, so σ²_e = σ²_{R*A*B}.

Source    df                  EMS
Rep       r-1                 σ²_e + s t σ²_R
A         s-1                 σ²_e + t σ²_{R*A} + r t σ²_A
R*A       (r-1)(s-1)          σ²_e + t σ²_{R*A}
B         t-1                 σ²_e + s σ²_{R*B} + r s σ²_B
R*B       (r-1)(t-1)          σ²_e + s σ²_{R*B}
A*B       (s-1)(t-1)          σ²_e + r σ²_{A*B}
R*A*B     (r-1)(s-1)(t-1)     σ²_e

Thus A is tested against R*A, B against R*B, and A*B against R*A*B.

Split Plot

Advantages
1. It permits the efficient use of some factors which require large experimental units in combination with other factors which require small experimental units.
2. It provides increased precision in the comparison of some factors.
3. It permits the introduction of new treatments into an experiment which is already in progress.

Disadvantages
1. Statistical analysis is complicated because different comparisons have different error variances.
2. Low precision on the whole plots can result in large differences being nonsignificant, while small differences on the subplots may be statistically significant even though they are of no practical significance.

Uses of Split-plot designs
1. Split-plot designs, and a variation, the split-block, are frequently used for factorial experiments in which the nature of the experimental material or the operations involved make it difficult to handle all factor combinations in the same manner. They may be used when the treatments associated with the levels of one or more of the factors require larger amounts of experimental material in an experimental unit than do the treatments for other factors.
2. These designs are also used when the investigator wishes to increase precision in estimating certain effects and is willing to sacrifice precision in estimating certain others. The design usually sacrifices precision in estimating the average effects of the treatments assigned to main plots. It often improves the precision for comparing the average effects of treatments assigned to subplots and, when interactions exist, for comparing the effects of subplot treatments for a given main plot treatment. This arises from the fact that the experimental error for main plots is usually larger than the experimental error used to compare subplot treatments. Usually, the error term for subplot treatments is smaller than would be obtained if all treatment combinations were arranged in a randomized complete block design.
3. The design may be used when an additional factor is to be incorporated in an experiment to increase its scope. For example, suppose that the major purpose of an experiment is to compare the effects of several seed protectants. To increase the scope of the experiment, several varieties are used as main plots and the seed protectants are used as subplots.

Remark: The basic split-plot design involves assigning the treatments of one factor to main plots arranged in a CRD, RCBD or Latin-Square design and then assigning the second factor to subplots within each main plot. Note that the randomization is done in two stages. First, the levels of factor A are randomized over the main plots, and then the levels of factor B are randomized over the subplots within each main plot. Each main plot may be considered as a block as far as factor B is concerned, but only as an incomplete block as far as the full set of treatments is concerned, because not every subplot has the same chance of getting every treatment combination. This restriction on the randomization results in the presence of two error terms, one for main plots and one for subplots.
Ordinarily the error term for the main plots is larger than it would be in a complete design, since the main plots are larger and further apart, while the subplot error is smaller than it would be in a complete design. Since the interactions are compared using the smaller subplot error, the precision in estimating interactions is usually increased.

A classical example of a split plot is an irrigation experiment where irrigation levels are applied to large areas, and factors like varieties and fertilizers are assigned to smaller areas within a particular irrigation treatment.

The proper analysis of a split-plot design recognizes that treatments applied to main plots are subject to larger experimental error than those applied to subplots; hence, different mean squares are used as denominators for the corresponding F ratios. This concept is discussed in terms of expected mean squares in this topic.

Generally, the error associated with the subplots is smaller than that for the whole plots. This is because:
1. Small units within the large units tend to be positively correlated. This has the effect of reducing the experimental error among subplots.
2. Error degrees of freedom for the whole plots are usually fewer than those for the subplots. This has the effect of increasing the whole-plot error relative to that of the subplots.

In summary, the factors that require smaller amounts of experimental material, that are of major importance, that are expected to exhibit smaller differences, or for which greater precision is desired are assigned to the subplots.

The distinction between the two-factor split-plot design and the standard two-factor experiment lies in the randomization. In a split-plot design, there are two stages to the randomization process: first, the levels of factor A are randomized to the whole plots within each block, and then the levels of factor B are randomized to the subplot units within each whole plot of every block.
In contrast, for a two-factor experiment laid out in a randomized block design, the randomization is a one-step procedure: the treatments (factor-level combinations of the two factors) are randomized to the experimental units in each block.

Note: The whole-plot tests are similar whether blocks are treated as random or fixed. At the subplot level, if blocks are fixed, all interactions with block are pooled into the error term. If blocks are random, this pooling may or may not be done.