Two-factor split-plot designs - Statistics and Actuarial Science


Chapter 11

Two-factor split-plot designs

Contents

11.1 Introduction
11.2 The three basic structures
    11.2.1 Treatment Structure
    11.2.2 Experimental Unit structure
    11.2.3 Randomization structure
11.3 Data and labeling experimental units
11.4 Assumptions
11.5 Example - Tensile strength of paper - main plots in CRD
    11.5.1 Why is this a split-plot design?
    11.5.2 The big three
    11.5.3 Preliminary analysis and checking assumptions
    11.5.4 Statistical model
    11.5.5 Results
    11.5.6 CAUTIONS
11.6 Example - Biomass of trees - main plots in an RCB
    11.6.1 Data entry
    11.6.2 Preliminary analysis
    11.6.3 The statistical model
    11.6.4 Statistical model
    11.6.5 A simpler model to a subset of the data
11.7 Example - Tenderness of meat - main plots in an RCB
    11.7.1 Data Entry
    11.7.2 Preliminary Data Analysis
    11.7.3 Design considerations
    11.7.4 Statistical Model
    11.7.5 Suppose that individual measurements were available from each judge?
11.8 Example - Fungi degrading organic solvents - a split-plot in time
    11.8.1 Preliminary analysis
    11.8.2 Model development
    11.8.3 Results
11.9 Example - Home range - an unbalanced split-site plot in time
    11.9.1 Preliminary analysis
    11.9.2 Model building
    11.9.3 Results
11.10 Example - Floral scents and learning - pseudo-replication
    11.10.1 Preliminary steps
    11.10.2 Building the model
    11.10.3 Results
    11.10.4 Further work
11.11 Example - Pheromone effects upon wild type and anarchist colonies of bee
    11.11.1 Analysis of the mean scores
    11.11.2 Analysis of raw scores
    11.11.3 Analysis of ovary scores as a discrete response
11.12 Repeated Measure Designs analyzed as a Split-Plot Analysis
11.13 Example - Holding your breath at different water temperatures - BPK
    11.13.1 Introduction
    11.13.2 Standard split-plot analysis
    11.13.3 Adjusting for body size
    11.13.4 Fitting a regression to temperature
    11.13.5 Planning for future studies
11.14 Example - Systolic blood pressure before presyncope - BPK
    11.14.1 Experimental protocol
    11.14.2 Analysis
    11.14.3 Power and sample size determination
11.15 Final notes
11.16 Frequently Asked Questions (FAQ)
    11.16.1 Difference between CRD, RCB, and split plot
    11.16.2 Difference between CRD, RCB, and split plot

The suggested citation for this chapter of notes is:

Schwarz, C. J. (2015). Two-factor split-plot designs. In Course Notes for Beginning and Intermediate Statistics. Available at http://www.stat.sfu.ca/~cschwarz/CourseNotes. Retrieved 2015-08-20.

11.1 Introduction

Two-factor designs can be run in many different ways. The raw data does not, by itself, provide sufficient information to decide on the experimental design used to collect the data. Consider again (see the introduction to experimental designs) an experiment to investigate the influence of lighting level (High or Low) and moisture level (Wet or Dry) upon the growth of plants grown in pots. [1]

Four possible experimental designs are shown below:

[1] This is a popular pastime in British Columbia!

© 2015 Carl James Schwarz, 2015-08-20

In each case, plants are potted, the treatment applied, and the resulting growth of the plant’s leaves (say final total biomass) is measured.

In earlier chapters, you’ve seen how to analyze Design A. This is a two-factor CRD. The treatments (the combination of lighting level and moisture) are randomly assigned to plants. The experimental unit is the plant, i.e. the treatments operate at a fine scale and affect one plant at a time. The observational unit is also the plant, i.e. one measurement is taken from each plant. This would be the default analysis option for most statistical packages. The model for this experiment is

Y = LightLevel Moisture LightLevel*Moisture

Similarly, Design B is a two-factor RCB design, where the house is a block. Within each house, each treatment occurs once, and every treatment occurs in every block. The assignment of treatments to plants within a block (house) is completely randomized. The experimental unit is again the plant and the observational unit is again the plant. The model for this experiment would be:

Y = LightLevel Moisture LightLevel*Moisture House

where the House term represents the blocking variable. Because each block is complete (i.e. every treatment occurs in every block), the blocking term can be either a fixed or a random effect; the resulting hypothesis tests for the two factors would be identical in both cases. [2]

Design C is an example of a design with pseudo-replication and won’t be discussed further. Because there are no real replicates, inference is greatly limited.

Now consider Design D. In Design D, four growth chambers are obtained. Two pots are placed in each growth chamber. The lighting level (H or L) is assigned to a growth chamber. The lighting level assigned to a particular growth chamber simultaneously affects the two pots within the growth chamber.

The moisture level is assigned to the individual pot. The observational unit is the single plant in each pot. At the growth chamber level, the design looks like a single factor CRD with lighting level randomly assigned to growth chamber. At the pot level, the design looks like a blocked design with moisture level randomly assigned to the two pots within each growth chamber (block). Here the growth chamber serves two purposes – it is the experimental unit for the lighting level factor and the block for moisture level factor.

Design D is an example of a Split-Plot Design.

Although this experimental design is used in many disciplines, its genesis (and hence its name) is from agricultural settings. Suppose that you wish to test several different varieties of a field crop under a number of different field preparation methods. The field preparations may require the use of large machinery, and by necessity they are performed on large plots. For example, it is quite impossible to use a standard sized plow on a 1 m² plot, and the smallest unit that could be used might be 100 m². By contrast, it is feasible to use smaller plots for planting the different varieties. Hence, the larger plot is split into small sub-plots for planting the varieties.

A split-plot design lets you use smaller plots for the varieties, and larger plots for the field preparations. Large plots are laid out for the field preparations, and assigned to preparation treatments according to a completely randomized, randomized block, or other design. Each of these plots is then split into sub-plots, one for each of the varieties. Each variety is then randomly assigned to a sub-plot within each main plot. The assignment is performed randomly within each main plot, and independently in different main plots. That is, the main plots are treated like blocks as far as the varieties are concerned.

[2] Note that the se for the marginal means will be different under fixed or random blocks, but these are usually not of interest.

Below is a pictorial representation of a split-plot design with a completely randomized design for the main-plot treatments. Again, look at it from two perspectives. The main-plot treatments (a1 and a2) are assigned to main-plot experimental units using a simple, completely randomized design. The sub-plot treatments (b1, b2, and b3) are assigned to the sub-plots in a similar fashion as in a randomized block design. Combining the two together produces the split-plot design.

  a1        a2        a2        a1
+----+    +----+    +----+    +----+
|    |    |    |    |    |    |    |
|b1  |    |b2  |    |b3  |    |b2  |
+----+    +----+    +----+    +----+
|    |    |    |    |    |    |    |
|b3  |    |b3  |    |b1  |    |b1  |
+----+    +----+    +----+    +----+
|    |    |    |    |    |    |    |
|b2  |    |b1  |    |b2  |    |b3  |
+----+    +----+    +----+    +----+

Recognizing the two types of designs that are combined is the key to analyzing these designs. This combination of designs results in TWO different sizes of experimental units. The main plot factor (A) is randomly assigned to large (main) plots. Consequently, variation at the main-plot level is what limits detection of effects for factors that operate on the main plots. The sub-plot factor (B) is randomly assigned to smaller (split) plots within each main plot. Consequently, both the blocking by main plot and the smaller split-plot experimental unit must be considered.

Most standard statistical packages can be used to produce the correct analysis for these designs, but the correct model must be specified so that the variation from the two different sizes of experimental units can be separated. Some packages will botch this computation.

In my experience in statistical consulting, this design is likely the most common design to be analyzed incorrectly.

It is very simple to overlook the two sizes of experimental units and try to analyze the experiment as a simple two-factor CRD or RCB. This will give incorrect results (p-values are typically too small, se are too small, etc.).

If split-plot designs are “difficult” to analyze, why do them?

There are two common reasons. First, in some cases, certain factors can only be applied to larger experimental units. An example is using an airplane to spray herbicides or pesticides over large areas of land. Smaller plots within the larger plots can be used to examine different varieties. Another common example is greenhouse studies where the entire greenhouse is used to control one factor (e.g. temperature) while finer control is maintained at areas within a greenhouse (e.g. water level).

Second, a common usage of split-plot designs occurs when Time is a second factor. In many experiments, the first factor is randomized to individual experimental units, and then these same units are measured at several time points. This is called a split-plot-in-time as the main plots are the experimental units and the sub-plots are the time points within the units. Note that because time cannot be randomized, these types of designs are more properly analyzed as repeated measure designs, but they are often analyzed as split-plot designs.

11.2 The three basic structures

11.2.1 Treatment Structure

The treatment structure for a split-plot design is the same as for other two-factor designs, i.e., it is usually a factorial structure where all treatment combinations occur in the design.

The model for a factorial treatment structure will have terms corresponding to the main effects and interactions of the factors, i.e., terms for A , B , and A*B .

11.2.2 Experimental Unit structure

The key feature of the split-plot design is that there are two sizes of experimental units. The main plots can be arranged either in a completely randomized design or in blocks as part of a randomized complete block design.

The sub-plots are always smaller portions of the main plots.

The model used for the analysis will have terms corresponding to the two sizes of experimental units. If the main plots are arranged in blocks, a term corresponding to this blocking will also occur in the model.

This is where model building for many packages becomes confusing. Most packages (including JMP) make no distinction between terms in the model, i.e., the package doesn’t know which term represents treatments and which term represents blocks and which term represents experimental units. Historically, the computations for ANOVA involved difficult hand computations for which sets of rules were developed to speed computation. Consequently, the syntax often used for experimental units in split-plot designs evolved so that the rule for hand computation would be appropriate. In this era of computer computations, these rules are no longer necessary, but the same syntax for experimental unit effects is still used. As you will see in a later section, this leads to much confusion!

I recommend that you number each experimental unit using a separate, alphanumeric label!

This will be illustrated in the examples that follow and explored in more detail in a later section. If you use numerical codes for factor levels, you need to be careful that the package understands that these levels have a nominal scale and doesn’t treat them as values in a regression context.

Generally each experimental unit will have a term in the model and be labeled as a random effect.

11.2.3 Randomization structure

There are two separate randomizations done in a split-plot design.

At the upper level, the main plot treatments are randomly assigned to main plot units. This may be done as a CRD or as an RCB. Usually, an equal number of replicates of each main-plot treatment is done, but imbalance at the main plot level does not cause many problems.

At the lower level, the sub-plot treatments are randomly assigned within each main plot and the randomization is done independently for each main plot. Usually, only one occurrence of each sub-plot treatment appears in each main plot, but this can be relaxed at the price of increasing the complexity of the analysis.
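The two randomizations described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions (main plots in a CRD, each sub-plot treatment appearing once per main plot); the function name, the main-plot labels mp1, mp2, ..., and the seed are hypothetical and are not part of these notes.

```python
import random


def randomize_split_plot(main_treatments, sub_treatments, reps, seed=None):
    """Return {main-plot label: (main treatment, sub-plot treatments in field order)}."""
    rng = random.Random(seed)

    # Upper level: a CRD. Each main-plot treatment is used `reps` times and the
    # treatments are shuffled over the available main plots.
    main_assignment = [t for t in main_treatments for _ in range(reps)]
    rng.shuffle(main_assignment)

    layout = {}
    for i, main_trt in enumerate(main_assignment, start=1):
        # Lower level: a fresh, independent shuffle of the sub-plot treatments
        # inside every main plot (the main plot acts like a block).
        sub_order = list(sub_treatments)
        rng.shuffle(sub_order)
        layout[f"mp{i}"] = (main_trt, sub_order)
    return layout


# Hypothetical usage: 2 main-plot treatments, 3 sub-plot treatments, 2 replicates.
layout = randomize_split_plot(["a1", "a2"], ["b1", "b2", "b3"], reps=2, seed=2015)
for plot, (main_trt, sub_order) in layout.items():
    print(plot, main_trt, sub_order)
```

If the main plots were arranged in blocks (an RCB), the upper-level shuffle would instead be done separately within each block.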

In the case of split-plots-in-time, the factor Time cannot be randomized. This can lead to the following problems:

• any other effect that occurs in synchrony with time is totally confounded with the time effect. Hence the time effect could be due to these other factors. [Recall in a completely randomized design, the effects of all other factors will be randomized among the treatment groups.]

• there could be carry over effects. Again, a complete randomization would ensure that such effects are randomized among the treatment groups and the effects would be roughly equal in all groups.

• observations close in time may be more related than observations far apart in time. This is related to the carry over effects problem above. In a completely randomized design, the randomization procedure forces such covariances to be equal across the levels of the factor.

A repeated measures design or a multivariate approach is more flexible and “solves” some of these problems. As well, some packages (e.g. SAS) allow more complex models to be fit to a split-plot design that alleviate some of the above problems. For example, a very simple model for autocorrelation is the AR(1) process where the correlation between successive responses 1 time unit apart is $\rho$; the correlation between responses 2 time units apart is $\rho^2$; the correlation between responses 3 time units apart is $\rho^3$, etc.
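To make the AR(1) pattern concrete, the small sketch below builds the implied correlation matrix for equally spaced time points. The function name and the value 0.5 for the correlation are illustrative assumptions only.

```python
def ar1_correlation(n_times, rho):
    """Correlation matrix for n_times equally spaced responses under AR(1).

    The correlation between responses k time units apart is rho**k.
    """
    return [[rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]


# Hypothetical example: 4 time points with correlation 0.5 between neighbours.
corr = ar1_correlation(4, 0.5)
for row in corr:
    print(["%.3f" % value for value in row])
```

The correlation decays as the measurements get further apart in time, in contrast to the usual split-plot analysis, which treats every pair of measurements on the same unit as equally correlated.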

11.3 Data and labeling experimental units

The data from a split design will need several variables. Consider Design D seen in the introduction. The experiment will need a variable to label the growth chamber, the lighting level, the pot within a growth chamber [3], and finally the response variable. For example, here is a “traditional” set of data that might come from Design D:

Light    Growth              Moisture
Level    Chamber    Pot      Level       Growth
H        1          1        W           13.3
H        1          2        D           13.4
H        2          1        W           13.3
H        2          2        D           12.9
L        1          1        D           12.2
L        1          2        W           11.9
L        2          1        W           12.3
L        2          2        D           11.8

This “traditional” data presentation has a number of confusing aspects. There are 4 separate growth chambers, but they are labeled using 2 replicates of the numbers 1 and 2. Clearly, growth chamber 1 of L is different than growth chamber 1 of H. Similarly, there are 8 pots, but four of them are labeled as “1” and four of them are labeled as “2”. Each pot labeled “1” is a different pot.

[3] This variable is redundant because once you know the growth chamber and moisture level, you have identified the individual pot. However, it is good practice to include a variable for the sub-plot experimental unit.

Growth chamber and pot are technically called nested variables because of the reuse of the numbers.

The use of simple numbers for growth chambers and pots is also bad form. These are simple labels for these units, and so are nominal scaled. Many computer packages may get confused when presented with simple numerical labels for nominal data and may attempt an analysis that was not intended.

In general, this format is a bad idea. If another colleague looked at the data, it would be difficult to know if there are 4 growth chambers or two growth chambers, as the data would look identical.

A better way to structure the data is to use distinct alphanumeric values for each distinct experimental or observational unit in the study (as was done in the sub-sampling chapter). Here is a better way to represent the data from this experiment:

Light    Growth              Moisture
Level    Chamber    Pot      Level       Growth
H        g1         p1       W           13.3
H        g1         p2       D           13.4
H        g4         p7       W           13.3
H        g4         p8       D           12.9
L        g2         p4       D           12.2
L        g2         p3       W           11.9
L        g3         p5       W           12.3
L        g3         p6       D           11.8

Now it is clear that there are four separate growth chambers and eight separate pots. Alphanumeric codes are used for all nominal or ordinal variables.
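If you are handed data in the traditional (nested) labeling, unique labels can be generated mechanically. The sketch below is one possible way to do this, assuming the rows of the traditional table above; the helper name is hypothetical, and the labels it produces are arbitrary, so they need not match the particular g and p codes used in the table.

```python
# Rows as (light level, chamber, pot, moisture level, growth) using the
# traditional labeling, where chamber and pot numbers are reused.
traditional = [
    ("H", 1, 1, "W", 13.3), ("H", 1, 2, "D", 13.4),
    ("H", 2, 1, "W", 13.3), ("H", 2, 2, "D", 12.9),
    ("L", 1, 1, "D", 12.2), ("L", 1, 2, "W", 11.9),
    ("L", 2, 1, "W", 12.3), ("L", 2, 2, "D", 11.8),
]


def relabel(rows):
    """Give every growth chamber and every pot its own alphanumeric label."""
    chamber_ids = {}
    relabeled = []
    for pot_number, (light, chamber, pot, moisture, growth) in enumerate(rows, start=1):
        # A chamber is only identified by the PAIR (light level, chamber number).
        key = (light, chamber)
        if key not in chamber_ids:
            chamber_ids[key] = f"g{len(chamber_ids) + 1}"
        # Every row is a different pot, so the row number gives a unique pot label.
        relabeled.append((light, chamber_ids[key], f"p{pot_number}", moisture, growth))
    return relabeled


unique_rows = relabel(traditional)
for row in unique_rows:
    print(row)
```

After relabeling, counting the distinct chamber labels (4) and pot labels (8) recovers the number of experimental units of each size directly from the data.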

This data representation is highly recommended. It makes the specification of the model much simpler and less prone to errors. Unfortunately, the traditional specification is still widely used, so in the examples in this chapter, models for both specifications will be presented.

11.4 Assumptions

The same assumptions are made as for most other experimental designs.

• Analysis matches design. Make sure the analysis matches the design.

• No outliers. Do side-by-side dot plots and other plots.

• Equal variances in each treatment group. Compute the sample standard deviations for each treatment group to see if these are approximately equal. Plot the treatment standard deviations vs. the treatment means to see if there is a relationship between the standard deviation and the mean. Consider a transformation (e.g. a log transform) if variances tend to increase with the mean. [4]

• Normality of errors (residuals). This is a tricky assumption to assess because there are actually two random sources of variation coming from two different normal distributions corresponding to the main-plot and sub-plot experimental units. As before, you have low power to detect non-normality in small samples and in large samples it is not important. Best left to experts. [5]

• Independent errors (residuals). Again, this is difficult to assess because of the two levels of experimental units. Your best protection is randomize, randomize, randomize, randomize....

[4] The actual assumption is that the variation of the main plot experimental units is equal for all whole plots, and that the variation of the sub-plot experimental units within each main plot is also equal.

11.5 Example - Tensile strength of paper - main plots in CRD

A paper manufacturer is interested in the effect of three different pulp preparation methods and four different cooking temperatures for the pulp on the tensile strength of the resulting paper.

The equipment that is used for the pulp preparation methods only works on large amounts of pulp.

The equipment that is used to cook the pulp can work on smaller batches of material.

On any one day, the experiment is conducted as follows. A batch of pulp is produced by one of the three methods under study. The method of pulp preparation is randomized among 9 days available for the experiment.

Within a day, a batch is divided into four sub-batches, and cooked at one of the four temperatures.

The resulting tensile strength of the paper is measured.

Here are the data:

Pulp Method         1    2    3       1    2    3       1    2    3
Batch              b5   b1   b8      b7   b6   b3      b9   b4   b2
--------------------------------------------------------------------
Temperature (F)
  200              30   34   29      28   31   31      31   35   32
  225              35   41   31      32   36   35      37   40   39
  250              32   38   33      35   42   32      36   39   39
  275              36   42   31      41   40   35      40   44   40

The raw data is available in a JMP datafile called paper.jmp in the Sample Program Library at http://www.stat.sfu.ca/~cschwarz/Stat-650/Notes/MyPrograms. A portion of the data table is reproduced below:

[5] In more advanced classes for graduate level statistics, the best linear unbiased predictors (BLUPs) are found and normal probability plots on the BLUPs are used as diagnostic tools.

The data table has columns corresponding to the cooking temperature, the pulp preparation method, the day the pulp was produced, and the resulting tensile strength.

Note that the preparation method, the batch produced, and the cooking temperature should all be nominal scale, while the response variables should be continuous scale. The setting of nominal scale for temperature is VERY important in this step so that a regression analysis of the effect of temperature is not attempted.

Also note, that it is good practice to number each batch (the main plot experimental unit) UNIQUELY.

In this case, the batches are numbered from “b1” to “b9”. Many books would show batches numbered from 1 to 3 within each pulping method. In this case, batch 1 under pulp method 1 would be a different batch than batch 1 under pulp method 2, i.e. batch is nested under pulp methods. I find this older convention very confusing and try to avoid it wherever possible.

Similarly, the sub-batches have been labeled with unique entries from “s01” to “s36”. Again, many older books would number the sub-batches using the values 1 to 4 used 9 times (one set for each batch).

11.5.1 Why is this a split-plot design?

This is a split-plot design because there are two sizes of experimental units. Pulp preparation works on the larger batches, while cooking temperature uses portions of these larger batches.

This is a common type of split-plot design. Whenever batches of material are produced and then split into smaller units, a split-plot type of design often occurs.

A CRD would require 36 different batches of pulp to be produced, with each entire batch cooked at one temperature level.

Note that if only a single batch of pulp was produced under each preparation method and split into 12 sub-batches for the cooking factor, this would be an example of pseudo-replication. Why?

11.5.2 The big three

Treatment structure

This is a two-factor experiment. One factor is pulp preparation method with three levels; the other factor is cooking temperature with four levels. It is a factorial treatment structure as all combinations of pulp preparation method and cooking temperature occur in the experiment.

Because the treatment structure is factorial, the model will have terms corresponding to the main effect of preparation method, method, and cooking temperature, temp, and their interaction, method*temp.

Experimental Unit structure

There are two sizes of experimental units. The larger is the BATCH of pulp produced under a preparation method. The smaller is the portion of the batch cooked at the various temperatures.

Notice that the batches are uniquely numbered from b1, ..., b9.

THIS IS TO BE ENCOURAGED!

This implies that the model will have a term BATCH-R corresponding to the main plot experimental unit.

Each row in the data table corresponds to the sub-plot experimental unit. Consequently, the sub-plot experimental unit is implicitly included in the model (the residual term), and nothing more needs to be included.

Randomization structure

The main plot factor (preparation method) was completely randomized to the batches. The sub-plot factor (cooking temperature) was completely randomized to the sub-batches.

Because complete randomization occurred, nothing needs to be specified in the model.

11.5.3 Preliminary analysis and checking assumptions

Before doing any analysis, create a column for each treatment group (i.e. the combination of pulp method and temperature). Do some side-by-side dot plots to see if there are any outliers. Compute the treatment group standard deviations to see if there is evidence of unequal standard deviations among the treatment groups. Plot the standard deviation of each treatment group against the mean to see if there is evidence that the variability increases with the mean.
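These notes use JMP for this step. As a rough cross-check outside JMP, the treatment group means and standard deviations can be computed with a few lines of Python. The sketch below assumes the data layout of the table in this example; the labels M1, M2, M3 for the three pulp methods and all variable names are assumptions made for the illustration.

```python
from statistics import mean, stdev

TEMPS = [200, 225, 250, 275]
# batch label -> (pulp method, strengths at 200, 225, 250, 275 F)
BATCHES = {
    "b5": ("M1", [30, 35, 32, 36]), "b7": ("M1", [28, 32, 35, 41]),
    "b9": ("M1", [31, 37, 36, 40]), "b1": ("M2", [34, 41, 38, 42]),
    "b6": ("M2", [31, 36, 42, 40]), "b4": ("M2", [35, 40, 39, 44]),
    "b8": ("M3", [29, 31, 33, 31]), "b3": ("M3", [31, 35, 32, 35]),
    "b2": ("M3", [32, 39, 39, 40]),
}

# Collect the three observations in each of the 12 treatment groups.
groups = {}
for method, strengths in BATCHES.values():
    for temp, y in zip(TEMPS, strengths):
        groups.setdefault((method, temp), []).append(y)

# (mean, standard deviation) for every method-by-temperature combination.
summary = {key: (mean(ys), stdev(ys)) for key, ys in groups.items()}

for (method, temp), (m, s) in sorted(summary.items()):
    print(f"{method} {temp}F  mean={m:6.2f}  sd={s:5.2f}")
```

With only three observations per group, the standard deviations are very imprecise, so look for gross patterns (e.g. the standard deviation growing with the mean) rather than small differences.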

[...] a completely randomized design ($SE = s/\sqrt{n}$) because the data was collected under a split-plot design rather than under a CRD.

The profile (interaction) plot for this experiment is found by computing the raw means for each treatment (combination of pulping method and temperature):

The profile plots do not seem to indicate the presence of any large interaction (the lines are approximately parallel). There appears to be an increasing temperature main effect. Method M2 appears to give the highest mean strength. (How can you tell this from this plot?)

11.5.4 Statistical model

The model for the split-plot design is built by including terms corresponding to the treatment structure (main effects of pulp preparation method, cooking temperature, and their interaction) and terms corresponding to the experimental units (batches and sub-batches). As in previous models, the smallest experimental unit is not usually explicitly entered into the model. Because complete randomization was done whenever possible, there are no terms in the model for randomization effects.

All terms corresponding to experimental units are random effects. A decision needs to be made if the two experimental factors (preparation method and cooking temperature) are fixed or random effects. It seems reasonable that these should be considered fixed effects in this experiment. (Why?).

The statistical model is:

Strength = Method Temp Method*Temp Batch(R)

The terms Method, Temp, and Method*Temp represent the treatment structure: a two-factor, complete factorial experiment.

The term Batch(R) is an experimental unit effect and must be specified as a random effect. Because the observational unit (the sub-batch) is the same as the sub-plot experimental unit, it is not necessary to specify it in the model (using SubBatch(R)).

If the batches had been numbered using the traditional method of batch 1 to 3 repeated for each method, the model would have to be specified as:

Strength = Method Temp Method*Temp Batch(Method)(R)

This model will also work with the uniquely labeled batches. Here the term Batch(Method) is read as “Batch nested in Method”. This notation indicates to the computer package that Batch 1 in Method M1 is a different batch than Batch 1 in Method M2 and a different batch from Batch 1 of Method M3.

Computer packages will interpret the term Batch(Method) differently from Method(Batch). Only the former is sensible; a general rule of thumb is to specify nesting terms as experimental unit(factor). This is yet another reason to use unique alphanumeric labels for every experimental unit: there is no need to worry how to specify nested variables.

Split-plot designs must be fit using the Analyze> Fit Model platform of JMP . The dialogue box must be completed specifying the terms in the model. The terms can be in any order.

Here are the ways the model can be specified:

The random effect is added to a term in the model by selecting the term, and then clicking on the Attributes box:

Specifying the nested effect (the terms with the brackets) can be tricky. First add the outer term (Batch) to the model. Then select the inner term from the list of columns, and click on the Nest button. Don’t forget to make this a random effect as well.

11.5.5 Results

The effect tests for the main effects and interactions are:

Only effect tests corresponding to the treatment structure are of interest. Some packages will also present “hypothesis tests” for the experimental units but these are not of interest. As before, hypothesis testing starts with the most complex term (interactions) and works down to main effects.

We start by testing for no interaction between the effects of pulp method and cooking temperature:

H: no interaction in the effect of pulp method and cooking temperature upon the mean response (in the population)

A: some interaction.

The F-statistic is 1.06 with a p-value of 0.4236. There is no evidence of an interaction between the two factors on the mean strength.

Because no interaction appears to be present, the main effects can now be examined. The main effects can be tested in any order.

We begin with the test for the main effects of preparation method:

H: no main effect of preparation method on the mean strength of the paper (in the population)

A: some main effect of preparation method on the mean strength of the paper.

The F-statistic is 4.00 with a p-value of 0.0788. There is weak evidence of an effect.
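When the numerator has 2 degrees of freedom, the upper-tail probability of an F-statistic has a simple closed form, P(F > f) = (1 + 2f/d)^(-d/2), where d is the denominator degrees of freedom. The sketch below applies it to the test for preparation method. The denominator df of 6 is an assumption (it corresponds to 9 batches, 3 per method, which is not stated in this section), and `f_tail_num_df_2` is a name made up for this illustration.

```python
def f_tail_num_df_2(f_stat: float, den_df: float) -> float:
    """Upper-tail probability P(F > f_stat) for an F(2, den_df) distribution.

    Valid only when the numerator degrees of freedom equal 2.
    """
    return (1.0 + 2.0 * f_stat / den_df) ** (-den_df / 2.0)


# F = 4.00 is the reported (rounded) statistic; den_df = 6 is an ASSUMPTION.
p_method = f_tail_num_df_2(4.00, 6)
print(round(p_method, 4))
```

Under that assumption the result is close to the reported p-value; a small discrepancy in the last digit is expected because the F-statistic is rounded.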

Next, the test for the main effect of cooking temperature:

H: no main effect of cooking temperature on the mean strength of the paper (in the population)

A: some main effect of cooking temperature on the mean strength of the paper

The F-statistic is 22.7 with a p-value < .0001. There is strong evidence of an effect.

As in all ANOVAs, evidence against the null hypothesis of no effect only tells us that an effect may be present, but not where the effect lies. A multiple comparison procedure for each effect should be used.

Notice that the hypothesis tests about the single factors are in terms of main effects. This examines the effect of the factor being tested when averaged over the levels of the other factors. For example, the main effect of preparation method is averaged over the four temperature levels. The main effect tests are also averaged over any interaction terms – refer back to the chapter which introduced two-factor designs for more details.

The estimates of the marginal means and the comparisons among the cooking temperatures are shown below:

There is evidence that a cooking temperature of 200° leads to higher mean strength paper compared to the other levels, but there is not enough evidence to distinguish the mean strength at the other temperature levels.

A similar analysis can be done for comparisons of the mean strength among Methods . The effect test for Method failed to find a statistically significant difference in the mean strengths so the Tukey comparisons also cannot distinguish the means among the Methods . All of the confidence intervals for the difference in mean strength among pairs of Methods include 0. The joined-line plot shows no evidence of a difference in the mean among the levels of the preparation Methods factor.

If the interaction test had indicated evidence of an interaction, the above methods could be extended to compare all the possible treatments; see the code for details.

Note that the precision of the estimates for the effects of cooking temperature is better than that for the effect of preparation method. This is a result of the two different sized experimental units found in this experiment. More generally, if estimates of the combined effect of both preparation method and cooking temperature were of interest, then different contrasts would have different precisions, depending on whether they were within the same preparation method or crossed preparation methods.

It is also possible to get estimates of the variation of the two different sized experimental units:

This indicates that the batch-to-batch variance (3.75 = 1.9365²) is about equal in size to the sub-batch-to-sub-batch variation (3.97 = 1.930²). This is useful information for planning future experiments but is not covered in this course. In many cases, the main plot variation is much larger than the sub-plot variation.
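The earlier remark that temperature effects are estimated more precisely than method effects can be checked with a short calculation from these two variance components. This is a sketch under stated assumptions: 3 methods and 4 temperatures are taken from the text, but 3 batches per method is assumed, and the formulas are the usual ones for a balanced split-plot with main plots in a CRD.

```python
import math

# Variance components quoted above.
var_batch = 3.75   # main-plot (batch-to-batch) variance
var_sub = 3.97     # sub-plot (sub-batch-to-sub-batch) variance

# Design sizes: 3 methods and 4 temperatures come from the text;
# 3 batches per method is an ASSUMPTION made for this illustration.
n_method, n_temp, n_batch_per_method = 3, 4, 3

# Difference between two method means: the two means are built from
# different batches, so batch effects do not cancel.
var_diff_method = 2 * (n_temp * var_batch + var_sub) / (n_batch_per_method * n_temp)

# Difference between two temperature means: every batch contributes to
# both means, so batch effects cancel and only sub-plot variance remains.
var_diff_temp = 2 * var_sub / (n_method * n_batch_per_method)

se_diff_method = math.sqrt(var_diff_method)
se_diff_temp = math.sqrt(var_diff_temp)
print(round(se_diff_method, 2), round(se_diff_temp, 2))
```

Even with the two variance components about equal, the standard error for a difference between methods is roughly twice that for a difference between temperatures, because the batch variance enters only the former.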

We can look at residual plots and other diagnostic plots in the usual way. These are not shown for this example.

11.5.6

CAUTIONS

Some packages will give WRONG estimates of the se for the marginal means and for the contrasts even if the correct model is specified (e.g. SAS Proc GLM). In general, be wary when using computer packages when fitting models with random effects.

11.6

Example - Biomass of trees - main plots in an RCB

A forestry trial was conducted to investigate the relative growth rates of tree seedlings of different varieties when the seedling was treated after planting.

The experiment was conducted by selecting seeds from 4 varieties of trees (denoted Victoria1, Victoria2, Clinton, and Branch). Around the province, 4 forestry blocks were selected for trials. Within each block, 1 ha plots were planted with seedlings from each variety. Then within each 1 ha plot, four 100 m² sub-plots were selected, and one of four different sprays (consisting of combinations of nutrients, fungus suppressants, and the like) was applied to the seedlings in each sub-plot. These four treatments were denoted as (Check, Ceresan, Panogen, Agrox).

DRAW AN EXPERIMENTAL PLAN so you understand the layout!

The total biomass (kg) was measured after 5 years.

Here is the raw data:

variety     block   check   ceresan   panogen   agrox
victoria1     1      429      538       495      444
victoria1     2      416      585       538      418
victoria1     3      289      439       407      283
victoria1     4      308      463       394      347
victoria2     1      533      576       598      641
victoria2     2      696      696       658      574
victoria2     3      454      424       414      441
victoria2     4      351      519       454      516
clinton       1      623      634       645      636
clinton       2      585      504       461      561
clinton       3      446      450       626      527
clinton       4      503      467       503      518
branch        1      754      703       688      716
branch        2      656      673       653      694
branch        3      540      576       456      566
branch        4      527      585       510      474

11.6.1

Data entry

The data are entered into a JMP data table called forest.jmp, available in the Sample Program Library at http://www.stat.sfu.ca/~cschwarz/Stat-650/Notes/MyPrograms . The JMP data table has five columns corresponding to variety (nominal scale), block (nominal scale), spray (nominal scale), biomass (continuous scale), and mpeu (main plot experimental unit) (nominal scale).

Notice how the Trt variable was constructed for the treatments (combinations of factor levels) in this experiment. The ordering of the rows in the dataset is NOT important, i.e. all the data from a treatment combination does not have to be contiguous.

We again ensure that unique labels are used for each block and main plot experimental unit. The actual labels are not that important, but need to correspond to the experimental units in the field.

All factors must be nominal scale, and must be declared as factors in some statistical packages (e.g. R).

We define the plot as the combination of block and variety as each variety in each block occurs on a unique plot. It would be preferable to actually number the plots in the data table rather than relying on this assumed relationship.
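Numbering the plots explicitly is straightforward. The sketch below builds the structural columns of such a data table in Python; the numbering order and the `variety.spray` format of the treatment label are assumptions for illustration and need not match the contents of forest.jmp.

```python
from itertools import product

blocks = [1, 2, 3, 4]
varieties = ["victoria1", "victoria2", "clinton", "branch"]
sprays = ["check", "ceresan", "panogen", "agrox"]

# Give every (block, variety) combination, i.e. every 1 ha plot, its own number.
# The numbering order is arbitrary; any set of unique labels would do.
mpeu = {bv: i for i, bv in enumerate(product(blocks, varieties), start=1)}

# One row per sub-plot: the structural columns of the data table
# (the biomass column is omitted in this sketch).
rows = [
    {"block": b, "variety": v, "spray": s, "mpeu": mpeu[(b, v)], "trt": f"{v}.{s}"}
    for b, v, s in product(blocks, varieties, sprays)
]

print(len(rows), len(mpeu))
print(rows[0])
```

Each of the 16 main plot labels then appears on exactly four rows, one for each spray applied within that plot.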

11.6.2

Preliminary analysis

Before doing any analysis, do side-by-side dot plots to check for outliers. Compute the means and standard deviations for each treatment group. Check that the standard deviations are roughly equal for all treatment groups. Plot the standard deviation vs. the mean for the treatment groups to see if there appears to be a relationship between the two.
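As a minimal sketch of the summary step, the mean and standard deviation can be computed for each treatment group and then compared across groups. Only two of the 16 treatment groups are shown; the values are the biomass readings for those groups over the four blocks as read from the raw data table, and the group labels are made up for this illustration.

```python
from statistics import mean, stdev

# Biomass (kg) over the four blocks for two of the 16 treatment groups.
groups = {
    "victoria1.check": [429, 416, 289, 308],
    "branch.check": [754, 656, 540, 527],
}

# Mean and standard deviation for each treatment group.
summary = {name: (mean(values), stdev(values)) for name, values in groups.items()}

for name, (m, s) in summary.items():
    print(f"{name}: mean = {m:.1f}, sd = {s:.1f}")
```

Repeating this for all 16 groups gives the pairs of values needed for the plot of standard deviation against mean.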

Make a Profile plot to see if there is evidence of an interaction:

The profile plot shows some evidence of non-parallelism, particularly in the response of the victoria1 variety across the 4 spray levels. However, there is little evidence of a variety or spray effect (why?).

11.6.3

The statistical model

This is a split-plot design with main plots grouped into blocks. This experiment has two sizes of experimental units. The main plots are grouped into blocks and each main plot is planted with a single variety. Then each main plot is subdivided into smaller plots and each smaller sub-plot is treated with a different spray.

In a pure RCB design, there would be a total of 64 smaller plots, grouped into 4 blocks. Within each block of 16 plots, the 16 treatments (combinations of variety and spray) would be randomized to the smaller plots.

If the experiment had been conducted with only 1 large plot for each variety, it would be an example of pseudo-replication. Why?

Treatment structure

There are two factors in this experiment. One factor is variety with four levels. The levels are fixed effects. The other factor is spray with 4 levels. The levels are fixed effects. There are a total of 16 treatment combinations. All treatments appear in the experiment. The treatment structure is factorial, so all main effects and interactions should appear in the model. The model will contain the terms variety, spray, variety*spray.

Experimental Unit structure

There are several sizes of experimental units. The largest is the forestry block. It is not clear how these were selected: were they selected at random from around the province, or are these four particular blocks of interest in their own right? We will assume that blocks are random effects.⁶

The next size of experimental unit is the 1 ha plot within each block. The varieties are assigned to these 1 ha plots. The main plots are arranged in an RCB design within the larger blocks. Notice that in the JMP data table, these are labeled as the MPEU (main plot experimental unit) and are uniquely numbered from 1, ..., 16.

The main plot experimental unit can be specified in the model using the term mpeu(R) . Or, using the old conventions, every combination of block and treatment defines the main plot experimental unit (why?). Consequently, the main plot experimental unit could also be specified using the term block*variety(R) . This is NOT an interaction between blocks and varieties – by definition, blocks are not supposed to interact with treatments. Rather it is simply a “convention” to identify the main-plot experimental units.

The smallest size of experimental unit is the 100 m² sub-plot. The sprays were assigned to these sub-plots.

Hence the experimental unit structure is a split-plot design with main plots being the 1 ha plots arranged in blocks and the sub-plots being the 100 m² sub-plots.

Randomization structure

The varieties were assigned to the main-plots in a randomized complete block fashion and the sprays assigned to the sub-plots in a randomized fashion within each main-plot (looks like a randomized block design within each main-plot).

11.6.4

Statistical model

The model must account for both the factorial treatment structure and the split-plot experimental unit structure.

The model using the traditional syntax is:

Biomass = Variety Spray Variety*Spray Block(R) Block*Variety(R)

If the main plot experimental units are uniquely numbered, the model could also be specified as:

Biomass = Variety Spray Variety*Spray Block(R) mpeu(R)

⁶ Because the blocks are complete, i.e. have all treatment combinations, treating blocks as fixed or random will have no influence on the hypothesis tests. The only effect of treating blocks as fixed or random is on the standard errors of the marginal means, which are of limited use. The standard errors of contrasts are unaffected.

The terms corresponding to Variety , Spray , Variety*Spray correspond to the treatment structure of the design – a two-factor complete factorial treatment structure.

The term Block(R) corresponds to the blocking done at the main plot level. One could argue that this term should either be a fixed or a random effect depending if these are the only blocks available for future experiments. Because the blocks are complete, i.e. have all treatment combinations, treating blocks as fixed or random will have no influence on the hypothesis tests. The only effect of treating blocks as fixed or random is on the standard errors of the marginal means which are of limited use. The se s of contrasts are unaffected.

Because the main plot experimental units were numbered uniquely, a good statistical program would allow you to simply specify mpeu-R to represent the main-plot experimental units. It could also be specified using the term Block*Variety in the “traditional model” representing the experimental units for the main plot factor. It is NOT a block-variety interaction, because, by assumption, blocks do NOT interact with factors. The reason that the experimental unit for the Variety factor looks like an interaction term and not like a nested term as seen in the previous example is again purely historical. The experimental unit for variety is the plot. The specific plot that a variety was planted in can be determined by the combination of block and variety levels. Hence the “plot number” is synonymous with the block-variety level combination.

The Analyze> Fit Model platform is used to fit the model using either of the formulations:

The Variety, Spray, and Variety*Spray terms represent the treatment structure. The Block-R term represents the blocking structure at the main plot level. The Block*variety-R or MPEU-R term represents the main plot experimental units. Some packages specify this using a Block*variety term, but this is NOT an interaction between block and variety; rather, it represents the experimental units to which the varieties were assigned. I personally would find it easier to label the individual plots, but the individual plots can be identified uniquely by the combination of block and variety in the experimental plan.

Again, only tests for effects corresponding to the treatment structure are of interest, i.e. don’t examine the test for block effects or the tests for main plot experimental units effects. The tests for the model effects are summarized below:

Testing begins with the most complex term and works backwards.

In this case, begin with the test for interaction:

H: There is no interaction between the effects of variety and spray upon the mean biomass.

A: There is an interaction between the effects of variety and spray upon the mean biomass.

The test statistic is F = 3.2082 with a p-value of 0.0059.

There is very strong evidence of an interaction between the effects of variety and spray upon the mean biomass. Once again, when you detect an interaction, it really doesn’t make much sense to proceed to test the main effects. Looking back at the profile plot, it appears that most of the interaction may be caused by the Victoria1 variety.

In this case, the LSMeans for the combinations of variety and spray can be found and a multiple comparison of the treatments performed (not shown).
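Because the design is balanced, the F-statistics for the treatment structure can also be obtained by hand from sums of squares, which makes the two error strata explicit. The following is a sketch in plain Python, not the method used by JMP (which fits a mixed model); the data are the biomass values as read from the raw data table, and the variable names are made up for this illustration.

```python
# Biomass (kg): data[variety][block - 1] = [check, ceresan, panogen, agrox]
data = {
    "victoria1": [[429, 538, 495, 444], [416, 585, 538, 418],
                  [289, 439, 407, 283], [308, 463, 394, 347]],
    "victoria2": [[533, 576, 598, 641], [696, 696, 658, 574],
                  [454, 424, 414, 441], [351, 519, 454, 516]],
    "clinton":   [[623, 634, 645, 636], [585, 504, 461, 561],
                  [446, 450, 626, 527], [503, 467, 503, 518]],
    "branch":    [[754, 703, 688, 716], [656, 673, 653, 694],
                  [540, 576, 456, 566], [527, 585, 510, 474]],
}
varieties = list(data)
a, r, b = len(varieties), 4, 4   # number of varieties, blocks, sprays


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


grand = mean(y for v in varieties for blk in data[v] for y in blk)
variety_mean = {v: mean(y for blk in data[v] for y in blk) for v in varieties}
block_mean = [mean(y for v in varieties for y in data[v][k]) for k in range(r)]
spray_mean = [mean(data[v][k][j] for v in varieties for k in range(r)) for j in range(b)]
plot_mean = {(v, k): mean(data[v][k]) for v in varieties for k in range(r)}
cell_mean = {(v, j): mean(data[v][k][j] for k in range(r))
             for v in varieties for j in range(b)}

# Main-plot stratum: blocks, variety, and the main plot error (Block*Variety(R)).
ss_block = a * b * sum((m - grand) ** 2 for m in block_mean)
ss_variety = r * b * sum((variety_mean[v] - grand) ** 2 for v in varieties)
ss_plots = b * sum((m - grand) ** 2 for m in plot_mean.values())
ss_main_error = ss_plots - ss_block - ss_variety

# Sub-plot stratum: spray, variety*spray, and the sub-plot (residual) error.
ss_total = sum((y - grand) ** 2 for v in varieties for blk in data[v] for y in blk)
ss_spray = a * r * sum((m - grand) ** 2 for m in spray_mean)
ss_cells = r * sum((m - grand) ** 2 for m in cell_mean.values())
ss_inter = ss_cells - ss_variety - ss_spray
ss_sub_error = ss_total - ss_plots - ss_spray - ss_inter

df_main_error = (a - 1) * (r - 1)
df_inter = (a - 1) * (b - 1)
df_sub_error = a * (r - 1) * (b - 1)

# Variety is tested against the main plot error; spray and the
# interaction are tested against the sub-plot error.
f_variety = (ss_variety / (a - 1)) / (ss_main_error / df_main_error)
f_spray = (ss_spray / (b - 1)) / (ss_sub_error / df_sub_error)
f_inter = (ss_inter / df_inter) / (ss_sub_error / df_sub_error)

print(f"Variety:       F = {f_variety:.2f} on {a - 1} and {df_main_error} df")
print(f"Spray:         F = {f_spray:.2f} on {b - 1} and {df_sub_error} df")
print(f"Variety*Spray: F = {f_inter:.4f} on {df_inter} and {df_sub_error} df")
```

With this reading of the data table, the interaction F-statistic should agree with the value reported above to about two decimal places. The hand calculation applies only to balanced data; with missing values a mixed-model fit is needed.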

11.6.5

A simpler model to a subset of the data

We decide to proceed by dropping the victoria1 variety and analyzing the remainder of the data.

The summary of the test-statistics for this sub-set of the data is shown below:

For this reduced set of data, we find that there is no evidence of an interaction between the effects of variety and spray upon the mean response, and also no evidence of a difference in the mean response among the 4 sprays or among the 3 remaining varieties. This is consistent with the results of the profile plot.

At this point, you could look at the Tukey multiple comparisons to estimate the effect sizes and obtain the se of the effect sizes to see if the non-detection is a result of small differences or large variance (i.e. is the confidence interval for the differences small enough to be useful).

Consult the various code snippets for more details.

11.7

Example - Tenderness of meat - main plots in an RCB

This example is based on an experiment conducted by Ivy Larsen of Agriculture Canada.

Does the method of preparation or the presentation affect the perceived tenderness of cooked meat?

This experiment was conducted by first removing loin muscles from 6 different carcasses. Each muscle was cut into two roasts. One roast was a quick roast (longer in length and shorter in width) and one roast was a square roast (same width and length). Both roasts were cooked to the same temperature on the same day in ovens that are the same make and model. Each roast had 6 slices and 6 cubes (1 cm × 1 cm × 1 cm) removed from it. One cube and one slice from each roast were served to each of 6 panelists, who rated them for initial tenderness (IT). The same six judges rated all the roast types and serving methods.

While the tenderness scores were obtained from each judge, this example will initially consider the average score over all 6 judges for each sample.

Here are the average scores:

6

1

2

3

4

5

5

6

3

4

6

1

2

3

4

5

1

2

5

6

2

3

4

Carcass Cut

1 Quick

Serving

Cube

Tenderness

3.6

Quick

Quick

Quick

Cube

Cube

Cube

Quick Cube

Quick Cube

Square Cube

Square Cube

3.8

4.2

3.7

5.4

4.1

5.5

4.1

Square

Square

Square

Square

Square

Square

Square

Cube

Cube

Cube

Square Cube

Quick Slice

Quick Slice

Quick Slice

Quick Slice

Quick Slice

Quick Slice

Square Slice

Square Slice

Slice

Slice

Slice

Slice

5.3

5.7

5.0

5.5

5.6

5.5

5.0

6.2

5.8

5.4

4.3

4.9

4.0

5.4

4.1

5.6

11.7.1

Data Entry

The data has 4 columns. One column identifies the carcass from which the muscle was cut (nominal scale); one column for the cut of the roast (nominal scale); one column for the serving method (nominal scale); and one column for the average tenderness over the six judges (interval or ratio scale). An additional column was added ( Roast ) with values from 1 to 12 to uniquely label the 12 roasts (two from each carcass) that were cooked in the experiment.

The raw data are available in a JMP data file called tenderness.jmp in the Sample Program Library at http://www.stat.sfu.ca/~cschwarz/Stat-650/Notes/MyPrograms . Notice how the treatment variable is constructed from the factor levels.

11.7.2

Preliminary Data Analysis

The standard side-by-side dot plots are created for each treatment combination to check for outliers (a different marker is used for every carcass).

There may be one or two points which appear to be anomalous in the Quick.Cube and the Quick.Slice treatments, but these are from different carcasses.

The assumption of additivity of blocks and treatments can be assessed by joining points from each carcass (as shown below using the package JMP).

It appears that the response across carcasses is roughly parallel so that the assumption of additivity between blocks and treatments appears to be reasonable.

The dot-plot shows that the standard deviations are roughly equal in all treatment groups. You could also plot the standard deviation of each treatment group against its corresponding mean to see if the standard deviation is roughly constant.

A profile plot of the means indicates that there likely is an effect of serving style, and possibly of cut, but the parallelism indicates little interaction between the two factors.

11.7.3

Design considerations

This is a split-plot design because the muscle from each animal is first divided into two and cut in different ways. Then from each cut of meat (after cooking) both serving methods are used.

If this was a pure RCB, then each muscle would need to be cut into 4 pieces: 2 Quick and 2 Square cuts. All 4 pieces would be cooked. Then one of the Quick cuts would be sliced, and one of the Quick cuts would be cubed; similarly one of the two Square cuts would be sliced and the other cubed.

Treatment structure

There are two factors in this experiment. One is the method of Cut of the meat with two levels: Quick and Square . The second is the method of Serving with two levels: Slice or Cube .

If the raw data on each judge were available, then the judge could also be treated as a factor (see later in this section).

There are 4 treatment combinations. All treatment combinations appear in the experiment so the treatment structure is factorial.

Experimental Unit structure

There are two sizes of experimental units. The first are the roasts, which are cut in either the traditional square or the quick-cook format. The roasts were arranged in blocks (the carcasses), with two roasts taken from each carcass.

The second are the parts of each roast after cooking, which are either sliced or cubed.

Randomization structure

Complete randomization is assumed. The description of the experiment was not explicit about how the carcasses were chosen – presumably six animals were chosen at random from some herd for the experiment. After cooking, the roasts were divided in half (one half for cubing and the other half for slicing) and the assignment of serving method to the half of roast should be at random.

11.7.4

Statistical Model

The model must account for both the factorial treatment structure and the split-plot experimental unit structure.

The model is:

Tenderness = Cut Serving Serving*Cut Carcass(R) Carcass*Cut(R)

The terms corresponding to Cut , Serving , Cut*Serving correspond to the treatment structure of the design – a two-factor complete factorial treatment structure.

The term Carcass(R) corresponds to the blocking done at the main plot level. This is a random effect because if the experiment were repeated, new animals would be chosen and the intent is to generalize this experiment to all carcasses and not just the six in this experiment. The blocking variable could also be declared as a fixed effect with no influence on the results of the hypothesis tests because each block is complete (i.e. all treatments occur in every block). The only impact of treating blocks as fixed or random effects is on the standard errors of the marginal means, which are of limited usefulness in any case.

Finally, the term Carcass*Cut represents the experimental units for the main-plot factor, i.e. the individual roasts. It is NOT a carcass-cut interaction, because, by assumption, blocks do NOT interact with factors. The reason that the experimental unit for the Cut factor looks like an interaction term, and not like a nested term as seen in the earlier example where the main-plot factor is run as a CRD, is again purely historical. The experimental unit for Cut is the roast. The specific roast prepared by a cutting method can be determined by the combination of carcass and cut levels. Hence the “roast number” is synonymous with the carcass-cut level combination.

The model could also be written as:

Tenderness = Cut Serving Serving*Cut Carcass(R) Roast(R)

if the 12 roasts have unique labels. The results are identical under both model formulations.

The Analyze> Fit Model platform is used to fit the model:

There is a direct correspondence between the specification of the model in the Analyze> Fit Model platform and the terms in the simplified model syntax seen earlier. Remember that the Cut*Carcass(R) term is NOT an interaction – it represents the main-plot factor experimental unit.

Only the tests for the fixed-effect factors (and their interaction) are of interest. It usually is not of interest to perform tests on the random effects.

Note that in this example, the variance component corresponding to Carcass is estimated to be zero.

Consequently, depending on your computer package, you may get different results. I will give the summary results for JMP , SAS , and R below.

These are results from JMP when the variance components are left unbounded (i.e. can go below zero - see below).

Similarly, for SAS:

Type 3 Tests of Fixed Effects

Effect        Num DF   Den DF   F Value   Pr > F
cut              1       10       3.77    0.0809
serving          1       10      17.02    0.0021
cut*serving      1       10       0.15    0.7062

Finally, from R, which does not allow variance components to go negative:

Test statistics all use Carcass variance component of 0 so test for Cut differs from SAS/JMP

Test for fixed effects using the Satterthwaite approximate
[1] "Asymptotic covariance matrix A is not positive!"
Analysis of Variance Table with Satterthwaite approximation for degrees of freedom
            Df Sum Sq Mean Sq F value  Denom    Pr(>F)
Cut          1 1.2633  1.2633  3.9004 19.997 0.0622447 .
Serving      1 5.7038  5.7038 16.4334 19.997 0.0006205 ***
Cut:Serving  1 0.0504  0.0504  0.1453 19.997 0.7071279
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Test for fixed effects using the Kenward-Roger approximate
Notice that the F-statistic reported here are WRONG (groan), but the p-values are ok
[1] "Asymptotic covariance matrix A is not positive!"
Analysis of Variance Table with Kenward-Roger approximation for degrees of freedom
            Df Sum Sq Mean Sq F value Denom   Pr(>F)
Cut          1 1.2633  1.2633  3.9004     5 0.105258
Serving      1 5.7038  5.7038 16.4334    10 0.002309 **
Cut:Serving  1 0.0504  0.0504  0.1453    10 0.711081
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The default output from lmer doesn’t have p-values
Analysis of Variance Table
            Df Sum Sq Mean Sq F value
Cut          1 1.2633  1.2633  3.7700
Serving      1 5.7038  5.7038 17.0219
Cut:Serving  1 0.0504  0.0504  0.1505

Because the variance component for carcasses is estimated to be zero, the F-statistic that R reports with the KR-approximation for the df is incorrect, but the p-values are “correct”.⁷

There is no evidence of an interaction between the effect of Cut and the Serving style on the MEAN tenderness of the meat (p = 0.71). Because there is no evidence of an interaction between the effects of the two factors, it makes sense to examine main effects in detail.

There is strong evidence of a main effect of the Serving style on the MEAN tenderness of the meat (p = 0.0021 from JMP and SAS, and different p-values from R depending on which approximation to the df is used).

The big differences are in the tests for the main effect of Cut upon the MEAN tenderness of the meat with different packages and different approximations giving rise to different p -values. Fortunately, despite these differences, the overall conclusions are the same.

There is no universal recommendation on what to do when variance components are negative. Allowing for negative variance components, e.g. in JMP and SAS, is not as crazy as it sounds, because estimates of the fixed effects are still unbiased and the correct degrees of freedom (accounting for the different sizes of experimental units) are obtained automatically. Some packages allow you to then refit the models forcing the variance components to be non-negative (e.g. JMP and SAS; R always forces variance components to be non-negative). This has implications for how the df for the effect tests are computed, and you will get different F-statistics and p-values compared to allowing the variance components to go negative.

Examining the p -values for fixed effects is seldom sufficient, and estimates of effect sizes should always be obtained.

For example, the estimated marginal MEAN tenderness under the two serving styles is found in the output, and an estimate of the difference in MEAN tenderness between the serving styles can also be found:

⁷ R is free, but not cheap.

In this example, the estimated variance component associated with carcasses is negative as seen in the variance components from JMP or SAS, but not from R.

Covariance Parameter Estimates

Cov Parm       Ratio     Estimate   Alpha   Lower      Upper
carcass        0         0          .       .          .
cut*carcass    0.03578   0.01199    0.05    0.001965   6.55E130
Residual       1.0000    0.3351     0.05    0.1636     1.0320

  Groups      Name        Variance   Std.Dev.
1 Carcass:Cut (Intercept) 1.2000e-02 1.0954e-01
2 Carcass     (Intercept) 5.6764e-20 2.3825e-10
3 Residual                3.3508e-01 5.7886e-01

This may look a bit alarming because variance components, by definition, must be positive! Both JMP (when the Unbounded Variance Components box in the Fit Model dialogue box is checked) and SAS (when the nobound option is specified on the Proc Mixed statement) can allow the variance components to go negative. The output is not shown here, but is available in the full output file.

This is the recommended option to ensure that hypothesis tests remain unbiased. The variance components should be allowed to be negative when conducting the hypothesis tests because the variance components are used to build the covariance structure over ALL observations, and it is important that the latter be unbiased. Please contact me for more information about this topic.
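One way to see how a negative estimate can arise is through the method-of-moments estimators for a balanced design, where each variance component is a difference of mean squares. The sketch below is illustrative only and is not how JMP, SAS, or R compute their REML estimates: the residual and roast components are taken from the output above, but the carcass mean square of 0.30 is a hypothetical value, and `moment_estimates` is a name made up for this example.

```python
def moment_estimates(ms_block, ms_main_plot, ms_residual, n_main_levels, n_sub_levels):
    """Method-of-moments variance components for a balanced split-plot
    with main plots in an RCB, obtained by equating mean squares to
    their expected values."""
    var_residual = ms_residual
    var_main_plot = (ms_main_plot - ms_residual) / n_sub_levels
    var_block = (ms_block - ms_main_plot) / (n_main_levels * n_sub_levels)
    return var_block, var_main_plot, var_residual


# Residual and roast (cut*carcass) components are taken from the output above;
# the carcass mean square of 0.30 is a HYPOTHETICAL value for illustration.
ms_residual = 0.3351
ms_main_plot = 0.3351 + 2 * 0.01199
ms_block = 0.30

# 2 cuts (main-plot levels) and 2 serving styles (sub-plot levels).
unbounded = moment_estimates(ms_block, ms_main_plot, ms_residual, 2, 2)

# Crude stand-in for a non-negativity constraint: truncate at zero.
bounded = tuple(max(0.0, v) for v in unbounded)

print(unbounded)
print(bounded)
```

Whenever the block mean square happens to fall below the main plot mean square, the unbounded estimate of the block component is negative, and a package that enforces non-negativity will report zero instead.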

11.7.5

Suppose that individual measurements were available from each judge?

By analyzing the averages over the judges, it is not possible to examine the consistency of the judges (main effects of judges) or if some judges prefer different cuts or serving styles (interactions with the cut or serving factors).

If the individual judge scores were available, a more complex model is required. However, before this more complex model is fit, careful consideration must be made of how the judges are assigned pieces of meat. Presumably, it is difficult to randomly slice or cube from the individual roast; logistically, the roasts are likely subdivided, and one half is sliced and one half is cubed. The slices or cubes are randomly assigned to judges. The judges randomly taste the twelve pieces of meat and score for tenderness.

This makes the design a split-split-plot design, i.e. another level of splitting would occur under the last level. The analysis of such a design is beyond the scope of this course, but is a simple extension of the methods presented in this chapter.

11.8

Example - Fungi degrading organic solvents - a split-plot in time

A very common application for a split-plot type of analysis is when repeated measurements are taken over time.

CAUTION: Because randomization of the time levels is not possible, this type of experiment or survey is more properly analyzed as a repeated-measures design. However, in many environmental studies, the split-plot analysis will be a good first-order approximation to the proper analysis and is certainly far better than treating the data as if it came from a CRD!⁸

This experiment is based on a real study conducted here at SFU to investigate the relative efficiency of different fungal treatments to degrade organic solvent spills.

A former gas station was being torn down and the owners wished to build an apartment complex on the property. Unfortunately, the soil was contaminated and before a new complex could be built, the soil had to be ‘cleaned up’. Over time, the organic solvents will naturally degrade, but certain fungi can speed up the process.

In this experiment, 12 sites (plots) on the property were randomly selected. Three sites were randomly assigned to each of the four treatments (Control, and three different combinations of fungi in different concentrations). The amount of organic solvent was measured at 1, 2, 3, and 4 months post treatment.

Here are the raw data:

Treatment Site 1 month 2 months 3 months 4 months fungal1 fungal1

1

2

616.7

790.2

376.1

513.7

328.9

438.4

178.1

348.1

fungal1 control control control fungal2 fungal2 fungal2 fungal3 fungal3 fungal3

3

4

5

6

7

8

9

10

11

12

541.2

944.7

826.3

840.6

588.9

498.0

586.7

712.0

684.4

568.3

452.0

655.4

581.4

604.3

491.8

407.3

388.4

598.9

542.5

547.8

541.4

587.8

590.3

595.9

455.0

500.0

492.4

442.1

463.1

702.9

186.0

383.2

445.3

458.8

513.7

568.1

623.3

222.2

320.3

438.2

The data table will stack the four repeated measurements over time into a single column. The variables in the table will be the treatment, the site on the property, the time point (1, 2, 3, or 4 months) and the response (the amount of organic solvent). Be sure that all factors and experimental units are nominal scaled. Note that every site is uniquely labeled.

The raw data is available in a JMP datafile called fungus.jmp, available in the Sample Program Library at http://www.stat.sfu.ca/~cschwarz/Stat-650/Notes/MyPrograms.

Here is a portion of the raw data entered into JMP:

⁸ In more advanced graduate classes in statistics, a common analysis would be to fit an AR(1) structure to the time effects to see if there is evidence of a declining correlation as observations are further apart from each other.

Notice how the four measurements over time on the same site are stacked into a single column with a corresponding variable giving the time.
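The stacking step can be sketched in a few lines of Python. This is only an illustration of the reshaping, not part of the JMP workflow: the site labels and solvent values below are invented, and the names `wide`, `stack`, and `long_rows` are hypothetical.

```python
# Hypothetical wide-format rows: (treatment, site, [month 1..month 4]).
# The numbers are invented for illustration; they are NOT the study data.
wide = [
    ("control", "S1", [900.0, 650.0, 600.0, 450.0]),
    ("fungal1", "S2", [620.0, 380.0, 330.0, 180.0]),
]

def stack(wide_rows):
    """Return one long-format record per site-by-month measurement."""
    long_rows = []
    for treatment, site, values in wide_rows:
        for month, solvent in enumerate(values, start=1):
            long_rows.append({
                "treatment": treatment,
                "site": site,            # unique label for every site
                "month": str(month),     # stored as a label (nominal), not a number
                "solvent": solvent,
            })
    return long_rows

long_rows = stack(wide)
for row in long_rows:
    print(row)
```

Each site contributes four rows, one per month, and the month is kept as a label so that it is treated as a nominal factor.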

11.8.1 Preliminary analysis

Create side-by-side dot plots for each treatment group (why?). Compute the mean and standard deviations for each treatment group (why?). Plot the standard deviation vs. the mean across all treatment groups (why?)
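As a rough sketch of the summary step, the group means and standard deviations could be computed as follows. The records are invented and the grouping by (treatment, month) is an assumption about what "treatment group" means here; the resulting pairs are what would be plotted as standard deviation against mean.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical long-format records (treatment, month, solvent); values invented.
records = [
    ("control", 1, 900.0), ("control", 1, 850.0), ("control", 1, 880.0),
    ("fungal1", 1, 620.0), ("fungal1", 1, 790.0), ("fungal1", 1, 540.0),
]

# Collect the responses belonging to each treatment-by-month group.
groups = defaultdict(list)
for treatment, month, solvent in records:
    groups[(treatment, month)].append(solvent)

# One (mean, standard deviation) pair per group.
summary = {key: (mean(vals), stdev(vals)) for key, vals in groups.items()}
for key, (m, s) in sorted(summary.items()):
    print(key, round(m, 1), round(s, 1))
```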

The Profile plot is:

It is possible to plot the data with treatments along the horizontal axis with a separate line for each month, but most people prefer to see time along the horizontal axis.

The profile plot appears to show that the fungal treatments are better at removing contaminants than the control but that all fungal treatments seem to work similarly. There is some evidence of interaction (between months 3 and 4), but I suspect that we won’t detect this. It is a pity that we don’t have confidence intervals for the endpoints (hard to compute because the design is not a CRD) to judge the degree of non-parallelism.

11.8.2 Model development

The key concept here is that the same site is repeatedly measured over time and so we treat the repeated measurements as “sub-plots” in each site.

The experimental unit for the fungus treatment is the site. The experimental unit for the “time” factor is the month at the particular site.

Again, note that because Time cannot be randomized at the sub-plot level, this can cause problems as noted earlier.

Treatment structure

There are two factors in this experiment. One factor is fungal treatment with four levels (the control plus the other 3 treatments). The levels are fixed effects. The other factor is time with 4 levels. The levels are fixed effects. There are a total of 16 treatment combinations. All treatments appear in the experiment. Hence, it is a factorial experiment.

Experimental Unit structure

There are two sizes of experimental units. The treatments are applied to the 12 sites chosen on the property completely at random. Because site is an experimental unit, it should be considered as a random effect.

The sub-plots do not physically exist, but should be thought of as the individual months within that site. There are 4 time points and hence four smaller experimental units within each site.

Randomization structure

The treatments were assigned to sites completely at random. Hence, at this level, the design is a CRD.

Time is always an interesting factor as it cannot be randomized, e.g., how can you take a measurement at 3 months before the measurement at 1 month? For this reason, split-plots in time (as they are called), are only a first order approximation to the proper analysis (called a repeated measures analysis). In most cases, the results will be very similar.
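The complete randomization of treatments to sites described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study; the function name `randomize_crd` and the seed are hypothetical.

```python
import random

def randomize_crd(sites, treatments, per_treatment, seed=None):
    """Assign each treatment to `per_treatment` sites completely at random."""
    rng = random.Random(seed)
    shuffled = list(sites)
    rng.shuffle(shuffled)                      # random order of all sites
    assignment = {}
    for i, trt in enumerate(treatments):
        chunk = shuffled[i * per_treatment:(i + 1) * per_treatment]
        for site in chunk:
            assignment[site] = trt             # next block of sites gets this treatment
    return assignment

sites = list(range(1, 13))                     # 12 uniquely labeled sites
treatments = ["control", "fungal1", "fungal2", "fungal3"]
assignment = randomize_crd(sites, treatments, per_treatment=3, seed=2015)
for site in sorted(assignment):
    print(site, assignment[site])
```

No corresponding step exists for time: the months within a site always occur in the same order, which is exactly the randomization restriction discussed above.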

The model

The model must account for both the factorial treatment structure and the split-plot experimental unit structure.


The model is:

Solvent = Fungus Time Fungus*Time Site(Fungus)(R)

or, if the Sites have unique labels, the notationally simpler (but equivalent) model

Solvent = Fungus Time Fungus*Time Site(R)

The terms Fungus , Time , and Fungus*Time represent the treatment structure – a two-factor complete factorial treatment structure.

The term Site(Fungus)(R) or Site(R) represents the experimental units for the fungal treatment.

Again, the nesting syntax (in the first model statement) is a holdover from the historical way in which these are specified - it is easier to think of this term as representing the experimental unit for fungal treatment. If sites are uniquely labeled (as in this experiment), a good package will allow you to specify this term as Site(R).

Because time CANNOT be randomized, this model SHOULD include a term representing the randomization restriction. This is beyond the scope of this course.⁹ Consequently, analysis of split-plots in time using standard split-plot models should be reviewed carefully by a knowledgeable expert.

11.8.3 Results

The Analyze> Fit Model platform in JMP is used:

⁹ For example, an AR(1) structure could be applied to the covariances across the 4 measurements on a site.

The Time, Fungus, and Time*Fungus terms model the factorial structure in the treatments. The Site term represents the experimental units for the fungal treatment. The second size of experimental unit is assumed to exist in the ‘residual’ or ‘error’ part of the model.

There was no blocking of the sites, so no blocking term is specified.
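To make the role of the two error terms concrete, here is a minimal sketch of the classical sums-of-squares analysis for a balanced split-plot with main plots in a CRD. It is not what JMP does internally (JMP's mixed-model fit uses a different estimation method, although for balanced data the F-ratios are expected to agree), the function name is hypothetical, and the eight data values are invented. The point is that the Fungus effect is tested against the Site(Fungus) mean square, while Time and Fungus*Time are tested against the residual mean square.

```python
from collections import defaultdict

def split_plot_anova(rows):
    """Balanced split-plot ANOVA, main plots in a CRD.

    rows: (fungus, site, time, y) tuples with uniquely labeled sites.
    Returns {term: (SS, df, F)}; F is None for the two error terms.
    """
    def avg(values):
        return sum(values) / len(values)

    by_fungus, by_site = defaultdict(list), defaultdict(list)
    by_time, by_cell = defaultdict(list), defaultdict(list)
    fungus_of = {}
    for fungus, site, time, y in rows:
        by_fungus[fungus].append(y)
        by_site[site].append(y)
        by_time[time].append(y)
        by_cell[(fungus, time)].append(y)
        fungus_of[site] = fungus

    grand = avg([y for _, _, _, y in rows])
    m_f = {k: avg(v) for k, v in by_fungus.items()}
    m_t = {k: avg(v) for k, v in by_time.items()}

    ss_f = sum(len(v) * (m_f[k] - grand) ** 2 for k, v in by_fungus.items())
    # Main-plot error: variation of site means around their treatment mean.
    ss_site = sum(len(v) * (avg(v) - m_f[fungus_of[s]]) ** 2
                  for s, v in by_site.items())
    ss_t = sum(len(v) * (m_t[k] - grand) ** 2 for k, v in by_time.items())
    ss_ft = sum(len(v) * (avg(v) - m_f[f] - m_t[t] + grand) ** 2
                for (f, t), v in by_cell.items())
    ss_total = sum((y - grand) ** 2 for _, _, _, y in rows)
    # Sub-plot error: whatever is left over.
    ss_res = ss_total - ss_f - ss_site - ss_t - ss_ft

    df_f = len(by_fungus) - 1
    df_site = len(by_site) - len(by_fungus)
    df_t = len(by_time) - 1
    df_ft = df_f * df_t
    df_res = df_site * df_t
    ms_site, ms_res = ss_site / df_site, ss_res / df_res
    return {
        "Fungus": (ss_f, df_f, (ss_f / df_f) / ms_site),
        "Site(Fungus)": (ss_site, df_site, None),
        "Time": (ss_t, df_t, (ss_t / df_t) / ms_res),
        "Fungus*Time": (ss_ft, df_ft, (ss_ft / df_ft) / ms_res),
        "Residual": (ss_res, df_res, None),
    }

# Invented toy data: 2 treatments, 2 sites each, 2 time points.
rows = [
    ("A", "s1", "t1", 10), ("A", "s1", "t2", 6),
    ("A", "s2", "t1", 12), ("A", "s2", "t2", 8),
    ("B", "s3", "t1", 20), ("B", "s3", "t2", 18),
    ("B", "s4", "t1", 22), ("B", "s4", "t2", 16),
]
table = split_plot_anova(rows)
for term, (ss, df, f) in table.items():
    print(term, ss, df, f)
```

The sketch assumes a balanced design with no missing values and non-zero error mean squares; p-values would additionally require an F distribution function, which is left out to keep the example self-contained.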

The effect tests are summarized below.

Only terms corresponding to treatment structures are of interest. As before, start with the most complex term and work backwards.

The hypothesis test for no interaction is:

H: There is no interaction between the effects of the fungal treatments and month on the mean contaminant level.

A: There is an interaction between the effects of the fungal treatments and month on the mean contaminant level.

Notice that this hypothesis states that the changes over time are parallel.

The test statistic is F = 0.9460 with a p-value of 0.5055.

There is no evidence of an interaction between the effects of the fungal treatment and the month on the mean contaminant level.

As no interaction was detected, it is sensible to test for main effects. The order in which the main effects are tested does not matter.

The test for no main effect of time is:

H: The mean contaminant level averaged over fungal treatments is equal across all months

A: The mean contaminant level averaged over fungal treatments is not equal across all months.

In most cases, this is not a very interesting hypothesis because we hope that there will be a decline over time! We find that the F-statistic is 23.22 with a p-value < 0.0001 and we have strong evidence against the hypothesis of no decline. This result is not surprising, as the material does degrade over time.

We won’t pursue this hypothesis any further.

The test for no main effect of fungal treatments is more interesting:

H: The mean contaminant level averaged over months is the same for all fungal treatments

A: The mean contaminant level averaged over months differs among the fungal treatments.

The F -statistic is 12.61 with a p -value of 0.0021. There is very strong evidence against the hypothesis of no effect.

Of course, at this point, we don’t know which treatments differ from which other treatments, so we need to perform a multiple comparison procedure on the fungal treatment.

It appears that the control mean is much higher than the other 3 means, but there is no evidence that the means of the three fungal treatments differ from each other.

A similar analysis can be made for the effect of time, but these are not interesting as we would be very surprised if there was no evidence of a time effect just based on basic biology. Because there was no evidence of an interaction effect, this was not pursued further. You will find that the different comparisons (either within a main-plot factor level or across main-plot factor levels) will have different precisions as noted earlier in this chapter.

11.9 Example - Home range - an unbalanced split-site plot in time

This is another example of a split-plot-in-time. In this example, the experimental design is unbalanced at the main-plot level. This does not introduce any major complications. This example also demonstrates that there are two different ways to estimate marginal means - the raw means and what are called Least Square Means - and how and why they differ.

CAUTION: Because randomization of the time levels is not possible, this type of experiment or survey is more properly analyzed as a repeated-measures design. However, in many environmental studies, the split-plot analysis will be a good first-order approximation to the proper analysis and is certainly far better than treating the data as if it came from a CRD! The problems associated with split-plots-in-time were discussed earlier.

The home-range is the area over which the animals wander during the course of a season. This is of interest because it delineates the size of territories, the area needed to support a single animal, etc. In this experiment, male and female foxes were radio-collared and followed over 4 seasons. Based on the radio-telemetry, the area of the home-range (km²) was computed.

Do these two sexes have similar home ranges? Are there seasonal effects?

The data are entered in the usual fashion: a column for sex, season, area, and animal number. Label the animals uniquely if possible as this makes model building easier. Be sure all variables corresponding to factors and experimental units are nominal scaled. Be sure that the response variable is interval or ratio (continuous) scaled.

The raw data is available in a JMP datafile called homerange.jmp, available in the Sample Program Library at http://www.stat.sfu.ca/~cschwarz/Stat-650/Notes/MyPrograms. A portion of the data is reproduced below:

11.9.1 Preliminary analysis

We start with the usual preliminary analysis of the data. The dot plots (see below) did not show any obvious outliers.


The standard deviations (computed using Tables> Summary below) are approximately equal for all treatment groups.

The assumption of normality cannot be readily checked, but given that variances are roughly equal in all treatment groups, that there are no outliers, and that the dot plots appear to be symmetrical about the mean, this assumption seems reasonable.

The profile plot is:

The profile plot appears to show that there is no interaction between the two factors, there is clear evidence of a sex effect (lines not coincident) and of a season effect (lines not flat). (How can you tell this from this plot?)

While it is possible to draw the profile plot with sex along the horizontal axis, I think most people would prefer to see time along the bottom axis.

11.9.2 Model building

The key here is that the same animal is repeatedly measured over time.

The experimental unit for the sex factor is the individual animal. The experimental unit for the time factor is the season.

Again, note that because Time cannot be randomized at the sub-plot level, this can cause problems if there are a large number of levels of time and if the correlation between observations on the same animal declines as a function of time.

Notice that a CRD for the same experiment would require that each animal be measured ONLY in one season and would require 28 different animals. This is often a good way to identify that a split-plot experiment has been conducted - how many experimental units would be required if treatments were randomly assigned to individual experimental units.

Treatment structure

There are two factors in this experiment. One factor is sex with two levels. The levels are fixed effects. The second factor is season with 4 levels. The levels are fixed effects. There are a total of 8 treatment combinations. All treatments appear in the experiment. Hence, it is a factorial experiment.

Experimental Unit structure

There are two sizes of experimental units - animals and seasons.

Because animal is an experimental unit, it should be considered as a random effect. Because there are two different sizes of experimental units, this must be specified in the model building process. This will be done with a term that represents animals nested within sex because different animals were used with each sex. If the animals are labeled with unique identifiers, it is not necessary to use the nesting syntax when describing the model.

Randomization structure

Obviously, sex cannot be randomized to animals. Consequently, we assume that the animals selected are a random sample from the corresponding sex. At this level, the design is a CRD.

Time could not be randomized. As noted earlier, this can cause problems and an alternative analysis may be a repeated measures analysis.

The model

The model must account for both the factorial treatment structure and the split-plot experimental unit structure.

The model can be specified as:

Area = Sex Season Sex*Season Animal(Sex)(R)

or

Area = Sex Season Sex*Season Animal(R)

because the animals have been uniquely labeled.

The terms Sex , Season , and Sex*Season represent the treatment structure – a two-factor complete factorial treatment structure.

The term Animal(Sex)(R) or Animal(R) represents the experimental units for the Sex factor. Again, the nesting syntax is the historical way in which these are specified - it is easier to think of this term as Animal being the experimental unit for the Sex factor.

As in the previous example, a term should be added to represent the randomization restriction on time but is beyond the scope of this course.

The Analyze> Fit Model platform in JMP is used:

The sex , season , and sex*season terms model the factorial structure in the treatments. The animal(sex)(R) term represents the experimental units for the sex factor.

Note how this specification of the model differs from the case when the main-plot treatment is assigned to main plots in an RCB design.

11.9.3 Results

The effect tests are summarized below:

Only terms corresponding to the treatment structure are of interest. As before, start with the most complex term and work backwards.

The test for interaction is:

H: There is no interaction between the effects of the sex and season on the mean home range.

A: There is an interaction between the effects of the sex and season on the mean home range.

[Notice that this hypothesis states that the changes in mean home range size over time are parallel.]

The test statistic is F = 1.2 with a p-value of 0.3408.

There is no evidence of an interaction between the effects of sex and season on the mean home range.

As no interaction was detected, it is sensible to test for main effects. The order in which the main effects are tested does not matter.

The test for the effect of seasons is:

H: the mean home range averaged over sexes is equal across all seasons

A: the mean home range averaged over sexes is not equal across all seasons.

We find that the F-statistic is 7.92 with a p-value of 0.0021 and we have strong evidence against the hypothesis of no effect of season.

A Tukey multiple comparison to investigate where differences in the mean home range size across seasons exist provides:

Now compare the raw means computed from the data: with the estimated marginal means:

Notice that the LSMean (the marginal mean) for each season differs from the raw means. This is caused by the unequal number of animals in each sex. The LSMean gives EQUAL weight to all levels of factors being averaged over, in this case, the male and female mean home ranges are given equal weight when finding the marginal mean for each season. This corresponds to a 50:50 sex ratio. The raw means for the home range by season give more weight to the males than the females as there were more males than females measured.

Both estimates are valid depending on how the unbalance in the data should be handled. If the sex ratio is 50:50, then the different number of males and females in the sample is just an artifact (perhaps one sex is easier to capture than the other) and so the two sexes should be given equal weight. However, suppose that males are more numerous, so the greater number of males captured in a sample reflects different population abundances. In this case, the mean home range over both sexes should give more weight to the sex with more animals.

How are the two means computed? The following is an illustrative example using made-up data:

Sex      Home Ranges
Male     4 5 6
Female   7 8 9 10 11

The raw mean home range would be computed as

$$\frac{4+5+6+7+8+9+10+11}{8} = 7.5.$$

This gives more weight to the females as there are more female animals. The LSMeans would be computed as

$$\frac{\dfrac{4+5+6}{3} + \dfrac{7+8+9+10+11}{5}}{2} = \frac{5+9}{2} = 7.0.$$

This averaging-of-averages would give equal weight to the males and females (e.g., perhaps because there is a 50:50 sex ratio in the population).
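The same arithmetic can be checked with a short Python sketch using the made-up home ranges above. The variable names are hypothetical, and the "LSMean" here is simply the unweighted average of the two sex means, which is what the least-squares mean reduces to in this simple one-factor illustration.

```python
# Made-up home ranges from the illustrative example above.
home_ranges = {"Male": [4, 5, 6], "Female": [7, 8, 9, 10, 11]}

# Raw mean: every animal gets equal weight, so the larger group dominates.
all_values = [x for values in home_ranges.values() for x in values]
raw_mean = sum(all_values) / len(all_values)

# LSMean: average within each sex first, then give each sex equal weight.
sex_means = {sex: sum(v) / len(v) for sex, v in home_ranges.items()}
ls_mean = sum(sex_means.values()) / len(sex_means)

print(raw_mean, ls_mean)  # 7.5 7.0
```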

The test for the main effect of sex is:

H: the mean home range averaged over seasons is the same for both sexes

A: the mean home range averaged over seasons differs between the sexes.

The F-statistic is 11.14 with a p-value of 0.0206. There is evidence against the hypothesis of no effect.

We estimate the marginal means and the difference between the two sexes in the usual way.

The raw and LSMeans will be equal in this case because every animal was measured in all 4 seasons. If some animals were missing some data, then the two means could differ.

Because this is a split-plot-in-time, you could also fit models where the correlational structure is AR(1) - please contact me for more details.

11.10 Example - Floral scents and learning - pseudo-replication

Can pleasant aromas help a student learn better? Hirsch and Johnston, of the Smell & Taste Treatment and Research Foundation, believe that the presence of a floral scent can improve a person’s learning ability in certain situations. They also believe that this may differ between the two sexes.

In their experiment, 21 people worked through a set of pencil and paper mazes six times, three times while wearing a floral-scented mask and three times while wearing an unscented mask. The order in which the mazes were done was randomized, as was the type of mask worn on each trial.

Participants put on their masks one minute before starting the first trial in each group to minimize any distracting effect. Subjects recorded whether they found the scent inherently positive, inherently negative, or if they were indifferent to it. Testers measured the length of time it took subjects to complete each of the six trials.

Here is the raw data:

15

16

17

18

11

12

13

14

Sub Sex Smoker Opinion Age

1 M N pos 23

2

3

F

M

Y

N neg pos

43

43

8

9

10

6

7

4

5

M

M

F

F

F

M

F

N

N

Y

N

N

N neg neg pos pos pos pos

N indiff

32

15

37

26

35

26

31

19

20

21

F

F

F

M

M

M

M

M

F

F

M

Y

Y

N indiff

N

Y

N

Y

N

N pos

Y indiff pos

Y indiff pos neg neg pos neg neg

54

38

65

25

26

33

62

35

55

25

39

75.1

57.6

55.5

49.5

40.9

44.3

93.8

47.9

Unscented Mask Times

38.4

27.7

25.7

46.2

72.5

57.2

57.9

41.9

51.9

38.0

82.8

33.9

50.4

35.0

32.8

60.1

38.0

57.9

32.0

40.6

33.1

26.8

53.2

32.2

64.7

31.4

40.1

43.2

33.9

40.4

75.2

46.2

56.3

63.1

57.7

63.3

45.8

35.7

46.8

91.9

59.9

54.1

39.3

45.8

58.0

61.5

44.6

35.3

37.2

39.4

77.4

52.8

63.6

56.6

58.9

67.3

75.5

41.1

52.2

28.3

74.9

77.5

50.9

Scented Mask Trials

53.1

30.6

30.2

54.7

74.2

43.3

53.4

56.7

42.4

49.6

53.6

51.3

44.1

34.0

34.5

59.1

37.4

48.6

35.5

46.9

26.4

25.1

87.1

34.4

44.8

42.9

42.7

24.8

25.1

59.2

70.1

60.3

59.9

43.8

126.6

41.8

53.8

26.0

45.3

55.8

58.6

44.0

47.8

36.8

43.1

52.8

44.3

42.2

48.4

32.0

48.1

33.7

42.6

54.9

64.5

The raw data is available in a JMP datafile called scent.jmp, available in the Sample Program Library at http://www.stat.sfu.ca/~cschwarz/Stat-650/Notes/MyPrograms.

11.10.1 Preliminary steps

The first step is to find the mean over the three replicates for each mask for each person.

No information has been lost - the averaging process reduces the within-individual variation among the measurements with and without a mask compared to only using a single measurement. If the design was unbalanced, i.e. the number of scented and unscented trials differed within a person, then analyzing the average response for a person would only be approximate. If a variance decomposition showed that there was very little variation in the responses within a subject, then a design variant would be to measure each person only once under each mask type.

Then the two means will need to be stacked. The resulting data table has a column for sex, a column for the level of the mask applied (scented or unscented), a variable for the mean time to complete the maze, and a variable for the subject identification. Notice that a unique label is used for every subject.
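The averaging and stacking steps can be sketched as follows. The subject labels and trial times are invented for illustration (they are not rows of scent.jmp), and the names `trials` and `stacked` are hypothetical.

```python
from statistics import mean

# Hypothetical trial times (seconds): three pseudo-replicates per mask type.
trials = {
    ("S01", "M"): {"unscented": [40.0, 44.0, 48.0], "scented": [38.0, 42.0, 43.0]},
    ("S02", "F"): {"unscented": [55.0, 50.0, 51.0], "scented": [52.0, 49.0, 55.0]},
}

# One stacked row per subject-by-mask combination, holding the mean
# over the three pseudo-replicates.
stacked = []
for (subject, sex), by_mask in trials.items():
    for mask, times in by_mask.items():
        stacked.append({
            "subject": subject,      # unique label for every subject
            "sex": sex,
            "mask": mask,
            "mean_time": mean(times),
        })

for row in stacked:
    print(row)
```

After this step each subject contributes exactly two rows, so the sub-sampling no longer appears in the data table.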

The side-by-side dot plots are found in the usual way:

The sample standard deviations of the averaged data are also found:

There are no obvious outliers except perhaps in the Male-unscented group. The standard deviations for each treatment group appear to be approximately equal - with the possible exception again of the male-unscented group which may be affected by the potential outlier. This point should be investigated in more detail.

There may be evidence of an interaction, but without the se bars, it is difficult to interpret this plot.

11.10.2 Building the model

This experiment has two factors. The first factor is sex with two levels (M and F). The second factor is mask type and has two levels (scented or unscented). There are four treatment combinations and all treatments occur in the experiment. The design is factorial.

There are two different sizes of experimental units and pseudo-replication (sub-sampling). The factor sex operates on subjects. Each subject is split into two “types” of maze drawing (scented or not scented).

Each type of mask is measured three times on each person.

Obviously sex cannot be randomized to subjects. In these analytical surveys, it is assumed that the people chosen are a random sample from the appropriate sex. The order of maze drawing and type of mask is randomized within each subject - all six runs are done in random order.

This is a split-plot design with sex applied to subjects and no blocking at the main plot level, the mask type applied at the subplot level, with pseudo-replication (sub-sampling) within each subject-mask combination. How would this experiment need to be run as a completely randomized design with and without pseudo-replication?

Some of the other variables measured could be used as covariates - this is beyond the scope of this course.

The model can be written as (after averaging over pseudo-replicates):

MeanTime = Sex Mask Sex*Mask Subject(Sex)(R)

or, because subjects are uniquely labeled, as:

MeanTime = Sex Mask Sex*Mask Subject(R)

Note that Subject is a random effect (why?).

This model is fit in the usual way.

11.10.3 Results

This gives rise to the following effect tests:

There is no evidence of an interaction between the effect of sex and mask type upon the mean response (F = 0.54, p-value = 0.47). We proceed to test the main effects of scent and sex.

The null and alternate hypotheses are:

H: $\mu_{\text{scented}} = \mu_{\text{unscented}}$

A: $\mu_{\text{scented}} \neq \mu_{\text{unscented}}$

or

H: the mean response time wearing the scented mask = mean response time wearing the unscented mask

A: the mean response time wearing the scented mask ≠ mean response time wearing the unscented mask.

The F-statistic is 0.0962 and the p-value is 0.7598. There is no evidence that the mean response time differs between the scented and unscented masks.

Similarly, there is no evidence of an effect of sex upon the mean time to complete a maze.

Even though we failed to find an effect, we still estimate the effect size to see if failure to detect an effect is because the effect is small (i.e. close to zero) with a reasonably small standard error or because the standard error of the effect size is so large so that nothing really can be concluded.

We start with the effect size of sex:

The estimated difference in the mean completion times between males and females is 2.65 (se 5.52) seconds with a 95% confidence interval ranging from −8.9 to 14.2 seconds. We are 95% confident that the difference in the mean response times lies in this interval. Because this interval covers zero, there is no evidence that the mean response times are different for the two sexes. However, notice that the se for the sex effect is quite large (almost 6 seconds) and so only very large differences are detectable.
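As a rough check on how such an interval arises, it can be rebuilt from the estimate and its standard error. The degrees of freedom are an assumption here (19, i.e. 21 subjects minus 2 sexes, which is the main-plot error df one would expect for this design), and the critical value is typed in from a t-table rather than computed.

```python
estimate = 2.65   # estimated male-female difference in mean time (seconds), from the text
se = 5.52         # its standard error, from the text
t_crit = 2.093    # ASSUMED: 0.975 quantile of a t-distribution with 19 df

lower = estimate - t_crit * se
upper = estimate + t_crit * se
print(round(lower, 1), round(upper, 1))  # -8.9 14.2
```

Under that assumption the sketch reproduces the reported interval to one decimal place.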

Similarly, we investigate the effect size for mask type:

The estimated difference in mean completion time between the two mask types is 0.86 ( se 2.8) seconds.

The se of the estimated difference is small.

Notice that the standard error of the split-plot factor effect (mask type) is considerably smaller than the se for the main plot factor effects (sex). This is not uncommon. Comparisons among the levels of the sub-plot factor are “within” block comparisons (i.e. within the same subject). Consequently, subject-to-subject variability “cancels” out when comparisons between the effects of the mask levels are made. The subject-to-subject variation does not “cancel” when comparisons are made between the two sexes.
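The cancellation can be made explicit with a short derivation. The notation is an assumption introduced only for this sketch: $y_{ij}$ is the mean time of subject $i$ under mask $j$ ($j = 1, 2$), $s_i$ is the random subject effect with variance $\sigma_s^2$, and $e_{ij}$ is the sub-plot error with variance $\sigma_e^2$, all independent.

```latex
\begin{aligned}
y_{ij} &= \mu_{\text{sex}(i),\,j} + s_i + e_{ij} \\[4pt]
d_i &= y_{i1} - y_{i2}
     = \left(\mu_{\text{sex}(i),1} - \mu_{\text{sex}(i),2}\right) + \left(e_{i1} - e_{i2}\right),
     \qquad \operatorname{Var}(d_i) = 2\sigma_e^2 \\[4pt]
\bar{y}_{i\cdot} &= \tfrac{1}{2}\left(y_{i1} + y_{i2}\right),
     \qquad \operatorname{Var}(\bar{y}_{i\cdot}) = \sigma_s^2 + \tfrac{1}{2}\sigma_e^2
\end{aligned}
```

The mask comparison is built from the within-subject differences $d_i$, in which $s_i$ has dropped out, so its precision depends only on $\sigma_e^2$. The sex comparison is built from the subject averages $\bar{y}_{i\cdot}$ of different subjects, so $\sigma_s^2$ remains in its variance.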

The variance component estimates support the larger variance at the main plot vs. the sub-plot level:

11.10.4 Further work

A reviewer pointed out that as a person ages, their sense of smell deteriorates. Is it possible to re-run the experiment in a way that would attempt to control for this effect?

One design would be to block by age. For example, you could sort male and female separately by age. Then for each block, choose one male and one female that are similar in age and create a “block” from these two individuals. In some cases, there is no obvious pairing, e.g. one of the males is 15 years old and it isn’t clear that this should be paired with a female of age 25+.

You could also create “generalized block designs” by making the age blocks much wider, e.g. 20’s, 30’s, etc. This design is no longer a simple RCB as there are multiple treatments (sexes) within each block. This design would also allow you to investigate if an age-by-sex interaction exists.

A reviewer also pointed out that smoking has a deleterious effect upon the sense of smell. A new experimental plan could investigate the effect of smoking status, sex, and the mask upon the response time. Smoking status would now be treated as a third factor. The design could be structured as a split-plot design that is very similar to the above. First, select subjects for each smoking status by sex treatment combination. This part of the design would be run as a completely randomized design, i.e. randomize the order, and randomize over anything else that is not controllable. Each subject will perform the test with both the scented and unscented mask (this is the split-plot part). Each subject will perform the test three times (in random order) under both mask conditions (pseudo-replication).

Smoking status would be treated as a factor rather than a block because we are interested in investigating the size of the smoking effect.

Note that because smoking status could not be randomized to subjects, there may be a confounded variable that varies in a similar fashion to smoking status. Consequently, you would be unable to assign a cause-and-effect relationship.

In order to make sensible interpretations in this experiment, you will have to assume that smoking status is “randomized” over other variables, i.e. the self-selection to smoke is not influenced by some other variable.

11.11 Example - Pheromone effects upon wild type and anarchist colonies of bees

This is based on an experiment by S. Hoover, Biological Sciences, Simon Fraser University.

In normal honey-bee colonies, the queen is the main reproductive bee in a colony. Workers cannot mate, but they can lay unfertilized eggs, which develop into males if reared. Worker reproduction, while common in queenless colonies, is rare in queenright colonies, despite the fact that workers are more related to their own sons than to those of the queen.

In some colonies, a rare behavioral syndrome, anarchy, occurs, in which substantial worker production of males occurs in queenright colonies. The level of worker reproduction in these anarchic colonies is far greater than in a normal queenright honey-bee colony.

An experiment was conducted where different pheromones were applied to groups of workers from normal (wild type) and anarchist colonies of bees.

There were 4 anarchist colonies (all that existed at the time at SFU) and 4 wild type colonies selected at random from all the wild type colonies present at SFU. A comb was removed from each colony and incubated overnight. About 120 bees were taken from those that had emerged from the comb while it was in the incubator, and 30 were placed in each of four separate cages per colony. One of four types of pheromone was applied to each cage.

After the bees emerged, the bees were scored on ovary development scores (a scale from 0-4). Between 20-30 bees (a few died) were scored from the 8 × 4 = 32 cages.

The data is available in the beeovary.csv file in the Sample Program Library at http://www.stat.sfu.ca/~cschwarz/Stat-650/Notes/MyPrograms.

The raw data was imported into a JMP datafile called beeovary.jmp. A portion of the data is reproduced below:

Notice how the Trt variable is constructed for the treatments (combinations of factor levels) in this experiment. The ordering of the rows in the datafile is NOT important.

11.11.1 Analysis of the mean scores

Here the individual bees are pseudo-replicates because the pheromones were applied at the cage level and not at the individual bee level. One way to deal with pseudo-replication is to average the response over the pseudo-replicates.

An average will only make sense if the actual values for ovary development can be ordered so that 0-4 are increasing values of development, and if the “distance” between adjacent scores is approximately equal, i.e. the change from 0 to 1 in the ovary score is about the same as a change from 1 to 2 in the ovary score, etc. In more technical jargon, the ovary values must be stronger than simply ordinal scaled. Bee experts believe that this scale is appropriately ordered.
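The averaging step described above can be sketched in a few lines of Python. This is only an illustration: the colony labels, pheromone labels, and scores below are invented (they are not taken from beeovary.csv), and the helper name cage_means is hypothetical. The sketch collapses the bee-level scores to one mean per cage and keeps the number of bees, so that unequal cage sizes stay visible.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical bee-level records: (colony, pheromone, ovary score on the 0-4 scale).
# These values are invented for illustration only.
bees = [
    ("wt1", "c", 0), ("wt1", "c", 1), ("wt1", "c", 2),
    ("wt1", "h", 3), ("wt1", "h", 4),
    ("an1", "c", 2), ("an1", "c", 2), ("an1", "c", 4), ("an1", "c", 4),
]

def cage_means(records):
    """Collapse pseudo-replicates (bees) to one mean score per cage."""
    scores = defaultdict(list)
    for colony, pheromone, score in records:
        # A cage is identified by its colony and the pheromone applied to it.
        scores[(colony, pheromone)].append(score)
    # Return (mean score, number of bees) for every cage.
    return {cage: (mean(vals), len(vals)) for cage, vals in scores.items()}

summary = cage_means(bees)
for (colony, pheromone), (avg, n) in sorted(summary.items()):
    print(colony, pheromone, f"mean={avg:.2f}", f"n={n}")
```

The resulting cage means, one row per cage, are what would be passed to the split-plot analysis of the mean scores.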

A summary table of the mean ovary score can be constructed using Tables> Summary as shown earlier. The result is shown below.

Notice that the number of bees in the average ranged from 9 to 33. Because of the unequal sample sizes in the averages, the analysis of the average score will only be approximate. It is possible to do the analysis on the individual bee scores; this is illustrated in the R and SAS analyses. Plots of the number of bees by colony type or by treatment do not show any obvious trend. Even though the original data is highly non-normal, the averages will follow an approximate normal distribution because of the central limit theorem.

Plots of the mean ovary score did not show any outliers (not shown). Standard deviations were approximately equal in all treatments (combinations of type and pheromone).

Model building

This is a two-factor experiment. The first factor is type of colony: normal, also known as wild type (wt), and anarchist (an). The second factor is the pheromone applied; there are four levels, labeled Q+B, c, h, and l.

There are two types of experimental units. The type of colony ( wt or an ) operates at the colony level.


Then from each colony, four separate combs were extracted and placed in cages that were then exposed to the pheromone. Hence, the factor pheromone operates on the comb level.

Because colony type cannot be randomized to colonies, it is implicitly assumed that the colonies used are a random sample from all colonies of these types. In particular, there were only four anarchist colonies available at SFU, and it will be implicitly assumed that these are a random sample from all anarchist colonies of interest. The combs extracted from each colony were randomly assigned to the pheromone. It is implicitly assumed that the bees that died were a random selection from all bees, i.e. missing data completely at random.

This experiment is a split-plot design with the main-plot factor being the type of colony (experimental unit is the colony) and the sub-plot factor being the pheromone (applied to the combs).

The multiple bees measured from each colony are the observational units and are NOT experimental units. This is a case of simple pseudo-replication that was covered earlier.

The response variable is an ovary development score and is discrete with values from 0-4. Obviously, the response is not normally distributed. However, because of the large number of bees measured in each cage, the mean score will be approximately normal.

The model for this experiment can be written as:

Mean(OvaryScore) = Type Colony(Type)(R) Pheromone Pheromone*Type

where the terms Type, Pheromone and Pheromone*Type represent the treatment structure effects; the term Colony(Type)(R) represents the experimental unit for the Type factor and is a random effect because it is an experimental unit. There is no term for the comb as it is the smallest experimental unit in the study and is implicit. Because randomization occurred at all levels of the experiment, there are no terms for the randomization structure.

Because the colonies are all uniquely labeled, the model could be written as:

Mean(OvaryScore) = Type Colony(R) Pheromone Pheromone*Type

The model is fit in JMP in the usual way.

The effect tests are:

The effect tests show evidence of an interaction between the effects of Type and Pheromone (p-value = 0.036). The profile plot confirms this:

It appears that the effect of the h pheromone is what is causing the interaction. [Indeed, if this pheromone is dropped, then no interaction is detected].

The LSmeans can be computed for each treatment combination and the Tukey multiple comparison procedure applied:

The variance components are

From the variance component table, we see that colony-to-colony variation within each type is small relative to natural variation. This is encouraging as it implies that the colonies appear to be very similar within each colony type.

11.11.2 Analysis of raw scores

As noted in an earlier chapter, the analysis of the mean scores from pseudo-replication does NOT lead to a loss of information (see footnote 10) and is EXACT in the case of balanced data. In the case of unbalanced data, there is still no loss of information, but the analysis is only approximate.

The model for the individual scores is:

OvaryScore = Type Colony(Type)(R) Pheromone Pheromone*Type Pheromone*Colony(Type)(R)

where the terms have the same meaning as before, but now the Pheromone*Colony(Type)(R) term represents the random comb (cage) effects.

Because both colony and cage are uniquely labeled, an equivalent model is:

OvaryScore = Type Colony(R) Pheromone Pheromone*Type Cage(R)

The analysis of the individual observations follows and gives essentially the same p -values:

11.11.3 Analysis of ovary scores as a discrete response

It is very tempting to analyze this data using a Pearson chi-square test to see if the distribution of bees among the ovary scores is the same across treatments. However, this is NOT appropriate, as it would be a case of sacrificial pseudo-replication: the data from the colonies has been pooled within each combination of Type and Pheromone. It also does NOT take into account the split-plot structure of the experiment.

Consequently, the simple Pearson chi-square test is NOT appropriate.

The analysis of such data using discrete methods is beyond the scope of this course; please contact me for further details.

11.12 Repeated Measure Designs analyzed as a Split-Plot Analysis

As noted elsewhere in these notes, there are several different ways in which two-factor designs can physically be performed. It is important then to understand exactly how an experiment was run in order to perform the appropriate analysis.

Footnote 10: The resulting mean values are more precise than the individual values, and hence variation in the responses has been reduced.

In split-plot designs, there are two sizes of experimental units, traditionally called main-plots and sub-plots. In experiments done in BPK, a common split-plot design occurs when various tests are done on subjects of different genders. The factor gender operates on the main-plot level, while the tests operate within subjects.

This is also known as a repeated-measures design, but as noted in

McCulloch, C.E. (2005). Repeated Measures ANOVA, R.I.P? Chance, 18, 29-33.

any repeated-measures design can be analyzed using random-effect mixed models and this latter analysis provides substantially more flexibility. For example, missing observations and different covariance structures can be easily handled – see me for more details.

You saw examples of this type of analysis earlier in the split-plot-in-time analyses.

11.13 Example - Holding your breath at different water temperatures - BPK

11.13.1 Introduction

How does the time that a subject can hold their breath vary with the temperature of the water in which the subject is immersed? Does it vary between males and females?

Several subjects of each sex were asked to hold their breath when their faces were immersed in water of various temperatures. The time (seconds) the subject was able to hold their breath was recorded. The height of the subject (m) was also recorded as a measure of size. Finally, the time to hold the breath when their face was not immersed in water, i.e. at ambient air conditions, was also recorded.

This data was provided by Matthew D. White in BPK at SFU.

The goal of the study was to see if lower water temperatures decreased breath hold times. Biologists insist that lower water temperature prolongs breath hold time as seen in many diving mammals. But is this true? Consider working off the coast of Newfoundland where helicopter pilots and passengers wear whole body survival suits when flying over the cold North Atlantic Ocean. In the tragic event of a crash these suits leave the face exposed during underwater breath-hold swims from inverted and submerged helicopters. So the water temperature may affect the length of time that a person can hold their breath and their ability to escape safely from a submerged helicopter.

The hypotheses of interest are:

• Is there a difference in the mean time subjects can hold their breath at different water temperatures?

• Is there a difference in the mean time males and females can hold their breath?

• Is the difference between males and females consistent over the different water temperatures?

• Is the height of the subject (a measure of size) relevant in explaining some of the differences?


The raw data is available in a JMP datafile called breath2.jmp in the Sample Program Library at http://www.stat.sfu.ca/~cschwarz/Stat-650/Notes/MyPrograms . Part of the raw data is shown below after being stacked as outlined in previous chapters.

11.13.2 Standard split-plot analysis

It is always a good idea to plot the data to check for outliers and unusual points and to see if a transformation will be required. Here a plot shows the change in time to hold a breath as a function of water temperature (with the ambient air temperature arbitrarily assigned a value of 25 °C).

We use the Analyze> Fit Y-by-X platform to create the profile plots as shown in previous chapters:

In general, the time to hold one’s breath increases with increasing water temperature, and it is not evident that males and females are all that different. But the plot shows two subjects with unusual results.

Subject 6 seems to have an extraordinary ability to hold his/her breath; this subject’s change over temperature is not that different from the other subjects, but this subject appears to be an outlier in their respective gender’s ability. Subject 3 is most unusual in that the profile as temperature changes is clearly different from all of the other subjects.

In this case, it may be sensible to remove both subjects. Clearly subject 3 is “different” from all other subjects, and including this subject will lead to increased variance in the estimates, making it more difficult to detect effects. Subject 6 is an outlier only in the gender dimension: the profile over different water temperatures is similar in pattern to other subjects, but inclusion of this subject will affect the ability to detect gender effects.

As is the case with all outliers, it is suggested that you rerun the analysis with the outliers included and with the outliers excluded to see if the results differ in a substantive fashion.

We remove subjects 3 and 6 using the Select Rows feature. Be sure to both exclude and hide the data points from subjects 3 and 6.


The revised profile plot is:

There don’t seem to be many odd points beyond random scatter.


There are two factors in this experiment – Gender and Temperature .

Gender has two levels, male and female.

Temperature has 6 levels – note that we treat the temperature as a FACTOR rather than as a continuous variable for the first analysis below. The code that read in the data performed the conversion to an R factor. Because all combinations of Gender and Temperature appear in the experiment, the treatment structure is a factorial . This implies that all main effects and interactions can be fit in the model.

There are two sizes of experimental units.

Gender “operates” on Subjects and we assume that the subjects used in this experiment are a random sample of people of each gender. We labelled the subjects from 1 to 12 rather than from 1 to 6 in each gender to avoid making the mistake of thinking that subject 1 of the males is the same as subject 1 of the females. This use of individual labels for each experimental unit is recommended as outlined in earlier chapters.

Temperature “operates” on parts of a subject’s lifetime in much the same way as split-plots in time have repeated measurements over time on the main-plot units. However, in these types of experiments, we are able to randomize the treatments (temperature) within each subject, unlike split-plots in time where randomization of time is not possible.

There are two types of randomizations that occur in this experiment. First, we assume that subjects are a random sample from each gender. Second, we randomize the order of the temperature levels within each subject, making sure that every subject has every temperature level and every temperature level occurs in every subject. In this study, the order of temperatures was assigned in a randomized manner that was balanced across the two genders. Each subject’s order of temperatures was drawn from a ‘hat’, with one ‘hat’ being used for males and one ‘hat’ being used for females.
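The within-subject randomization can be sketched as follows. This is a simplified illustration, not the study's actual procedure: it gives each subject an independent random permutation of the temperature levels and does not reproduce the balanced ‘hat’ scheme described above. The temperature values, the subject numbering, the seed, and the function name assign_orders are all assumptions made for the example.

```python
import random

def assign_orders(subjects, temperatures, seed=None):
    """Give each subject an independent random order of all temperature levels."""
    rng = random.Random(seed)
    # Sampling all levels without replacement yields a random permutation, so
    # every subject receives every temperature exactly once.
    return {s: rng.sample(temperatures, k=len(temperatures)) for s in subjects}

# Assumed water temperatures and subject labels 1-12 (hypothetical values).
temps = [0, 5, 10, 15, 20]
orders = assign_orders(range(1, 13), temps, seed=2015)
for subject, order in orders.items():
    print(subject, order)
```

Fixing the seed makes the randomization plan reproducible, which is convenient when the plan has to be documented before the experiment is run.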

Each subject was familiarized to breath holding at the control air temperature of 21 °C. Also, to account for any training effects, each subject had 3 successive breath hold trials at each face bath temperature. The mean of these 3 trials for each volunteer at each water temperature is given in this data set (see footnote 11). Between each trial, at each water temperature, enough time was given to allow the subject’s face temperature to return to a resting value.

If the order of the temperature levels was not randomized within each subject, this can lead to a more complex analysis. First, there is a conceptual difficulty: if the same temperature order was used for all subjects, then you cannot statistically distinguish the temperature effect from the time order effect. Perhaps the time that subjects can hold their breath simply increases over time because of practice rather than because of increasing temperature? There is also a subtle problem in the correlational structure when measurements are taken in the same time order. Now it is possible that the residual errors follow what is known as an AR(1) structure, where observations that are closer in time are more highly correlated than observations that are further apart in time. Please see me for details on how to deal with this AR(1) structure.
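To make the AR(1) idea concrete, the sketch below builds the correlation matrix implied by an AR(1) structure, in which the correlation between two measurements on the same subject is rho raised to the number of time steps separating them. The value rho = 0.6 is an arbitrary illustrative choice, not an estimate from the breath-holding data, and the function name is hypothetical.

```python
def ar1_correlation(n_times, rho):
    """Correlation matrix with corr(i, j) = rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]

# rho = 0.6 is an arbitrary illustrative value, not an estimate from the data.
R = ar1_correlation(4, 0.6)
for row in R:
    print(" ".join(f"{value:.3f}" for value in row))
```

Reading along the first row shows the correlation decaying with separation in time. By contrast, the split-plot model used here implies the same correlation between any two measurements on a subject.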

This design is an example of a split-plot repeated-measures design and should not be analyzed as a completely randomized design. The appropriate model (using the short hand syntax) is

Time = Gender Subject(Gender)(R) Temperature Gender*Temperature

where the term Time represents the response variable; the term Gender represents the effect of the two genders; the term Subject(Gender)(R) represents the random effect of the main-plot experimental units to which gender is “applied”; the term Temperature represents the effect of the different water (and air) temperatures; and the term Gender*Temperature represents the interaction between the effects of temperature and gender. Because all of the subjects had unique labels, we could also specify the model as:

Time = Gender Subject(R) Temperature Gender*Temperature
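To see what the random subject term does to the tests, the classical balanced-data calculation can be sketched in Python. This is a minimal sketch under stated assumptions: the data values are invented, the function name split_plot_anova is hypothetical, and the hand calculation via mean squares is the textbook approach for balanced data, not necessarily the estimation method used by JMP or other mixed-model software. The point to notice is the choice of denominators: Gender is tested against the Subject(Gender) mean square, while Temperature and Gender*Temperature are tested against the residual mean square.

```python
from statistics import mean

def split_plot_anova(data):
    """F-ratios for a balanced split-plot design.

    data maps each main-plot level (e.g. gender) to a list of subjects;
    each subject is a list of responses, one per sub-plot level (temperature).
    """
    levels = list(data)
    a = len(levels)                  # number of main-plot levels
    n = len(data[levels[0]])         # subjects per main-plot level
    t = len(data[levels[0]][0])      # number of sub-plot levels
    grand = mean(y for g in levels for subj in data[g] for y in subj)
    g_mean = {g: mean(y for subj in data[g] for y in subj) for g in levels}
    k_mean = [mean(subj[k] for g in levels for subj in data[g]) for k in range(t)]
    gk_mean = {g: [mean(subj[k] for subj in data[g]) for k in range(t)] for g in levels}

    ss_main = n * t * sum((g_mean[g] - grand) ** 2 for g in levels)
    ss_subj = t * sum((mean(subj) - g_mean[g]) ** 2 for g in levels for subj in data[g])
    ss_sub = a * n * sum((m - grand) ** 2 for m in k_mean)
    ss_int = n * sum((gk_mean[g][k] - g_mean[g] - k_mean[k] + grand) ** 2
                     for g in levels for k in range(t))
    ss_total = sum((y - grand) ** 2 for g in levels for subj in data[g] for y in subj)
    ss_resid = ss_total - ss_main - ss_subj - ss_sub - ss_int

    ms_subj = ss_subj / (a * (n - 1))               # main-plot error
    ms_resid = ss_resid / (a * (n - 1) * (t - 1))   # sub-plot error
    return {
        "F_main": (ss_main / (a - 1)) / ms_subj,
        "F_sub": (ss_sub / (t - 1)) / ms_resid,
        "F_interaction": (ss_int / ((a - 1) * (t - 1))) / ms_resid,
        "df_main": (a - 1, a * (n - 1)),
        "df_sub": (t - 1, a * (n - 1) * (t - 1)),
    }

# Invented data: 2 genders, 3 subjects each, 3 temperatures per subject.
data = {
    "m": [[30, 35, 41], [22, 28, 31], [40, 44, 51]],
    "f": [[20, 27, 30], [25, 28, 36], [18, 21, 27]],
}
result = split_plot_anova(data)
for name, value in result.items():
    print(name, value)
```

With these invented values the gender F-ratio is small despite a 10-second gap between the gender means, because subject-to-subject variation is large; the temperature F-ratio is large because each subject serves as their own control.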

Footnote 11: These replicated readings for each trial are an example of pseudo-replication (sub-sampling) as outlined in earlier chapters. As shown in those chapters, using the average time over the three pseudo-replicates is the appropriate way to deal with this problem.

We fit this model using the Analyze> Fit Model platform:

Don’t forget to make Temp a nominal scaled variable for this part of the analysis!

Also don’t forget to specify that subject(gender) is a random effect by choosing the term and selecting the appropriate attribute. You will also find that if the order of the variables in the interaction term is gender*temp rather than temp*gender , the profile plot (see below) is easier to interpret.

The effect tests show no evidence of an interaction between the effects of gender and temperature (p = 0.89). This implies that there is no evidence that the profiles of the mean response for the two genders are not parallel to each other, i.e. no evidence that the effect of gender differs at the different temperatures and vice versa.

There is strong evidence of an effect of temperature (p < 0.001). This implies that the mean time that subjects can hold their breath differs across the different water temperature and air temperature (when averaged over the two genders), but at this point, we don’t know where the mean time differs.

There is no evidence of a difference in the mean time subjects can hold their breath between the two genders (p = 0.18) (when averaged over all temperatures).

Let us examine the profile plot of the estimated mean response by gender and temperature.

If you find that your profile plot has a series of lines, one for each temperature, with gender on the bottom axis, then you need to go back to the Analyze> Fit Model platform and change the order of the interaction term.

It is a pity that JMP doesn’t add standard error or confidence limits on the profile plot. There is no easy way to add them automatically in JMP.

The 95% confidence intervals for the mean response for the males and females overlap considerably at each temperature level even though there appears to be a consistent gap between the males’ and females’ mean response – the overlap in the confidence intervals explains why no gender effect was detected.

There is a gradual increase over the temperature levels, so it is likely difficult to detect differences in the mean response from neighboring temperature levels.

The curves are roughly parallel which is an indication of no interaction between the effects of gender and temperature.


Despite not detecting a gender effect, it is still useful to estimate the size of the gender effect.

The estimated difference (averaged over all temperature levels) is about 9 ( se 7) seconds.

We should also explore the effects of temperature more closely to see where differences are detected.

A Tukey multiple-comparison of the effects of temperature (averaged over both genders) can be found.


This largish table is difficult to interpret, but you can get all the pairwise comparisons with the estimated difference in the mean, standard error of the estimated difference in mean, and 95% confidence intervals for the difference in means.

A better display is the joined-line plots that we saw in previous sections.


The letter grouping A indicates that the mean response in the air (denoted by temp 25 °C) appears to be different than all other treatments because no other temperature group also has the letter A. However, it is very difficult to distinguish between the means for the actual water treatments. For example, the letter B indicates that it is difficult to distinguish all but the mean from water temperature 0 °C. Notice that because of the overlapping of the letters B and C, the interpretation of the temperature effects is difficult; refer to the main notes in single factor designs (the Cuckoo example) for more details.

So the final conclusions are:

• No evidence of a gender effect. The effects are suggestive as the profile line for males is always above and parallel to that of females, but the noise in the data is large enough to hide any effect.

• Difficult to detect effects of different water temperatures. Clear evidence that the mean time to hold one’s breath while in water is different than in ambient air.

• No evidence of an interaction, i.e. the two profiles for males and female mean response could be parallel.

The diagnostic plots (residual plots) don’t show any major problems. There is some evidence of non-normality in the residuals, but it is not serious.

11.13.3 Adjusting for body size

Some of the noise in the data may be due to body size: males are generally larger than females and there is a large variation in body sizes in each gender. Perhaps larger people have more lung volume and this influences how long they can hold their breath? Physiologists generally believe that pulmonary function is more a function of height than weight.

We use the height variable as covariate. The appropriate model (using the short hand syntax) is

Time = Gender Height Subject(Gender)(R) Temperature Gender*Temperature

where the additional term Height represents the adjustment for height. Again, because each subject has a unique label, this can be written as:

Time = Gender Height Subject(R) Temperature Gender*Temperature

We also fit this model using the Analyze> Fit Model platform by adding height to the effect box:

The effect tests again show no evidence of an interaction between gender and temperature, and the gender effect is much less “significant” than seen earlier.

Be careful not to fall into the trap of looking at the (non-significant) p-value associated with height and thinking that height had no influence. Each of the effect tests is “marginal” to the other variables in the model. Consequently, the test for a height effect is performed after adjusting for the gender (and temperature and their interaction) effect, and the test for gender is performed after adjusting for the height (and temperature and their interaction) effect. As height and gender are partially confounded (i.e. females tend to have smaller heights than males), it is not surprising that both terms appear to be statistically not significant.


We can produce a profile plot adjusted for height differences; here the estimated mean time to hold your breath is evaluated at the mean height seen in the data.

We see that adjusting for height removes much of the gender difference seen before. Of course, as height is a good predictor for gender, it is not surprising that adjusting for height removes much of the gender effect!

Additional tables to compare the different levels of temperature can be found in the same way as before.

11.13.4 Fitting a regression to temperature

If we ignore the ambient air test, there appears to be a steady progression in the mean time to hold one’s breath as temperature increases. Can we fit a line through these points and is the relationship the same for males and females?

It is tempting to fit a simple regression line to the data ignoring the split-plot structure of the data. This will lead to incorrect inference as the data values within each subject are not independent of each other.

We need to extend the simple regression model to account for the correlated observations within each subject. This is known as the random intercept model. In this new model, we add a random subject effect to the regression model. This random subject effect shifts the observations from the same subject up or down across all temperatures and “creates” the correlation. For example, if a person is much better than average in holding his/her breath, their response to different temperatures may be similar to another person, but the first person’s responses are shifted upward or downward; this is similar to the response of subject 6, which we discarded earlier.

The appropriate model (using the short hand syntax) is

Time = Gender Subject(Gender)(R) Temperature(C) Gender*Temperature(C)

which is similar to the previous model except now that temperature is modelled as a continuous variable rather than a categorical variable as in the previous analyses.
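One way to write this random intercept model in symbols is sketched below. The notation is an assumption introduced for illustration and does not appear elsewhere in these notes: i indexes gender, j indexes subject within gender, and k indexes the water temperature; the gamma term shifts the intercept for gender i, and the delta term is the gender-specific slope adjustment corresponding to Gender*Temperature(C).

```latex
\begin{align*}
\text{Time}_{ijk} &= \beta_0 + \gamma_i + (\beta_1 + \delta_i)\,\text{Temp}_k
                     + S_{j(i)} + \varepsilon_{ijk}, \\
S_{j(i)} &\sim N(0, \sigma^2_{S}), \qquad
\varepsilon_{ijk} \sim N(0, \sigma^2), \\
\operatorname{corr}\left(\text{Time}_{ijk}, \text{Time}_{ijk'}\right)
  &= \frac{\sigma^2_{S}}{\sigma^2_{S} + \sigma^2}, \qquad k \neq k'.
\end{align*}
```

The last line assumes the subject effects and residuals are independent. It shows how the shared subject effect induces the same correlation between any two readings on the same subject, which is why the observations cannot be treated as independent.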

We continue to use the dataset with subjects 3 and 6 removed, and also exclude the ambient air conditions as they are not comparable to the water treatments.


The Analyze> Fit Model platform is again used. Don’t forget to exclude the data from temperature corresponding to ambient air conditions.

Don’t forget to change the scale of temperature to continuous.

The effect tests again show no evidence of an interaction, i.e. there is no evidence that the regression lines are not parallel.

We can produce a profile plot of the fitted line and the means from the previous analysis.

Unfortunately, there is no simple way to do this in JMP . Below is a graph from SAS .


The line seems to be a reasonable fit and there is no evident non-parallelism. The weak effect of gender is also seen because the 95% confidence intervals at each predicted mean on the lines overlap considerably.

We could also adjust for height as in the previous section, but this is not done here.

11.13.5 Planning for future studies

This analysis also provides estimates of the variability of subjects within each gender and of the random noise in the time to hold one’s breath for each subject. Both are needed when planning a study to determine the sample size required to detect effects.

In the case of balanced designs, i.e. same number of subjects in each gender, and every subject tested the same number of times, some “simple” on-line programs are available to help plan studies. For example,

Lenth, R. V. (2006-12). Java Applets for Power and Sample Size [Computer software]. Retrieved 2011-12-12 from http://www.stat.uiowa.edu/~rlenth/Power .

In the case of unbalanced designs, more complex methods are required – contact me for more information.

In previous chapters, we showed that there are four important pieces of information needed for a power/sample size determination.


• Estimates of the effect sizes that are biologically important. This is often the hardest part of the planning exercise.

• Estimates of the variance components.

• The α level to use (typically α = 0.05).

• The required power (typically 80%) at α = 0.05.

Estimates of effect size. You need to determine what is biologically important to detect for both factors. For example, how big of a difference in the mean time of holding their breath between males and females is biologically important? How big of a difference in the mean time between the different water temperatures is important to detect? This is often the hardest part of the study as the question of what is “biologically important” is often difficult to answer.

Suppose that after a long deliberation, you decide that it is important to detect a difference of about 10 seconds in the mean time to hold one’s breath between the two genders, and that a difference of 20 seconds between the time to hold one’s breath at 20 °C and at 0 °C is also important.

Start by constructing a table of the approximate MEAN times to hold the breath that you would observe in this experiment for each combination of the gender and temperature. For example, given the biologically important differences noted above one such table is:

             Gender
Temp     Male    Female
  0       20       10
  5       25       15
 10       30       20
 15       35       25
 20       40       30

Notice that the difference in each row between the mean for males and females is 10 seconds and that the difference in each column between the mean at 0 °C and 20 °C is 20 seconds. Because the differences in each row and each column are consistent, there is NO interaction effect between the two factors. If the differences were not consistent (i.e. interaction between the effects of gender and temperature existed), the computations for the size of the effect of gender and temperature (below) would not change, but now the effect size of the interaction would also need to be computed. Please see me for details.

Next, compute the row and column averages and find the STANDARD DEVIATION of the row and column averages. In the above table, the column averages are 30 = (20 + 25 + 30 + 35 + 40) / 5 and 20 = (10 + 15 + 20 + 25 + 30) / 5. The STANDARD DEVIATION of the column averages is 7.1. This is the gender effect.

Similarly, the row averages are 15, 20, 25, 30, 35 and the STANDARD DEVIATION of the row averages is 7.9. This is the temperature effect.
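The two effect sizes can be checked with a short Python calculation. The planning means are copied from the table above; the variable names are only illustrative. Note that the sample standard deviation (divisor n - 1) is what reproduces the values 7.1 and 7.9 quoted in the text.

```python
from statistics import mean, stdev

# Planning table of hypothesized mean breath-hold times (seconds):
# rows are water temperatures, columns are genders.
planned = {
    0: {"Male": 20, "Female": 10},
    5: {"Male": 25, "Female": 15},
    10: {"Male": 30, "Female": 20},
    15: {"Male": 35, "Female": 25},
    20: {"Male": 40, "Female": 30},
}

# Column (gender) averages and row (temperature) averages.
col_avgs = [mean(row[g] for row in planned.values()) for g in ("Male", "Female")]
row_avgs = [mean(row.values()) for row in planned.values()]

# Sample standard deviations of the averages are the effect sizes.
gender_effect = stdev(col_avgs)
temp_effect = stdev(row_avgs)
print(round(gender_effect, 1))  # 7.1
print(round(temp_effect, 1))    # 7.9
```

Changing the entries of the planning table and rerunning is a quick way to see how the effect sizes respond to different assumptions about what is biologically important.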

These two standard deviations are the EFFECT SIZES needed in a power analysis. Note that different computer packages will compute an EFFECT SIZE in different ways, but they will all give you the same estimates of power/sample size. In most cases, start with the table of means as shown above.

Estimates of the variance components. Estimates of the variance components can be found by looking at past studies (for example, the dataset used above) or educated guesses. From the first analysis (after removing the effects of the outliers), the subject variance component estimate is around 81 seconds² and the residual variance is about 100 seconds², as shown in the table below:

Choice of α level. This is usually set at α = 0.05 or α = 0.10.

Level of power desired. There are no hard and fast rules, but usual rules of thumb are to aim for an 80% power at α = 0.05 and a 90% power at α = 0.10.

Using Lenth’s routines. We are set to try the power program from Lenth. Visit http://www.stat.uiowa.edu/~rlenth/Power and select the Balanced ANOVA option.

This allows you to select from some pre-determined experimental designs (including a split-plot with main plots in blocks) or your own model. We will use the model for the first experiment. Enter the model (using the simplified syntax) but with ‘+’ signs between the terms. The terms can be in any order.


Start by specifying a starting point for the size of the design. I started with 2 levels of gender, 5 levels of temperature, and 12 subjects IN EACH GENDER.

These values can be changed later. Click on the F tests button.

This brings up the power computation box with default values for the effect sizes and variance components that we will need to modify:

Start by specifying the effect sizes for the (fixed) effects of gender, temperature, and gender*temperature.

Change these values to 7.1 (for the gender effect), 7.9 (for the temperature effect), and 0 (for the gender*temp effect). You can either use the slider or click on the small box and enter the values directly:
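The effect sizes 7.1 and 7.9 can be reproduced with a short calculation. The sketch below is only an illustration and is not part of Lenth's program. It assumes that the effect size is the standard deviation (divisor k − 1) of the level means, that the two gender means differ by 10 seconds, and that the five temperature means are evenly spaced over a range of 20 seconds. The even spacing is an assumption; other patterns of means with the same range give different values.

```python
from statistics import stdev

def effect_sd(level_means):
    """Standard deviation (divisor k - 1) of a set of level means."""
    return stdev(level_means)

# Assumed patterns of means (seconds), centred on zero for convenience.
gender_means = [-5, 5]              # two genders differing by 10 seconds
temp_means = [-10, -5, 0, 5, 10]    # five temperatures, evenly spaced, range 20

gender_effect = effect_sd(gender_means)
temp_effect = effect_sd(temp_means)

print(round(gender_effect, 1))  # 7.1
print(round(temp_effect, 1))    # 7.9
```

Trying a different pattern, for example [-10, 0, 0, 0, 10] for temperature, shows how sensitive the effect size is to the assumed configuration of means.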

Next, specify the STANDARD DEVIATION for the variance components for subject(gender) and the residual. These are √81 = 9 and √100 = 10 respectively:

The power to detect each effect is listed on the right side. With 12 subjects/gender (for a total of 24 subjects), the power to detect a difference in the means of 10 seconds between the two genders is about 65%; the power to detect a difference in the means of 20 seconds among the temperature levels is over 99%; and the power to detect an interaction is 5%. [Of course, we set up the planning assuming that there was no interaction, so this 5% is just the false positive rate.] You would not examine the power for the subject(gender) term.

You can move the slider on the number of subjects/gender until the power to detect the 10 second difference in the means between the two genders is at least 80%. This occurs at 17 subjects/gender.

The power analysis made a number of guesses as to the likely size of the effects that are biologically important and the size of the variance components. You should try various settings to see how sensitive the results are to your choices.

In this case, the limiting feature of the design was the ability to detect a gender effect. It is often the case that the factor at the upper level of a split-plot design has the lowest power. Intuitively, you only have 17 × 2 = 34 subjects that provide information on the gender differences. The multiple readings within each subject don’t provide any information on the gender differences. However, you have 17 × 2 × 5 = 170 measurements for detecting temperature effects and so have a greater power to detect these effects.
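The intuition above can be made concrete by writing down the error term and noncentrality parameter for each F-test. The sketch below is a hedged illustration using the usual expected-mean-square results for a balanced split-plot with main plots in a CRD; the function name and the patterns of means (±5 seconds for gender, evenly spaced temperature means with a range of 20 seconds) are assumptions made for illustration. It does not compute the power itself, which would come from a noncentral F distribution with these degrees of freedom and noncentrality parameters.

```python
def split_plot_noncentrality(n_per_gender, gender_means, temp_means,
                             var_subject, var_residual):
    """Error terms, df and noncentrality for a balanced split-plot (CRD main plots)."""
    a = len(gender_means)   # levels of the main-plot factor (gender)
    b = len(temp_means)     # levels of the sub-plot factor (temperature)
    g_bar = sum(gender_means) / a
    t_bar = sum(temp_means) / b
    ss_gender = sum((m - g_bar) ** 2 for m in gender_means)
    ss_temp = sum((m - t_bar) ** 2 for m in temp_means)

    # Gender is tested against subject(gender): residual + b * subject variance.
    ems_subject = var_residual + b * var_subject
    return {
        "gender": {"df": (a - 1, a * (n_per_gender - 1)),
                   "error_ms": ems_subject,
                   "ncp": n_per_gender * b * ss_gender / ems_subject},
        # Temperature is tested against the residual only.
        "temp": {"df": (b - 1, a * (n_per_gender - 1) * (b - 1)),
                 "error_ms": var_residual,
                 "ncp": n_per_gender * a * ss_temp / var_residual},
    }

plan = split_plot_noncentrality(17, [-5, 5], [-10, -5, 0, 5, 10], 81, 100)
for effect, info in plan.items():
    print(effect, info["df"], info["error_ms"], round(info["ncp"], 2))
```

With these planning values the gender test has a much larger error term (505 versus 100) and far fewer denominator degrees of freedom than the temperature test, which is why gender is the limiting factor.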

A summary of the power computations (computed by SAS) is:

                       Effect
                       gender    gender*temp    temp
subjects_per_gender    Power     Power          Power
         4             0.221     0.050          0.919
         6             0.345     0.050          0.993
         8             0.458     0.050          1.000
        10             0.558     0.050          1.000
        12             0.644     0.050          1.000
        14             0.717     0.050          1.000
        16             0.777     0.050          1.000
        18             0.826     0.050          1.000
        20             0.866     0.050          1.000
        22             0.897     0.050          1.000
        24             0.921     0.050          1.000
        26             0.940     0.050          1.000
        28             0.955     0.050          1.000
        30             0.966     0.050          1.000

and a plot of the power as a function of the sample size/gender is:

Lenth’s routines can only be used for balanced “simple” designs. The SAS program for this example shows how Stroup’s method can be used for split-plot designs (indeed for any design).

Are there alternate designs that might be of interest? In this case, a large number of subjects is needed to detect the gender effect, but the required number of subjects is overkill for detecting the temperature effect.

Perhaps the cost of the experiment can be reduced by not requiring all subjects to do all temperature tests. For example, some subjects could do the 1st, 3rd, and 5th temperature levels; other subjects could do the 2nd, 4th, and 5th temperature levels, etc. This would be known as a split-plot with an incomplete block design at the subplot level. Planning such a study and its analysis is beyond the scope of this course – see me for details.

11.14 Example - Systolic blood pressure before presyncope - BPK

11.14.1 Experimental protocol

The data for this experiment were provided by Claire Protheroe, an M.Sc. candidate in BPK at SFU.

Fifteen subjects took part in an experiment to measure their orthostatic tolerance, the time the individual can stand still and regulate their blood pressure until presyncope (the symptoms experienced before a faint). During presyncope patients experience light-headedness, muscular weakness, and feeling faint (as opposed to a syncope, which is actually fainting). In many patients, light-headedness is a symptom of orthostatic hypotension, which occurs when blood pressure drops significantly, such as when the patient stands from a supine or sitting position.

Each subject was measured on a tilt test three times: once with a compression stocking, once with a placebo stocking, and once with a different placebo stocking. The subjects were randomized to the stocking conditions on three different days.

For each test, the subject was subjected to a 20 minute supine (lying on the back) period, followed by a 20 minute tilt period, followed by a 10 minute period of −20 mmHg of lower body negative pressure (LBNP), a 10 minute period of −40 mmHg LBNP, and finally a 10 minute period of −60 mmHg LBNP. However, not all patients made it through the entire test before reaching the presyncope stage. As noted at http://advan.physiology.org/content/31/1/76.full:

During LBNP, participants lie in a supine position with their legs sealed in a LBNP chamber at the level of the iliac crest. Air pressure inside the chamber is reduced by a vacuum pump, making the pressure inside the chamber less than atmospheric pressure. This causes blood to shift from an area of relatively high pressure (i.e., the upper body, which is outside the chamber) toward an area of relatively low pressure (i.e., the legs inside the chamber).

Without physiological compensations, blood is shunted away from the thoracic cavity and ultimately pools in the lower limbs and the lower abdomen. Normally, the body compensates by peripheral vasoconstriction and an increase in heart rate, which serve to maintain normal circulation. Inadequate physiological compensations in response to increasing negative pressure results in falling arterial blood pressure and, ultimately, syncope.

Systolic blood pressure was measured every 2 minutes for 20 minutes during the supine phase; again measured every 2 minutes during the tilt phase; and finally every 2 minutes during the LBNP phases until the patient ended the trial.

The raw data are available on the Sample Program Library at http://www.stat.sfu.ca/~cschwarz/Stat-650/Notes/MyPrograms.

Here is a (very small) snippet of the raw data for two patients.

Condition

Supine

Tilt

-20mmHg

-40mmHg

-60mmHg

Presyncope

66

68

70

Time

56

58

60

62

64

48

50

52

54

42

44

46

32

34

36

26

28

30

38

40

Time Placebo 1

Subject 23395 29902

2 136 87

10

12

14

16

4

6

8

18

20

22

24

129

125

132

125

131

128

124

134

132

129

130

107

94

105

111

76

89

86

103

81

84

81

130

145

135

128

124

119

79

102

98

102

88

90

96

104

97

102

101

93

89

89

89

92

89

153

149

147

143

145

147

128

149

152

158

74.3

52.3

123

127

123

133

127

128

126

101

112

126

142

133

143

129

129

131

Experimental

23395 29902

114 106

114

114

115

119

121

118

120

122

124

113

113

104

112

106

116

94

103

104

101

107

106

109

100

97

91

93

86

95

89

103

99

103

100

89

88

114

106

120

114

116

111

114

115

131

127

117

122

124

113

128

Placebo 2

23395 29902

137 111

128

118

124

118

120

110

123

125

134

117

125

114

109

103

100

98

97

113

119

97

98

99

107

100

100

105

93

92

94

93

98

103

107

103

102

107

108

68.6

76.2

54.8

70.0

Patient 23395 started off with a systolic blood pressure (SBP) of 136 mmHg at the end of the second minute while supine wearing the Placebo 1 stocking, and the blood pressure varied over the next 18 minutes, with a final blood pressure (at minute 20) of 132 mmHg. Then the patient was tilted. At minute 22 (2 minutes into the tilt) the SBP was 129 mmHg and at minute 40 the SBP was 102 mmHg. The LBNP was then applied. For some reason, the blood pressure was missing for this patient at minute 42, but the blood pressure increased and ended at 158 mmHg at minute 50. The LBNP was increased and at minute 60 the SBP was 145 mmHg. The LBNP was again increased. At minute 62 the SBP was 147 mmHg. At this point blood pressure readings were terminated. At minute 74.3, the patient experienced presyncope.

This patient underwent similar testing under the Experimental and Placebo 2 conditions.

Patient 29902 underwent a similar protocol, but the SBP readings terminated at minute 54 under the Placebo 1 condition. This patient experienced presyncope at 52.2 minutes.

Whew – the data set is quite large, with over 1000 values.

The hypotheses of interest are:

• Is there a difference in the mean time to presyncope between the different treatments?

• Is there a difference in the mean blood pressure between the different phases?

The analysis of the time to presyncope was done in a previous chapter – now we will look at the changes in systolic blood pressure under the different phases and treatments.

11.14.2 Analysis

In this part of the experiment, changes in the mean systolic blood pressure in response to the different stockings and the different phases (supine, tilt, −20 mmHg, −40 mmHg and −60 mmHg) are of interest.

This is now a two-factor experiment with the two factors being treatment (the different stockings, 3 levels) and phase (5 levels). However, the experiment is not a simple completely randomized design; it is a variant of a split-plot design.

In split-plot designs, there are two different sizes of experimental units. Here the concept of an experimental unit is not clear; it is easiest to think of the experimental units as days or minutes. The treatments (stockings) are applied at the day level within each subject’s visits. Subjects serve as blocks for this factor. Then within a particular day, the different phases are applied on a minute-by-minute basis.

Notice that the phases are always applied in the same order. This lack of randomization can lead to some subtle problems in the analysis which we shall ignore for now. [12]

[12] The problem is that the residuals from the phase effects may be correlated over time and a more specialized covariance structure, the AR(1) covariance model, may be appropriate.

Also notice that the blood pressure within each phase is measured multiple times (at two minute intervals). These are known as pseudo-replicates (or sub-samples) and cannot be treated as being independent. As shown in previous chapters, there are two ways to deal with the pseudo-replication: analyze the averages over the pseudo-replicates, or deal with the individual observations using a more complex model. Not every subject had the same number of sub-samples taken and so the analysis on the averages will not be exactly the same as the analysis on the individual measurements, but it should be close enough.

Not all subjects finished all phases. We will be making the implicit assumption that this missing data is MCAR (Missing Completely at Random), i.e. the probability of missingness is unrelated to the response or any other covariate. Under MCAR, the missingness doesn’t cause any great problem in this analysis; all that happens is that the standard errors of some comparisons are increased because the effective sample size for comparisons over missing data has been reduced. We would be very worried if the missingness is not MCAR. For example, suppose that people with low blood pressure to begin with are less likely to complete all phases. In this case, only those people who tend to have higher blood pressure will make it to the final phases and so the estimates involving these later phases will be biased by having a non-random sample of subjects participating in the comparison. There is no statistical way to check for MCAR and this must be assessed based on an intimate biological knowledge of the experiment.

The raw data are available in the SystolicBloodPressure.csv file in the Sample Program Library available at http://www.stat.sfu.ca/~cschwarz/Stat-650/Notes/MyPrograms.

The data have been imported into a JMP data file SystolicBloodPressure.jmp, also available in the Sample Program Library at http://www.stat.sfu.ca/~cschwarz/Stat-650/Notes/MyPrograms.

The first few observations are:

A tabulation of the number of measurements in each combination of stocking and phase (not shown) reveals that there was only 1 measurement taken at the −60 mmHg phase over all subjects and treatments and so this phase will be ignored in subsequent analyses. Virtually all subjects were measured 10 times in the supine and tilt positions, but the number of measurements at the −20 and −40 mmHg phases varies by patient and stocking.

First we exclude the data with phase of −60 mmHg:

and then use Tables> Summary to get the mean over the pseudo-replicates:
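For readers working outside JMP, the two steps above (dropping the −60 mmHg phase and averaging over the pseudo-replicates) can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the subject labels S1 and S2, the phase label "-60mmHg", and the blood pressure values are all made up for the example and are not taken from SystolicBloodPressure.csv.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format records: (subject, stocking, phase, minute, sbp).
# None marks a missing reading. All values here are invented for illustration.
records = [
    ("S1", "Placebo 1", "Supine", 2, 136.0),
    ("S1", "Placebo 1", "Supine", 4, 131.0),
    ("S1", "Placebo 1", "Tilt", 22, 129.0),
    ("S1", "Placebo 1", "Tilt", 24, None),
    ("S1", "Placebo 1", "-60mmHg", 62, 147.0),
    ("S2", "Experimental", "Supine", 2, 114.0),
    ("S2", "Experimental", "Supine", 4, 121.0),
]

def phase_means(rows, drop_phases=("-60mmHg",)):
    """Mean SBP for each subject x stocking x phase, skipping dropped phases."""
    groups = defaultdict(list)
    for subject, stocking, phase, _minute, sbp in rows:
        if phase in drop_phases or sbp is None:
            continue  # excluded phase or missing reading
        groups[(subject, stocking, phase)].append(sbp)
    return {key: mean(values) for key, values in groups.items()}

means = phase_means(records)
for key, value in sorted(means.items()):
    print(key, value)
```

Each resulting mean is one observation for the analysis on the averages; the number of readings behind each mean is not carried forward, which is why the two analyses differ slightly when the design is unbalanced.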

A dot and profile plot of the mean responses over the 3 × 4 combinations of treatment (stockings) and phase does not show any obvious outliers and the assumption of parallelism within subjects appears to be satisfied.

We use the Analyze> Fit Y-by-X platform only to get the profile and dot plots:

which gives:

There don’t appear to be any odd responses by subjects over the treatments (refer to the example on holding breath for an illustration of when certain subjects have odd responses).

The statistical model for this experiment is a split-plot design with main plots (the days when a stocking is worn) in blocks:

Y = Subject(R) Stocking Subject*Stocking(R) Phase Phase*Stocking

where the Subject(R) term represents the blocking by subjects; the Stocking term represents the effect of the different stockings; the Subject*Stocking(R) term represents the experimental unit for the treatment factor levels (the stockings) and NOT an interaction between subject and treatment [13]; the Phase term represents the effect of the different phases within a day; and the Phase*Stocking term represents the interaction effects between the two factors.

This model can then be fit in the usual way.
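As a check on the model, it can help to write out the degrees of freedom each term would receive. The sketch below assumes a fully balanced version of this experiment (15 subjects, 3 stockings, 4 phases, no missing cells) and uses a hypothetical helper function; because some subjects did not finish all phases, the degrees of freedom reported by JMP for the actual data will differ.

```python
def split_plot_rcb_df(n_blocks, a_levels, b_levels):
    """Degrees of freedom for a balanced split-plot with main plots in blocks."""
    df = {
        "Block": n_blocks - 1,
        "A": a_levels - 1,
        "Block*A (main-plot error)": (n_blocks - 1) * (a_levels - 1),
        "B": b_levels - 1,
        "A*B": (a_levels - 1) * (b_levels - 1),
        "Residual (sub-plot error)": a_levels * (n_blocks - 1) * (b_levels - 1),
    }
    df["Total"] = n_blocks * a_levels * b_levels - 1
    return df

# Block = Subject, A = Stocking, B = Phase (after dropping the -60 mmHg phase).
df = split_plot_rcb_df(n_blocks=15, a_levels=3, b_levels=4)
for term, value in df.items():
    print(f"{term:28s}{value:4d}")
```

The Stocking effect is tested against the Subject*Stocking term (28 df in the balanced case), while Phase and Phase*Stocking are tested against the residual (126 df).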

In JMP, we use the Analyze> Fit Model platform:

[13] A key assumption of blocking is no interaction between the block and treatment effects. The Subject*Stocking notation is only a way to reference the day when a stocking is worn by a subject.

Be sure that the subject variable has a nominal scale and that you assign the random attribute to the proper terms in the model.

This gives the following test statistics:

There is no evidence of a treatment (stocking) effect (p = 0.43), nor of a treatment-phase interaction (p = 0.13). There is strong evidence of a difference in the means among the different phases (p < 0.0001).

To investigate where the differences in the mean systolic blood pressure exist among the different phases, we use a Tukey multiple-comparison procedure:

It might be easier to interpret the results from the Joined-Line plots that we’ve seen earlier:

The estimated marginal means (the average at each phase when averaged over all subjects and stockings) move in the expected direction (i.e. the mean systolic blood pressure decreases as the experiment progresses), but the change is too small to be detectable except at the extreme ends of the experiment.

Diagnostic plots don’t show any obvious problems.

We start by saving the residuals to the table of means:

and then looking at the distribution of the residuals:

The analysis on the individual measurements within each phase using a more complex model comes to the same conclusions.

11.14.3 Power and sample size determination

The determination of an appropriate sample size is more complex because of the need to account for the pseudo-replication. For example, the design choices involve both the number of subjects for each treatment (the stocking) and the number of minutes over which each phase should be measured.

If you are willing to stick with about 10 repeated measurements in each phase, then the information from the analysis on the averages can be used to construct a power analysis to determine the number of subjects needed in the same fashion as the How long can you hold your breath example. Consult those notes for details.

In order to take into account both aspects of planning, you will need information on the three variance components:

• Subject*treatment variance term, which is the experimental unit (the day) error variation.

• Minute-to-minute variation term, which represents how the individual minutes vary among the repeated measurements within a phase.

Both are available from the more complex model (i.e. using all of the data).

The actual power analysis is beyond the scope of this course – please contact me for details.

11.15 Final notes

• Two types of split-plots = two different models The split-plot design often comes in two variants: the main plots are assigned to treatments in a CRD, or the main plots are assigned to the treatments in an RCB. The two models are similar but different, reflecting the difference at the uppermost level. The traditional model syntax is:

CRD : Y = A MPEU(A)(R) B A*B

RCB : Y = BLOCK A BLOCK*A(R) B A*B

where the terms A, B, and A*B represent the main effects and interaction of factors A and B; the term MPEU(A) represents the experimental unit structure for a CRD (don’t forget to make it a random effect); and the terms BLOCK and BLOCK*A represent the experimental unit structure for an RCB. The BLOCK term may be declared as either a fixed or random effect; the BLOCK*A term MUST always be specified as a random effect. Note that the BLOCK*A term is NOT an interaction between blocks and factor A. A key assumption of any blocked design is no interaction with other factors. The BLOCK*A term is used to represent the main plot experimental units: the combination of block and level of Factor A identifies the main plot unit. It is only for historical reasons that this term is used to represent the main plot unit.

If you label the main plot experimental units with unique labels, the model syntax simplifies to:

CRD : Y = A B A*B MPEU(R)

RCB : Y = A B A*B BLOCK MPEU(R)

where the MPEU(R) term represents the (random) main plot experimental unit in both designs.

• Different precision for whole-plot and sub-plot comparisons Typically, the variation among whole plots is much larger than the variation among sub-plots. This implies that comparisons among whole-plot treatments are less precise than comparisons among sub-plot treatments. Interestingly enough, the ‘average’ precision over all comparisons is exactly the same with or without the split-plot features - all that happens is that precision is shifted among the factors.

This has implications for the design of the experiment. Factors that have larger relative effects (compared to the standard deviation) should be placed at the main-plot level. Factors that have smaller relative effects should be placed at the sub-plot level.

• Advantages of split-plot designs The primary advantage of a split-plot design arises when one experimental factor must be assigned to larger experimental units than another experimental factor.

For example, growth chambers can maintain the growing temperature for a large number of pots while it is easier to manipulate the moisture level on an individual pot basis.

• Disadvantages of split-plot designs The primary disadvantage of a split-plot design is the increased complexity in analysis, with many computer packages not giving correct standard errors for some marginal means! In this modern age of computers, these errors should NOT be occurring, yet manufacturers of packages often fail to conduct adequate testing.

• Split-split-plot designs It is relatively straightforward to extend the splitting to two or more levels.

At each level of the design, an experimental factor is applied to a certain size of experimental units and sub-factors are applied to smaller sizes of experimental units. There is no great conceptual leap - the analysis simply becomes more tedious.

• How to recognize a split-plot design Virtually all designs where time is a factor should be analyzed as a split-plot in time design (or even better as a repeated-measures design).

Similarly, when one factor is a ‘location’, these are often split-plot designs in some guise.

• Perils of ignoring split-plot aspect in the analysis If you fail to recognize a split-plot design, the typical result is that the reported p-value for the main-plot factor is too small, i.e., you are more likely to commit a Type I error, while the reported p-values for the sub-plot factors are too large, i.e., you are more likely to commit a Type II error.

The reason for this is that typically the main plot error is larger than the sub-plot error. When you ignore the split-plot structure, the overall (incorrect) error term is a weighted average of the two values and is too small for main-plots and too large for sub-plots.

• Split-plot vs. repeated measure designs A repeated measures analysis allows the covariance structure among the repeated observations to be very general. For example, if the repeated measurements are across time, you would expect measurements that are close together in time would be more highly correlated than measurements far apart in time.

In a split-plot design, you are assuming that the correlation among measurements is the same regardless of how far apart they are. This is accomplished, in part, by completely randomizing at the sub-plot stage.

• Missing values and unbalanced designs

Because there are two different sizes of experimental units, missing values can occur at two levels.

First, entire main plots can be missing. As long as this occurs completely at random this does not present too much of a problem if the main plots are in a CRD. In this case, the analysis proceeds exactly as before. If the main plots are in an RCB, you now have incomplete blocks at the main plot levels. This is more difficult to analyze and you may need assistance but most modern statistical packages should be able to handle this without problems.


If missing values occur at the sub-plot level, then the affected main plot now looks like an incomplete block design. Again, this is slightly more difficult to analyze and you may need assistance, but most modern statistical packages should be able to handle this without problems.

• Random factors in split plot If Factor A or Factor B is a random effect, then the analysis is not too much more complicated other than the requirement to indicate that the appropriate terms are random effects in the model. However, some packages will compute incorrect standard errors.

• Should BLOCKS be declared as random effects? Because every block has every main plot treatment repeated exactly once, all main plot factor comparisons are “within” block comparisons. Consequently, declaring blocks as fixed or random effects has NO impact on the hypothesis tests.

The only places where treating blocks as fixed or random has an impact are when missing values occur in some of the blocks and in estimates of the marginal means. Please contact me for more details.
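The note above on the perils of ignoring the split-plot aspect can be illustrated with a small calculation. The sketch below is a hedged example, not output from any package: it borrows the planning values from the breath-holding power example (subject variance 81, residual variance 100, 2 genders, 5 temperatures, 12 subjects per gender) and assumes that the incorrect pooled error term is the degrees-of-freedom weighted average of the two expected mean squares.

```python
def pooled_error_ms(ms_main, df_main, ms_sub, df_sub):
    """Error mean square obtained if main-plot and sub-plot errors are pooled."""
    return (df_main * ms_main + df_sub * ms_sub) / (df_main + df_sub)

# Assumed planning values taken from the breath-holding power example.
var_subject, var_residual = 81, 100
a, b, n = 2, 5, 12                          # genders, temperatures, subjects/gender

ms_main = var_residual + b * var_subject    # expected main-plot error MS: 505
ms_sub = var_residual                       # expected sub-plot error MS: 100
df_main = a * (n - 1)                       # 22
df_sub = a * (n - 1) * (b - 1)              # 88

pooled = pooled_error_ms(ms_main, df_main, ms_sub, df_sub)
print(ms_main, ms_sub, pooled)  # 505 100 181.0
```

The pooled value of 181 is far below the correct main-plot error of 505 (so the F-ratio for gender would be inflated) and well above the correct sub-plot error of 100 (so the F-ratios for temperature and the interaction would be deflated).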

11.16 Frequently Asked Questions (FAQ)

11.16.1 Difference between CRD, RCB, and split plot

A student wrote:

I am unsure of the difference between a CRD, RCB, and a split-plot design. Could you please demonstrate how an experiment could be designed under these alternatives?

Here is a sample experiment and how it could be designed in different ways.

If you were to fall overboard in the Strait of Georgia, it is most likely that you will die of hypothermia when your core body temperature drops below 30 °C. It is estimated that in the winter, you have to be rescued in 20 minutes or less or you will not survive. Upon being rescued, it is important to have your core body temperature restored to normal as quickly as possible.

Traditional methods for restoring core body temperature are (a) immersion in a warm bath or (b) wrapping in insulated blankets or (c) wrapping in a new ‘space’ blanket that supposedly reflects body heat back in. An experiment was conducted to investigate the relative merits of these proposals. Call the three methods a, b, c for lack of a better name.

How should such an experiment be conducted? Upon graduate students, of course, who have volunteered for the experiment for a relatively small monetary reward. Under the supervision of a medical doctor, the subject lies in a bath of ice-water for 15 minutes. At this point, the core body temperature is noted, and the warming method is applied. The time needed to restore core body temperature is noted.

You have a pool of subjects, of both sexes (M and F). Past literature shows that the two sexes may react differently to the methods.

Because of space and equipment limitations, you can only conduct three experiments in any particular day (at 10:00, 13:00, and 16:00) and can only work 3 days/week. You can conduct the experiment for 2 weeks.

The charts below show how you would plan this experiment as a CRD, an RCB, a split-plot with main plot factor in blocks, and a split-plot with main plot factor as a CRD. Any fatigue factors for an individual that would ordinarily make it impossible to be tested multiple times within a day will be ignored.

Note that in some of the designs, it may not be feasible to get 18 observations and keep the design balanced. In the following layouts, the notation xx-Sm refers to subject xx of sex S who received method m. For example, 2-Ma would refer to the second male subject who receives method a.

CRD In this design, each subject is measured only once. Hence you will need 18 subjects in all. To keep the design balanced, there will be 9 M and 9 F. We completely randomize over all 18 possible day-slots available.

         <------ Week 1 ------>   <------ Week 2 ------>
Time      1      2      3          1      2      3
10:00    2-Mb   1-Fb   8-Fb       7-Fa   6-Mc   8-Mb
13:00    4-Fa   5-Fc   4-Ma       2-Fa   9-Fc   3-Fc
16:00    3-Mc   1-Ma   6-Fb       5-Mb   9-Mc   7-Ma

The model is:

Y = Sex Method Sex*Method

The terms Sex , Method , and Sex*Method represent the treatment structure – a two-factor complete factorial treatment structure. As the design is a CRD, there is only one size of experimental unit, every unit is only measured once, and there is complete randomization over the entire experiment. The model does not need an explicit term for the experimental unit. Because of complete randomization, no terms are needed for this aspect.
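The complete randomization described for the CRD can be sketched in code. This is only an illustration: the function name, the seed, and the way labels are built are assumptions, and the layout it produces will not match the one shown above. Each of the 18 subjects (9 M and 9 F, three per sex-method combination) is assigned to one of the 18 day-slots completely at random.

```python
import random

def crd_layout(seed=None):
    """Randomly assign 18 subject-method labels to the 18 day-slots."""
    rng = random.Random(seed)
    units = []
    for sex in "MF":
        methods = list("abc") * 3      # 3 subjects per method within each sex
        rng.shuffle(methods)           # random method for subjects 1..9
        units += [f"{i}-{sex}{m}" for i, m in enumerate(methods, start=1)]
    slots = [(week, day, time)
             for week in (1, 2)
             for day in (1, 2, 3)
             for time in ("10:00", "13:00", "16:00")]
    rng.shuffle(units)                 # complete randomization over all slots
    return dict(zip(slots, units))

layout = crd_layout(seed=2015)
for slot, unit in sorted(layout.items()):
    print(slot, unit)
```

Because there is no restriction on the randomization, a given day or week may by chance contain mostly one sex or one method; the blocked designs that follow remove that possibility.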

RCB In this design, each subject is again only measured once. However, you may believe that some other variable may contribute to the variation among the responses. For example, you may believe that week affects the responses or that time slot may be important.

If you block by week, then you cannot use all the time-slots. In a classical RCB, each treatment combination should only appear once within each block. There are 6 treatment combinations (2 sexes by 3 methods) to be randomly assigned to the 9 time slots within a week. Only 6 subjects of each sex are needed. One possible design is shown below:

         <-- Week 1 (Block 1) -->   <-- Week 2 (Block 2) -->
Time      1      2      3            1      2      3
10:00    1-Ma    .     3-Mc         6-Fc    .     4-Fa
13:00    1-Fa   2-Mb    .            .     5-Mb   6-Mc
16:00    2-Fb   3-Fc    .           4-Ma    .     5-Fb

The model is:

Y = Sex Method Sex*Method Week

As before, the terms Sex, Method, and Sex*Method represent the treatment structure – a two-factor complete factorial treatment structure. As the design is an RCB, a block term needs to be specified - the Week term. There is still only one size of experimental unit, so no terms are explicitly needed for experimental units.

You could also block by time slot, i.e., 10:00, 13:00 or 16:00. Then within each time slot, all 6 treatment combinations must appear once. You would need 18 subjects for this design.

         <------ Week 1 ------>   <------ Week 2 ------>
Time      1      2      3          1      2      3
10:00    1-Ma   3-Fc   2-Mb       3-Mc   2-Fb   1-Fa    <--- block 1
13:00    6-Fc   4-Ma   5-Mb       5-Fb   4-Fa   6-Mc    <--- block 2
16:00    7-Fc   7-Ma   8-Fb       8-Mb   9-Mc   9-Fa    <--- block 3

The model is now:

Y = Sex Method Sex*Method Slot

Again, the terms corresponding to the treatment structure are identical to the previous design. The blocking effect is now different.

It would not be correct to assign all males in week 1 and all females to week 2. This would then confound week and sex effects. If you decided to measure all people using all three methods, you would have a variant of a split-plot design.

Split-plot with main plots as a CRD Now each subject is measured using all three methods. We completely randomize over all 18 possible day-slots available. Only three subjects of each sex are required.

         <------ Week 1 ------>   <------ Week 2 ------>
Time      1      2      3          1      2      3
10:00    1-Mb   1-Fb   3-Fb       3-Fa   2-Mc   3-Mb
13:00    2-Fa   2-Fc   2-Ma       1-Fa   3-Fc   1-Fc
16:00    1-Mc   1-Ma   2-Fb       2-Mb   3-Mc   3-Ma

The model is:

Y = Sex Method Sex*Method Id(Sex)-R

As before, the terms Sex , Method , and Sex*Method represent the treatment structure – a two-factor complete factorial treatment structure. There are now two sizes of experimental units – the subjects and the day-slots within each subject. The term Id(Sex) represents the experimental unit for sex - the person.

Or, you could test each person three times on a single day. This design is not as preferred as the above, as subject fatigue may be an important consideration. As well, now person to person variability is confounded with day-to-day variation. One layout is:

         <------ Week 1 ------>   <------ Week 2 ------>
Time      1      2      3          1      2      3
10:00    1-Mb   1-Fb   2-Mb       2-Fa   3-Fc   3-Mb
13:00    1-Ma   1-Fc   2-Ma       2-Fc   3-Fb   3-Mc
16:00    1-Mc   1-Fa   2-Mc       2-Fb   3-Fa   3-Ma

The model is:

Y = Sex Method Sex*Method Id(Sex)-R

Note that it is impossible in the model to separate the day from the person effect.

Notice that in both cases what makes it a split-plot design is the fact that each person is measured three times using the three methods.

Split-plot with main plots as an RCB Again, each subject is measured three times. One possible way to block is by time-slot as in the earlier example of a pure RCB.

One possible layout is:

         <------ Week 1 ------>   <------ Week 2 ------>
Time      1      2      3          1      2      3
10:00    1-Ma   1-Fc   1-Mb       1-Mc   1-Fb   1-Fa    <--- block 1
13:00    2-Fc   2-Ma   2-Mb       2-Fb   2-Fa   2-Mc    <--- block 2
16:00    3-Fc   3-Ma   3-Fb       3-Mb   3-Mc   3-Fa    <--- block 3

The model is:

Y = Sex Method Sex*Method Slot Sex*Slot-R

As before, the terms Sex , Method , and Sex*Method represent the treatment structure – a two-factor complete factorial treatment structure. There are now two sizes of experimental units – the subjects and the day-slots within each subject.

The term Slot represents the blocking by time slot.

Note that the Sex*Slot term is really just another way of identifying the person used in the experiment. Once you know the slot of the experiment and the sex, you know the person. As in all block designs, we assume that there is no interaction between blocks and any treatments - this is just a device to identify the person used in the experiment.

Again, note that the fact that each person is measured three times is what makes this a split-plot design.
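The claim that Sex*Slot merely identifies the person can be checked directly against the layout. The sketch below is an illustration only: the dictionary is a transcription of the RCB layout above, and the helper `parse` and the variable names are hypothetical. It confirms that every Slot by Sex cell contains exactly one person and that every person receives all three methods.

```python
# Layout transcribed from the split-plot RCB table: time slot -> runs.
# A label such as "1-Ma" means person 1, male, method a.
layout = {
    "10:00": ["1-Ma", "1-Fc", "1-Mb", "1-Mc", "1-Fb", "1-Fa"],
    "13:00": ["2-Fc", "2-Ma", "2-Mb", "2-Fb", "2-Fa", "2-Mc"],
    "16:00": ["3-Fc", "3-Ma", "3-Fb", "3-Mb", "3-Mc", "3-Fa"],
}

def parse(label):
    """Split a label such as '1-Ma' into (person, sex, method)."""
    ident, rest = label.split("-")
    sex, method = rest[0], rest[1]
    return f"{ident}-{sex}", sex, method

persons_by_cell = {}    # (slot, sex) -> set of persons seen in that cell
methods_by_person = {}  # person -> list of methods received
for slot, runs in layout.items():
    for label in runs:
        person, sex, method = parse(label)
        persons_by_cell.setdefault((slot, sex), set()).add(person)
        methods_by_person.setdefault(person, []).append(method)

# Each Slot x Sex combination contains exactly one person ...
assert all(len(p) == 1 for p in persons_by_cell.values())
# ... and each person receives all three methods (the split-plot feature).
assert all(sorted(m) == ["a", "b", "c"] for m in methods_by_person.values())
print(len(persons_by_cell), "Slot x Sex cells;", len(methods_by_person), "persons")
```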

11.16.2 Difference between CRD, RCB, and split plot

Here is another example comparing the four types of experimental designs for two factors:

There are two primary costs in fish farming - labor to tend the fish and feed for the fish. Naturally, fish farmers wish to maximize feeding efficiency, i.e., what fraction of the food fed to the fish is turned into sellable flesh.

Several factors control the growth of fish, many of which are uncontrollable by the fish farmer. However, two factors can be easily controlled - the stocking density and the type of food. The BC Association of Fish Farmers wishes to conduct an experiment to investigate these two factors.

There are three traditional stocking densities - for convenience call these high (H), medium (M), and low (L). There are two types of fish food - for convenience call these seal-based (s) and beef-based (b).

The association has two test sites off the west coast of Vancouver Island. At each site, there are three separate cages, each with three internal partitions. The density and feed type can be manipulated separately for each partition of the cage.

Design the experiment as a CRD, an RCB, a split-plot with main plots in a CRD, and a split-plot with main plots in an RCB. Be sure to describe in sufficient detail how you randomized so that it is clear how each experiment was done. Note that for some designs, you may not be able to use all the partitions in all the cages. Use the following template when showing the experimental layouts.

              Site 1                    Site 2
              Cage 1  Cage 2  Cage 3    Cage 1  Cage 2  Cage 3
Partition 1
Partition 2
Partition 3

Solution: CRD

There are 6 treatment combinations. Assign them completely at random to the 18 cage-partitions across the two sites, so that each combination appears three times.

              Site 1                    Site 2
              Cage 1  Cage 2  Cage 3    Cage 1  Cage 2  Cage 3
Partition 1   Hb      Mb      Ms        Hs      Ms      Lb
Partition 2   Hs      Lb      Hb        Mb      Ls      Ls
Partition 3   Hs      Lb      Ls        Hb      Mb      Ms
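One way to carry out the CRD randomization is sketched below in Python. It is an illustration under assumptions: the function name `randomize_crd` and the seed are hypothetical, and the treatment labels simply follow the H/M/L and s/b abbreviations used above. The 6 combinations are replicated three times and shuffled over all 18 cage-partitions without regard to site or cage.

```python
import random
from collections import Counter
from itertools import product

def randomize_crd(seed=None):
    """Assign 6 treatment combinations x 3 replicates to 18 cage-partitions."""
    rng = random.Random(seed)
    # Density (H, M, L) crossed with food type (s, b) gives 6 combinations.
    treatments = [d + f for d, f in product("HML", "sb")]
    # Every cage-partition is an experimental unit; site and cage are ignored.
    units = [(site, cage, part)
             for site in (1, 2) for cage in (1, 2, 3) for part in (1, 2, 3)]
    pool = treatments * 3
    rng.shuffle(pool)
    return dict(zip(units, pool))

layout = randomize_crd(seed=2015)
print(Counter(layout.values()))  # every combination is used three times
```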

RCB

A reasonable blocking factor would be site. In a traditional RCB, each treatment combination appears only once in each block; hence three cage-partitions are left vacant in each block. Each block is randomized separately.

              Site 1                    Site 2
              Cage 1  Cage 2  Cage 3    Cage 1  Cage 2  Cage 3
Partition 1   Hb      Mb      Ms        Hs      Ms      .
Partition 2   .       Ls      .         .       Hb      Lb
Partition 3   Hs      .       Lb        Mb      Ls      .
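The RCB differs from the CRD only in that the shuffle is restricted to one site at a time. A minimal Python sketch follows, with the hypothetical function name `randomize_rcb` and `.` marking a vacant partition as in the layout above.

```python
import random
from itertools import product

def randomize_rcb(seed=None):
    """Randomize the 6 combinations separately within each site (block)."""
    rng = random.Random(seed)
    treatments = [d + f for d, f in product("HML", "sb")]
    layout = {}
    for site in (1, 2):
        units = [(site, cage, part)
                 for cage in (1, 2, 3) for part in (1, 2, 3)]
        # 6 combinations plus 3 vacant partitions ('.') fill the 9 units.
        pool = treatments + ["."] * 3
        rng.shuffle(pool)
        layout.update(zip(units, pool))
    return layout

layout = randomize_rcb(seed=2015)
for site in (1, 2):
    used = sorted(t for (s, _, _), t in layout.items() if s == site and t != ".")
    print("Site", site, used)
```

Each site ends up with every combination exactly once, which is the defining property of a complete block.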

Split-plot design with main-plots as a CRD

This can be done in a number of ways. The main-plot experimental unit is typically the cage.

In this first solution, the stocking density is assigned completely at random to the six cages and the two food types are randomized to the partitions within a cage. The main-plot factor is stocking density, the sub-plot factor is food-type. Because there are only two food types, one partition in each cage is left blank.

              Site 1                    Site 2
              Cage 1  Cage 2  Cage 3    Cage 1  Cage 2  Cage 3
Partition 1   Hb      Lb      Ls        .       Hs      Mb
Partition 2   .       Ls      .         Mb      Hb      Ms
Partition 3   Hs      .       Lb        Ms      .       .
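The first solution uses two separate randomizations, one for each size of experimental unit. The Python sketch below illustrates this under assumptions: the function name `randomize_split_plot_crd` is hypothetical, and two cages per density are used because six cages are shared among three densities. Densities are shuffled over all six cages ignoring site, and the two food types plus one vacant partition are then shuffled within each cage.

```python
import random

def randomize_split_plot_crd(seed=None):
    """Split-plot, main plots in a CRD: density on cages, food on partitions."""
    rng = random.Random(seed)
    cages = [(site, cage) for site in (1, 2) for cage in (1, 2, 3)]
    # Stage 1: densities to cages, two cages per density, ignoring site.
    densities = ["H", "M", "L"] * 2
    rng.shuffle(densities)
    layout = {}
    for (site, cage), dens in zip(cages, densities):
        # Stage 2: two food types plus one vacant partition within the cage.
        cells = [dens + "s", dens + "b", "."]
        rng.shuffle(cells)
        for part, cell in enumerate(cells, start=1):
            layout[(site, cage, part)] = cell
    return layout

layout = randomize_split_plot_crd(seed=2015)
for (site, cage, part), cell in sorted(layout.items()):
    print(site, cage, part, cell)
```

Because stage 1 ignores site, a site may by chance receive the same density in two of its cages, as happens in the layout above.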

In this second solution, the food type is randomized to the cages, and the stocking densities are randomized to the partitions within each cage. The main-plot factor is food type, the sub-plot factor is stocking density.

              Site 1                    Site 2
              Cage 1  Cage 2  Cage 3    Cage 1  Cage 2  Cage 3
Partition 1   Hb      Ms      Ls        Lb      Hb      Hs
Partition 2   Mb      Hs      Ms        Hb      Lb      Ls
Partition 3   Lb      Ls      Hs        Mb      Mb      Ms

Split-plot design with main-plots in an RCB

This can be done in a number of ways. The main-plot experimental unit is typically the cage. The blocks are the sites.

In this first solution, the stocking density is assigned at random to a cage within each site. The two food types are randomized to the partitions within a cage. Each site has all stocking densities.

              Site 1                    Site 2
              Cage 1  Cage 2  Cage 3    Cage 1  Cage 2  Cage 3
Partition 1   Hb      Lb      .         Ls      Hs      Mb
Partition 2   .       Ls      Mb        .       Hb      Ms
Partition 3   Hs      .       Ms        Lb      .       .
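When the main plots are in an RCB, the only change to the randomization is at the cage level: the densities are shuffled within each site rather than across all six cages. A minimal Python sketch, with the hypothetical function name `randomize_split_plot_rcb`, shows this restriction.

```python
import random

def randomize_split_plot_rcb(seed=None):
    """Split-plot, main plots in an RCB: sites are blocks, cages are main plots."""
    rng = random.Random(seed)
    layout = {}
    for site in (1, 2):
        # Stage 1 (restricted): each site receives H, M and L exactly once.
        densities = ["H", "M", "L"]
        rng.shuffle(densities)
        for cage, dens in enumerate(densities, start=1):
            # Stage 2: two food types plus one vacant partition within the cage.
            cells = [dens + "s", dens + "b", "."]
            rng.shuffle(cells)
            for part, cell in enumerate(cells, start=1):
                layout[(site, cage, part)] = cell
    return layout

layout = randomize_split_plot_rcb(seed=2015)
for (site, cage, part), cell in sorted(layout.items()):
    print(site, cage, part, cell)
```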

In this second solution, the food types are randomized to cages within each site and the stocking densities randomized within each cage. Because there are only two levels of food type, one cage is left vacant in each site.

              Site 1                    Site 2
              Cage 1  Cage 2  Cage 3    Cage 1  Cage 2  Cage 3
Partition 1   .       Hb      Ms        Lb      Hs      .
Partition 2   .       Mb      Hs        Hb      Ls      .
Partition 3   .       Lb      Ls        Mb      Ms      .
