The design and analysis of experiments with a laboratory phase

advertisement
Principles in the design of
multiphase experiments with
a later laboratory phase:
orthogonal designs
Chris Brien1, Bronwyn Harch2, Ray Correll2
& Rosemary Bailey3
1University
of South Australia, 2CSIRO Mathematics,
Informatics & Statistics, 3Queen Mary University of London
http://chris.brien.name/multitier
Chris.brien@unisa.edu.au
Outline
1.
2.
3.
4.
5.
Primary experimental design principles.
Factor-allocation description for standard designs.
Single set description.
Principles for simple multiphase experiments.
Principles leading to complications, even with
orthogonality.
6. Summary.
2
1) Primary experimental design principles

Principle 1 (Evaluate designs with skeleton ANOVA
tables)
 Use whether or not data to be analyzed by ANOVA.



Principle 2 (Fundamentals): Use randomization,
replication and blocking (local control).
Principle 3 (Minimize variance): Block entities to form new
entities, within new entities being more homogeneous;
assign treatments to least variable entity-type.
Principle 4 (Split units): confound some treatment sources
with more variable sources if some treatment factors:
i.
ii.
iii.
require larger units than others,
are expected to have a larger effect, or
are of less interest than others.
3
Peeling et al. (2009)
A standard athlete training example


9 training conditions — combinations of 3 surfaces and 3
intensities of training — to be investigated.
Assume the prime interest is in surface differences
 intensities are only included to observe the surfaces over a range
of intensities.

Testing is to be conducted over 4 Months:
 In each month, 3 endurance athletes are to be recruited.
 Each athlete will undergo 3 tests, separated by 7 days, under 3
different training conditions.


On completion of each test, the heart rate of the athlete
will be measured.
Randomize 3 intensities to 3 athletes in a month and
3 surfaces to 3 tests in an athlete.
 A split-unit design, employing Principles 2, 3 and 4(iii).
4
2) Factor-allocation description for
(Nelder, 1965; Brien, 1983;
standard designs
Brien & Bailey, 2006)

Standard designs involve a single allocation in which a set
of treatments is assigned to a set of units:
 treatments are whatever are allocated;
 units are what treatments are allocated to;
 treatments and units each referred to as a set of objects;

Often do by randomization using a permutation of the units.
 More generally treatments are allocated to units e.g. using a spatial
design or systematically

Each set of objects is indexed by a set of factors:
 Unit or unallocated factors (indexing units);
 Treatment or allocated factors (indexing treatments).

Represent the allocation using factor-allocation diagrams
that have a panel for each set of objects with:

a list of the factors; their numbers of levels; their nesting
relationships.
5
Factor-allocation diagram for the
standard athlete training experiment

One allocation (randomization):
 a set of training conditions to a set of tests.
3 Intensities
3 Surfaces
9 training conditions

4 Months
3 Athletes in M
3 Tests in M, A
36 tests
The set of factors belonging to a set of objects forms a tier:
 they have the same status in the allocation (randomization):
 {Intensities, Surfaces} or {Months, Athletes, Tests}
 Textbook experiments are two-tiered.

A crucial feature is that diagram automatically shows EU
and restrictions on randomization/allocation.
6
Some derived items
3 Intensities
3 Surfaces
4 Months
3 Athletes in M
3 Tests in M, A
9 training conditions

36 tests
Sets of generalized factors (terms in the mixed model):
 Months, MonthsAthletes, MonthsAthletesTests;
 Intensities, Surfaces, IntensitiesSurfaces.

Corresponding types of entities (groupings of objects):
 month, athlete, test (last two are Eus);
 intensity, surface, training condition (intensity-surface combination).

Corresponding sources (in an ANOVA):
 Months, Athletes[M], Tests[MA];
 Intensities, Surfaces, Intensities#Surfaces.
7
Skeleton ANOVA
3 Intensities
3 Surfaces
4 Months
3 Athletes in M
3 Tests in M, A
9 training conditions
tests tier

E[MSq]
training conditions tier
source
df source
Months
3
Athletes[M]
8
Tests[MA]
36 tests
df  2  2  2
MAT MA M
1
3
3
3
Intensities
2
Residual
6
1
1
24 Surfaces
2
1
I#S
4
Residual
18
1
1
9
qI   
qS   
qIS   
Intensities is confounded with the more-variable
Athletes[M] & Surfaces with Tests[M^A].
8
Mixed model


Brien & Demétrio (2009).
Take generalized factors derived from factor-allocation
diagram and assign to either fixed or random model:
Intensities + Surfaces + IntensitiesSurfaces |
Months + MonthsAthletes + MonthsAthletesTests
Corresponds to the mixed model:
Y = XIqI + XSqS + XISqIS + ZMuM+ ZMAuMA+ ZMATuMAT.
where the Xs and Zs are indicator variable matrices for the
generalized factors in its subscript, and
qs and us are fixed and random parameters, respectively, with
E u j   0 and E u j uj    2j I.


This is an ANOVA model, equivalent to the randomization
model, and is also written:
Y = XIqI + XSqS + XISqIS + ZMuM+ ZMAuMA + e.
To fit in SAS must set the DDFM option of MODEL to
KENWARDROGER.
9
3) Single-set description

Single set of factors that uniquely indexes observations:
 {Months, Intensities, Surfaces}

e.g. Searle, Casella & McCulloch
(1992); Littel et al. (2006).
(Athletes and Tests omitted).
What are the EUs in the single-set approach?
 A set of units that are indexed by Months-Intensities combinations
and another set by the Months-Intensities-Surfaces combinations.
 Of course, Months-Intensities(-Surfaces) are not actual EUs, as
Intensities (Surfaces) are not randomized to those combinations.
 They act as a proxy for the unnamed units.

Mixed model is: I + S + IS | M + MI + MIS.
 Previous model: I + S + IS | M + MA + MAT.
 Former more economical as A and T not needed.
 In SAS, default DDFM option of MODEL (CONTAIN) works.

However, MA and MI are different sources of variability:
 inherent variability vs block-treatment interaction.

This "trick" is confusing, false economy and not always
possible.
10
Single-set description ANOVA

Single set of factors that uniquely indexes observations:
 {Months, Intensities, Surfaces}

Use factors to derive skeleton ANOVA
E[MSq]

2
 2  MI
 M2
source
df
Months
3
1
3
Intensities
2
M#I
6
1
1
3
3
Surfaces
2
1
I#S
4
Residual
18
1
1
9
qI   
qS   
qIS   
Confounding not exhibited, and need E[MSq] to see that
M#I is denominator for Intensities.
11
Summary of factor-allocation versus
single-set description




Factor-allocation description is based on the tiers and so is
multi-set.
It has a specific factor for the EUs and so their identity not
obscured.
Single-set description factors are a subset of those identified
in the factor-allocation description and so more economical.
Skeleton ANOVA from factor-allocation description shows:
 The confounding resulting from the allocation;
 The origin of the sources of variation more accurately
(Athletes[Months] versus Months#Intensities).
12
4) Principles for simple multiphase
experiments

Suppose in the athlete training experiment:
 in addition to heart rate taken immediately upon completion of a
test,
 the free haemoglobin is to be measured using blood specimens
taken from the athletes after each test, and
 the specimens are transported to the laboratory for analysis.

The experiment is two phase: testing and laboratory
phases.
 The outcome of the testing phase is heart rate and a blood
specimen.
 The outcome of the laboratory phase is the free haemoglobin.

How to process the specimens from the first phase in the
laboratory phase?
13
Some principles

Principle 5 (Simplicity desirable): assign first-phase units
to laboratory units so that each first-phase source is
confounded with a single laboratory source.
 Use composed randomizations with an orthogonal design.



Principle 6 (Preplan all): if possible.
Principle 7 (Allocate all and randomize in laboratory):
always allocate all treatment and unit factors and
randomize first-phase units and lab treatments.
Principle 8 (Big with big): Confound big first-phase
sources with big laboratory sources, provided no
confounding of treatment with first-phase sources.
14
A simple two-phase athlete training
experiment

Simplest is to randomize specimens from a test to
locations (in time or space) during the laboratory phase.
4 Months
3 Athletes in M
3 Tests in M, A
3 Intensities
3 Surfaces
36 locations
36 tests
9 training conditions
locations tier
36 Locations
tests tier
df source
df
Locations
35 Months
3
Tests[MA]
E[MSq]
training conditions tier
source
Athletes[M]
Composed
randomizations
8
source
df
2
2
 L2  MAT
 MA
 M2
1
1
3
1
1
3
3
Intensities
2
Residual
6
1
1
24 Surfaces
2
1
1
I#S
4
Residual
18
1
1
1
1
9
qI   
qS   
qIS   
15
A simple two-phase athlete training
experiment (cont’d)
locations tier
tests tier
source
df
Locations
35 Months
source
Athletes[M]
Tests[MA]



training conditions tier
df source
df
3
8
E[MSq]
2
2
 L2  MAT
 MA
 M2
1
1
3
1
1
3
3
Intensities
2
Residual
6
1
1
24 Surfaces
2
1
1
I#S
4
Residual
18
1
1
1
1
9
qI   
qS   
qIS   
No. tests = no. locations = 36 and so tests sources
exhaust locations sources.
Cannot separately estimate locations and tests variability,
but can estimate their sum.
But do not want to hold blood specimens for 4 months.
16
A simple two-phase athlete training
experiment (cont’d)
Simplest is to align lab-phase and first-phase blocking.

4 Months
3 Athletes in M
3 Tests in M, A
3 Intensities
3 Surfaces
tests tier
9
Locations in B
36 locations
df source
df source
Batches
3
Months
3
Locations[B]
32 Athletes[M]
8
Tests[MA]
Composed
randomizations
E[MSq]
training conditions tier
source

Batches
36 tests
9 training conditions
locations tier
4
df
2
2
2
BL
B2 MAT
MA
M2
1
9
1
3
9
qI   
Intensities
2
1
1
3
Residual
6
1
1
3
24 Surfaces
2
1
1
qS   
I#S
4
1
1
qIS   
Residual
18
1
1
Note Months confounded with Batches (i.e. Big with Big).
17
A simple two-phase athlete training
experiment – mixed model
4 Months
3 Athletes in M
3 Tests in M, A
3 Intensities
3 Surfaces
Batches
9
Locations in B
36 locations
36 tests
9 training conditions

4
Form generalized factors and assign to fixed or random:
 I + S + IS | M + MA + MAT + B + BL.

ANOVA shows us i) there will be aliasing and ii) model without lab
terms will fit and be sufficient – a “model of convenience”.
locations tier
tests tier
source
df
source
df source
Batches
3
Months
3
Locations[B]
32 Athletes[M]
8
Tests[MA]
E[MSq]
training conditions tier
df
2
2
2
BL
B2 MAT
MA
M2
1
9
1
3
9
qI   
Intensities
2
1
1
3
Residual
6
1
1
3
24 Surfaces
2
1
1
qS   
I#S
4
1
1
qIS   
Residual
18
1
1
18
The multiphase law


DF for sources from a previous phase can never be
increased as a result of the laboratory-phase design.
However, it is possible that first-phase sources are split
into two or more sources, each with fewer degrees of
freedom than the original source.
locations tier
tests tier
source
df
source
df source
Batches
3
Months
3
Locations[B]
32 Athletes[M]
8
Tests[MA]

E[MSq]
training conditions tier
df
2
2
2
BL
B2 MAT
MA
M2
1
9
1
3
9
qI   
Intensities
2
1
1
3
Residual
6
1
1
3
24 Surfaces
2
1
1
qS   
I#S
4
1
1
qIS   
Residual
18
1
1
DF for first phase sources unaffected.
19
Factor-allocation in multiphase
experiments

While multiphase experiments will often involve multiple
allocations, not always:
 A two-phase experiment will not if the first phase involves a survey
i.e. no allocation e.g. tissues sampled from animals of different
sexes.
 A two-phase experiment may include more than two allocations:
e.g. a grazing trial in the first phase that involves two composed
randomizations.

Factor-allocation description is particularly helpful in
understanding multiphase experiments with multiple
allocations.
20
5) Principles leading to complications,
even with orthogonality

Principle 9 (Use pseudofactors):
 An elegant way to split sources (as opposed to introducing
grouping factors unconnected to real sources of variability).

Principle 10 (Compensating across phases):
 Sometimes, if something is confounded with more variable first-
phase source, can confound with less variable lab source.

Principle 11 (Laboratory replication):
 Replicate laboratory analysis of first-phase units if lab variability
much greater than 1st-phase variation;
 Often involves splitting product from the first phase into portions
(e.g. batches of harvested crop, wines, blood specimens into
aliquots, drops, lots, samples and fractions).

Principle 12 (Laboratory treatments):
 Sometimes treatments are introduced in the laboratory phase and
this involves extra randomization.
21
A replicated two-phase athlete training
experiment

Suppose duplicates of free haemoglobin to be done:
 2 fractions taken from each specimen;
 One fraction taken from 9 specimens for the month and analyzed;
 Then, the other fraction from 9 specimens analyzed.
3 Intensities
3 Surfaces
9 training conditions

4
2
3
3
Months
Fractions in M, A, T
Athletes in M
Tests in M, A
72 fractions
2 F1 in M
4 Batches
2 Rounds in B
9 Locations in B, R
72 locations
Problem: 18 fractions in a month to assign to 2 rounds in a
batch:
 Use F1 to group the 9 fractions to be analyzed in the same round.
 An alternative is to introduce the grouping factor FGroups,
 but, while in the analysis, not an anticipated variability source.
22
A replicated two-phase athlete training
experiment (cont’d)
4
2
3
3
3 Intensities
3 Surfaces
2 F1 in M
4 Batches
2 Rounds in B
9 Locations in B, R
72 fractions
9 training conditions
locations tier
Months
Fractions in M, A, T
Athletes in M
Tests in M, A
fractions tier
72 locations
E[MSq]
training conditions tier
2
2
2
2
2
 BRL
 BR
 B2  MATF
 MAT
 MA
 M2
source
df source
df source
Batches
3
Months
3
1
2 18
1
Rounds[B]
4
Fractions[MAT]1
4
1
2
1
Locations[BR] 64 Athletes[M]
Note split
source
Tests[MA]
8
df
2
6
18
qI   
Intensities
2
1
1
2
6
Residual
6
1
1
2
6
24 Surfaces
2
1
1
2
qS   
I#S
4
1
1
2
qIS   
Residual
18
1
1
2
1
1
Fractions[MAT] 32
23
A replicated two-phase athlete training
experiment (cont’d)
training conditions tier
fractions tier
locations tier
df source
df
E[MSq]
2
2
2
2
2
 M2
 MA
 MAT
 B2  MATF
 BR
 BRL
source
df source
Batches
3
Months
3
1
2 18
1
Rounds[B]
4
Fractions[MAT]1
4
1
2
1
Locations[BR] 64 Athletes[M]
Tests[MA]
Fractions[MAT]


2
6
18
qI   
Intensities
2
1
1
2
6
Residual
6
1
1
2
6
24 Surfaces
2
1
1
2
qS   
I#S
4
1
1
2
qIS   
Residual
18
1
1
2
1
1
8
32
Last line measures laboratory variation.
Again, no. fractions = no. locations = 72 and so fractions
sources exhaust locations sources.
 Consequently, not all terms could be included in a mixed model.
 Pseudofactors not needed in mixed model (cf. grouping factors).
24
A replicated two-phase athlete training
experiment – mixed model
locations tier
training conditions tier
fractions tier
E[MSq]
2
2
2
2
2
 M2
 MA
 MAT
 B2  MATF
 BR
 BRL
source
df source
df source
Batches
3
Months
3
1
2 18
1
Rounds[B]
4
Fractions[MAT]1
4
1
2
1
Locations[BR] 64 Athletes[M]
Tests[MA]
Fractions[MAT]

df
2
6
18
qI   
Intensities
2
1
1
2
6
Residual
6
1
1
2
6
24 Surfaces
2
1
1
2
qS   
I#S
4
1
1
2
qIS   
Residual
18
1
1
2
1
1
8
32
Form generalized factors and assign to fixed or random:
 I + S + IS | M + MA + MAT + MATF + B + BR + BRL.

Removing aliased terms gives one “model of convenience”:
 I + S + IS | MA + MAT + B + BR + BRL (M, MATF omitted).
 Another model would result from removing B and BRL (need BR). 25
Single-set description for example
3 Intensities
3 Surfaces
4
2
3
3
Months
Fractions in M, A, T
Athletes in M
Tests in M, A
9 training conditions

Single set of factors that
uniquely indexes
observations:

9 Locations in B, R
72 locations
E[MSq]
2
2
2
 2  MR
 MIS
 MI
 M2
Source
df
Months
3
1
2
Rounds[M]
4
1
2
Intensities
2
6
18
1
2
6
6
1
2
6
Surfaces
2
1
2
qS   
I#S
4
1
2
qIS   
M#I[S]
18
1
2
Residual
32
1
3 Intensities x 3 Surfaces}. M#I
How do:
 the locations
sources;
 Fractions;
affect the response?
2 F1 in M
72 fractions
 {4 Months x 2 Rounds x

4 Batches
2 Rounds in B

2
qI   
Similar to model of convenience
26
6) Summary

Factor-allocation, rather than single-set, description used,
being more informative and particularly helpful in multiphase
experiments.

Multiphase experiments usually have multiple allocations.

Have provided 4 standard principles and 8 principles specific
to orthogonal, multiphase designs.

In practice, will be important to have some idea of likely
sources of laboratory variation.

Are laboratory treatments to be incorporated?

Will laboratory replicates be necessary?
27
References







Brien, C. J. (1983). Analysis of variance tables based on experimental
structure. Biometrics, 39, 53-59.
Brien, C.J., and Bailey, R.A. (2006) Multiple randomizations (with
discussion). J. Roy. Statist. Soc., Ser. B, 68, 571–609.
Brien, C.J. and Demétrio, C.G.B. (2009) Formulating mixed models for
experiments, including longitudinal experiments. J. Agr. Biol. Env. Stat.,
14, 253-80.
Brien, C.J., Harch, B.D., Correll, R.L. and Bailey, R.A. (2010) Multiphase
experiments with laboratory phases subsequent to the initial phase. I.
Orthogonal designs. Journal of Agricultural, Biological and
Environmental Statistics, submitted.
Littell, R. C., G. A. Milliken, et al. (2006). SAS for Mixed Models. Cary,
N.C., SAS Press.
Peeling, P., B. Dawson, et al. (2009). Training Surface and Intensity:
Inflammation, Hemolysis, and Hepcidin Expression. Medicine & Science
in Sports & Exercise, 41, 1138-1145.
Searle, S. R., G. Casella, et al. (1992). Variance components. New
York, Wiley.
28
Web address for link to Multitiered
experiments site
http://chris.brien.name/multitier
29
Download