Statistical Analysis & Design in Research

Structure in the Experimental Material
PGRM 10
Statistics in Science

Blocking – the idea
Detecting differences between treatments depends on the background noise (BN)
• BN is:
– caused by inherent differences between the experimental units
– measured by the residual (error) mean square, RMS (alternatively, MSE)
• Comparing treatments on similar units would reduce background noise
• With blocks of units that differ in the characteristics contributing to the response, we measure the variation due to blocks and so reduce the residual variation

Blocking – the benefit
Reducing background noise:
• Gives more precise estimates
• Allows a reduction in replication, without loss of power (the probability of detecting an effect of a specified size)
• Reduces cost!
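The link between background noise and replication can be sketched with the usual normal-approximation formula for comparing two treatment means, n ≈ 2(z₁₋α/₂ + z_power)² σ² / δ² replicates per treatment. This formula is not taken from the slides, and the residual variances and the difference `delta` below are hypothetical values chosen only to illustrate the idea.

```python
from math import ceil
from statistics import NormalDist


def replicates_needed(residual_var, delta, alpha=0.05, power=0.80):
    """Approximate replicates per treatment needed to detect a difference
    `delta` between two treatment means (two-sided test, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 * residual_var / delta ** 2)


# Hypothetical values: suppose blocking halves the residual variance.
n_unblocked = replicates_needed(residual_var=25.0, delta=5.0)
n_blocked = replicates_needed(residual_var=12.5, delta=5.0)
print(n_unblocked, n_blocked)
```

Under this approximation the required replication is proportional to the residual variance, so halving the variance roughly halves the number of replicates needed for the same power.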

Blocking and experimental material
Examples
1. A field: with fertility increasing from top to bottom
With 3 treatments group plots into BLOCKS of 3,
starting at top and continuing to bottom.
Randomise treatments within each block
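Randomising treatments within each block can be sketched as an independent shuffle of the treatment list for every block. The number of blocks (6) and the seed are assumptions made for illustration; the function name is hypothetical.

```python
import random


def randomise_blocks(treatments, n_blocks, seed=None):
    """Return one random ordering of all treatments for each block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # independent randomisation within this block
        layout.append(order)
    return layout


layout = randomise_blocks(["T1", "T2", "T3"], n_blocks=6, seed=1)
for block, order in enumerate(layout, start=1):
    print(block, *order)
```

Every treatment appears exactly once in every block, so each block supplies one replicate of each treatment.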

Block Design
        Treat
Blk   A    B    C
1     T1   T3   T2
2     T3   T2   T1
3     T2   T1   T3
4     T1   T2   T3
5     T3   T1   T2
6     T1   T2   T3

What is the experimental unit?
How many replicates per treatment?
What is the block?
Example
• 2 drugs (A, B) to control blood pressure
• 100 subjects – randomly assign 50 each to A and B
• Valid - but is it efficient?
• If subjects are heterogeneous, there is likely to be a large variation (σ²) in the responses within each group.
• The design may not be very efficient.

Factors affecting BP variation

Blocking and experimental material
2. 100 subjects are selected to compare a new drug for controlling BP with a Control
Block into pairs by age & weight (believed to affect BP)
In each pair one is selected at random to receive the new drug; the other receives Control
Alternatively – see next slide

Groups (Blocks)
Age   Sex      Weight   #     T1   T2
>50   Male     H        15
>50   Male     N        11
>50   Male     L        12
>50   Female   H        11
>50   Female   N        9
>50   Female   L        13
<50   Male     H        7
<50   Male     N        2
<50   Male     L        5
<50   Female   H        4
<50   Female   N        8
<50   Female   L        3
Total                   100
Groups (Blocks)
Age   Sex      Weight   #     T1   T2
>50   Male     H        15    8    7
>50   Male     N        11    5    6
>50   Male     L        12    6    6
>50   Female   H        11    5    6
>50   Female   N        9     5    4
>50   Female   L        13    6    7
<50   Male     H        7     4    3
<50   Male     N        2     1    1
<50   Male     L        5     2    3
<50   Female   H        4     2    2
<50   Female   N        8     4    4
<50   Female   L        3     2    1
Total                   100   50   50
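One way to arrive at a split like the one in the table is to randomise within each group, dividing it as evenly as possible between T1 and T2. The sketch below is only one possible scheme and is not taken from the slides: the subject ids are hypothetical, and alternating the arm that receives the odd subject is an assumption made to keep the overall totals balanced.

```python
import random

# Group sizes from the table: (age, sex, weight) -> number of subjects
group_sizes = {
    (">50", "Male", "H"): 15, (">50", "Male", "N"): 11, (">50", "Male", "L"): 12,
    (">50", "Female", "H"): 11, (">50", "Female", "N"): 9, (">50", "Female", "L"): 13,
    ("<50", "Male", "H"): 7, ("<50", "Male", "N"): 2, ("<50", "Male", "L"): 5,
    ("<50", "Female", "H"): 4, ("<50", "Female", "N"): 8, ("<50", "Female", "L"): 3,
}


def assign_within_groups(sizes, seed=None):
    """Randomly split each group between T1 and T2 as evenly as possible."""
    rng = random.Random(seed)
    extra_arm = rng.choice(["T1", "T2"])  # arm that receives the first odd subject
    allocation = {}
    for group, size in sizes.items():
        subjects = list(range(size))  # hypothetical within-group subject ids
        rng.shuffle(subjects)
        n_t1 = size // 2
        if size % 2 == 1:
            if extra_arm == "T1":
                n_t1 += 1
            # Alternate the arm receiving the odd subject to balance the totals.
            extra_arm = "T2" if extra_arm == "T1" else "T1"
        allocation[group] = {"T1": sorted(subjects[:n_t1]),
                             "T2": sorted(subjects[n_t1:])}
    return allocation


allocation = assign_within_groups(group_sizes, seed=2)
total_t1 = sum(len(a["T1"]) for a in allocation.values())
total_t2 = sum(len(a["T2"]) for a in allocation.values())
print(total_t1, total_t2)
```

Because eight of the twelve groups have an odd size, alternating the extra subject gives four extras to each arm, which reproduces the 50/50 totals.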
Blocking and experimental material
Examples
1. A field: with fertility increasing from top to bottom
With 3 treatments group plots into BLOCKS of 3,
starting at top and continuing to bottom.
Randomise treatments within each block
2. 100 subjects are selected to compare a new drug for controlling BP with a Control
Block into pairs by age & weight (believed to affect BP)
In each pair one is selected at random to receive the new drug; the other receives Control
3. 3 products are to be compared in 15 supermarkets:
All 3 are compared in each supermarket; the supermarkets are regarded as BLOCKS

Blocking and experimental material
Examples (contd)
4. A crop experiment will take 5 days to harvest.
The material is blocked into 5 sets of plots, and
treatments assigned at random within each set
A BLOCK of plots is harvested each day
Here: day effects (such as rain) will be allowed for in the ANOVA table, so they do not cloud the estimation of treatment effects, and residual variation is reduced.

Blocking factors in your work area?

Reasons to BLOCK
1. Reduce BN (as above)
2. Material is naturally blocked (e.g. identical twins), so using this as part of the design may reduce BN
3. To protect against factors that may influence the experimental outcomes, and so cloud comparison of treatments
4. To assess block variation itself
e.g. large day-to-day variation may indicate a process that is not well controlled.

Typical Randomised Block Design (RBD) Layout
4 treatments T1 – T4
BLOCKS of size 4
Example of random allocation within blocks:
Block
1       T3   T1   T2   T4
2       T2   T3   T1   T4
3       T1   T2   T3   T4
4       T2   T4   T1   T3
5       T4   T2   T3   T1
6       T3   T1   T4   T2
ANOVA table
each treatment occurs once in each block
t treatments
b blocks
tb experimental units

Source       DF           SS    MS    F         Pr > F
Treatments   t–1          TSS   TMS   TMS/RMS   Small?
Blocks       b–1          BSS   BMS   BMS/RMS   Small?
Residual     (t-1)(b-1)   RSS   RMS
Total        tb - 1

MS = SS/DF
Example
PGRM pg 10-2
Compare effect of washing solution used in retarding
bacterial growth in food processing containers.
Only 3 trials can be run each day, and temperature is not
controlled so day to day variability is expected.
BLOCKS: day
Treatments: 2%, 4%, 6% of active ingredient
Randomisation: 3 containers randomly allocated to 3
treatments on each of 4 days.
Response: bacterial count on each container each day
(low score = cleaner)

Example (contd)
Excel
Day   Solution(%)   Count
1     2             13
1     4             10
1     6             5
2     2             18
2     4             20
2     6             6
3     2             18
3     4             17
3     6             7
4     2             30
4     4             31
4     6             10

csv
Day,Solution(%),Count
1,2,13
1,4,10
1,6,5
2,2,18
2,4,20
...

Note:
Response values in a single column
Extra columns to identify BLOCK (day) and TREATMENT (solution)
SAS GLM code
proc glm data = randb;
  class solution day;                /* treatment and block factors */
  model score = solution day;        /* response = treatment + block */
  lsmeans solution;
  lsmeans day;
  estimate '2-6' solution 1 0 -1;            /* 2% versus 6% */
  estimate 'linear ok?' solution 1 -2 1;     /* departure from a linear trend */
run;
quit;
GLM OUTPUT: ANOVA
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             5    748.08           149.6         11.68     0.0048
Error             6    76.8             12.8
Corrected Total   11   824.9

Source     DF   Type I SS   Mean Square   F Value   Pr > F
solution   2    425.17      212.58        16.60     0.0036
Day        3    322.92      107.64        8.41      0.0144

425.17 + 322.92 = 748.09
So the Model SS has been partitioned into TREATMENT (solution) and BLOCK (Day)
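The partition of the sums of squares can be reproduced by hand from the twelve counts. The following is a minimal sketch in plain Python, not the SAS procedure itself; the variable names are illustrative, and the p-values are omitted because they need an F distribution that the Python standard library does not provide.

```python
# Bacterial counts from the example, indexed as counts[day][solution %]
counts = {
    1: {2: 13, 4: 10, 6: 5},
    2: {2: 18, 4: 20, 6: 6},
    3: {2: 18, 4: 17, 6: 7},
    4: {2: 30, 4: 31, 6: 10},
}
days = sorted(counts)
solutions = sorted(counts[days[0]])
b, t = len(days), len(solutions)  # b blocks, t treatments

values = [counts[d][s] for d in days for s in solutions]
grand_mean = sum(values) / (b * t)

# Treatment and block means (each treatment occurs once in each block)
sol_means = {s: sum(counts[d][s] for d in days) / b for s in solutions}
day_means = {d: sum(counts[d][s] for s in solutions) / t for d in days}

total_ss = sum((y - grand_mean) ** 2 for y in values)
treat_ss = b * sum((m - grand_mean) ** 2 for m in sol_means.values())
block_ss = t * sum((m - grand_mean) ** 2 for m in day_means.values())
resid_ss = total_ss - treat_ss - block_ss

treat_ms = treat_ss / (t - 1)
block_ms = block_ss / (b - 1)
resid_ms = resid_ss / ((t - 1) * (b - 1))
f_treat = treat_ms / resid_ms
f_block = block_ms / resid_ms

print(f"solution  SS={treat_ss:.2f}  MS={treat_ms:.2f}  F={f_treat:.2f}")
print(f"day       SS={block_ss:.2f}  MS={block_ms:.2f}  F={f_block:.2f}")
print(f"residual  SS={resid_ss:.2f}  MS={resid_ms:.2f}")
```

The sums of squares, mean squares and F values agree with the GLM output to the precision shown there.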
GLM OUTPUT: means
solution   score LSMEAN
2          19.75
4          19.5
6          7.0

Parameter    Estimate   Standard Error   t Value   Pr > |t|
2-6          12.75      2.530            5.04      0.0024
linear ok?   -12.25     4.383            -2.80     0.0314
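The two ESTIMATE rows can be checked from the solution means and the error mean square: a contrast with coefficients c has estimate Σ c·mean and standard error √(RMS · Σc² / r), where r is the number of replicates per treatment (4 days here). This sketch uses the rounded error mean square 12.8 from the output, so the last digit may differ slightly from SAS; the function name is hypothetical.

```python
from math import sqrt

lsmeans = {2: 19.75, 4: 19.5, 6: 7.0}  # solution means from the output
error_ms = 12.8                        # rounded Error mean square from the output
reps = 4                               # one observation per solution on each of 4 days


def contrast(coeffs, means, ms, r):
    """Return (estimate, standard error, t value) for a contrast of means."""
    estimate = sum(c * means[level] for level, c in coeffs.items())
    std_err = sqrt(ms * sum(c ** 2 for c in coeffs.values()) / r)
    return estimate, std_err, estimate / std_err


est_26, se_26, t_26 = contrast({2: 1, 4: 0, 6: -1}, lsmeans, error_ms, reps)
est_lin, se_lin, t_lin = contrast({2: 1, 4: -2, 6: 1}, lsmeans, error_ms, reps)
print(f"2-6         {est_26:.2f}  {se_26:.3f}  {t_26:.2f}")
print(f"linear ok?  {est_lin:.2f}  {se_lin:.3f}  {t_lin:.2f}")
```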
ANOVA table
Source     SS     df   MS     F       P
Solution   425    ?    213    16.60   0.004
Days       323    ?    108    8.41    0.014
Residual   76.8   ?    12.8

Means
Solution   2      4      6      SED
           19.8   19.5   7.0    2.53
Day        1      2      3      4      SED
           9.3    14.7   14.0   23.7   2.92
More Blocking – Latin square designs

Latin Square design – blocking by 2 sources of variation
Variation in milk yield among cows is large (CV% = 25)
Variation in yield across lactation is large
Use different treatments in sequence on each cow
Need to allow for a standardisation period (12) weeks between treatments
[Figure: Lactation yield pattern. Yield (kg) against Month]
Data
Treatment allocation
         Cow
Period   1      2      3      4
1        T2     T1     T3     T4
2        T4     T2     T1     T3
3        T3     T4     T2     T1
4        T1     T3     T4     T2

Milk yield (kg/day)
         Cow
Period   1      2      3      4
1        9.7    14.0   20.2   20.9
2        15.1   20.3   17.8   24.3
3        16.4   20.1   21.3   21.5
4        11.8   19.1   21.3   20.6
Period   Cow   Treat   yield
1        1     2       9.7
2        1     4       15.1
3        1     3       16.4
4        1     1       11.8
1        2     1       14.0
2        2     2       20.3
….
Columns for period, cow and treatment codes
SAS GLM code
proc glm data = latinsq;
  class period cow treat;
  model yield = period cow treat;    /* two blocking factors plus treatment */
  lsmeans treat;
  lsmeans period;
  lsmeans cow;
  estimate '1v2' treat 1 -1 0 0;     /* treatment 1 versus treatment 2 */
run;

Results
Source   DF   SS      MS      F       P
Period   3    31.2    10.41   8.68    0.013
Cow      3    165.8   55.28   46.06   0.000
Treat    3    32.5    10.82   9.01    0.012
Error    6    7.2     1.20
Cow and Period removed much variation
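The same decomposition can be sketched by hand for the Latin square: period, cow and treatment sums of squares are each computed from their group means, and the residual is what is left of the total. The yields and treatment codes below are read from the data slide, with the period-by-cow orientation inferred from the long-format listing. Because the displayed yields are rounded to one decimal, the results come out close to, but not identical to, the values in the table.

```python
# Yields (kg/day) and treatment codes indexed as [period][cow]
yields = {
    1: {1: 9.7, 2: 14.0, 3: 20.2, 4: 20.9},
    2: {1: 15.1, 2: 20.3, 3: 17.8, 4: 24.3},
    3: {1: 16.4, 2: 20.1, 3: 21.3, 4: 21.5},
    4: {1: 11.8, 2: 19.1, 3: 21.3, 4: 20.6},
}
treat = {
    1: {1: 2, 2: 1, 3: 3, 4: 4},
    2: {1: 4, 2: 2, 3: 1, 4: 3},
    3: {1: 3, 2: 4, 3: 2, 4: 1},
    4: {1: 1, 2: 3, 3: 4, 4: 2},
}

# One record per cell: (period, cow, treatment, yield)
cells = [(p, c, treat[p][c], yields[p][c]) for p in yields for c in yields[p]]
k = len(yields)  # k x k Latin square
grand_mean = sum(cell[3] for cell in cells) / len(cells)


def factor_ss(index):
    """Sum of squares for the factor stored at position `index` of each cell."""
    groups = {}
    for cell in cells:
        groups.setdefault(cell[index], []).append(cell[3])
    return sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())


period_ss, cow_ss, treat_ss = factor_ss(0), factor_ss(1), factor_ss(2)
total_ss = sum((cell[3] - grand_mean) ** 2 for cell in cells)
resid_ss = total_ss - period_ss - cow_ss - treat_ss
resid_ms = resid_ss / ((k - 1) * (k - 2))  # residual df = (k-1)(k-2) = 6

for name, ss in [("Period", period_ss), ("Cow", cow_ss), ("Treat", treat_ss)]:
    ms = ss / (k - 1)
    print(f"{name:7s} SS={ss:7.2f}  MS={ms:6.2f}  F={ms / resid_ms:6.2f}")
print(f"Error   SS={resid_ss:7.2f}  MS={resid_ms:6.2f}")
```

Cow accounts for by far the largest share of the total variation, which is the point of the remark above.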

Means
         1       2       3       4       SED
Treat    16.28   17.98   20.01   19.33   0.775
Period   16.21   19.37   19.82   18.18   0.775
Cow      13.24   18.38   20.16   21.82   0.775
Conclusions on Latin square design
CV greatly reduced, to 6%. When the effect of period is allowed for, repeated measurements within a cow are not very variable.
Periods and cows are nuisance variables. Sometimes the row and column variables are of interest in themselves, and then the design is very efficient: it gives information on 3 factors (e.g. treatments, machines, operators).
Useful for screening, but it is questionable whether short-term results would apply in the long term.
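The 6% CV and the SED of 0.775 can be checked from the error mean square and the treatment means reported in the results. This is a minimal sketch: it takes CV as the residual standard deviation as a percentage of the overall mean and SED as √(2·RMS/r), which is assumed to be how the slide values were obtained.

```python
from math import sqrt

error_ms = 1.20                             # Error MS from the results table
treat_means = [16.28, 17.98, 20.01, 19.33]  # treatment means from the means table
reps = 4                                    # each treatment occurs once per cow and per period

grand_mean = sum(treat_means) / len(treat_means)
cv_percent = 100 * sqrt(error_ms) / grand_mean  # residual SD as % of the mean
sed = sqrt(2 * error_ms / reps)                 # SE of a difference between two means
print(f"CV = {cv_percent:.1f}%, SED = {sed:.3f}")
```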