L6_ch14_CRD

Completely Randomized Design (CRD): ( Ch 14 O&L)
A CRD is appropriate when the experimental units are homogeneous,
that is, essentially all the units we have are “alike”. We
randomly assign treatments to the units.
CRD uses the tenets of Randomization and Replication from
Design Theory. It does not need to make use of local control
because the units are homogeneous.
The analysis of a CRD with a single FACTOR is the ONE WAY
ANOVA that we discussed earlier.
Most of this Chapter is review: it combines Lecture 2
(1 way ANOVA) and Lecture 5 (Factorial Structures).
Example:
We are interested in comparing three brands of cake
mixes for how much the cake rises. The Brands are:
Betty Crocker, Duncan Hines, and the Store Brand (say
IGA). We can afford to use 4 boxes of cake mixes for
each brand. How would we design this study?
So here we have a CRD with a single Factor and we can
use 1 way ANOVA to analyze this data.
What would the ANOVA table look like in this scenario?
Source   Degrees of Freedom   Sums of Squares   Mean Squares     F test statistic   p-value
Brand    3-1 = 2              SSF               MSF = SSF/dff    MSF/MSE            p-value*
Error    12-3 = 9             SSE               MSE = SSE/dfe
Total    12-1 = 11            TSS
Let us do a review of analyzing one way and factorial
designs in a Completely Randomized Experiment.
Examples of Completely Randomized Designs:
An experiment studied the shelf life of meats stored under four (4)
different conditions: Commercial Plastic Wrap, Vacuum Packaged,
Mixed Gas Atmosphere and CO2. Twelve steaks were randomly
selected, with three steaks randomly assigned to each of the four
treatments.
Treatment structure is One-way.
ANALYSIS is ONEWAY ANOVA
Example 2:
The variation in tensile strength of asphaltic concrete is thought to be
associated with the compaction method and the aggregate type. Four
compaction methods (static, regular, low and very low) were
considered along with two aggregate types (basaltic and siliceous).
Three replicates were constructed in random order for each of the
eight (8) treatments.
Treatment structure is Two-way (Compaction method and aggregate
type).
ANALYSIS is TWOWAY ANOVA WITH INTERACTIONS
How to Randomize?
1. Number the experimental units from 1 to N, where N = rt is the total
number of units (t treatments, r replicates each).
2. Generate a set of N random numbers (for example, from a random
number table).
3. Rank the numbers from smallest to largest. These ranks correspond
to experimental unit numbers.
4. Assign the units whose numbers appear in the first r positions of the
sequence to treatment A, those in the next r positions to treatment B,
and so on.
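The randomization steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in LAB: the function name `randomize_crd` and the `seed` argument are hypothetical, and the standard library generator stands in for a random number table.

```python
import random

def randomize_crd(treatments, r, seed=None):
    """Return a dict mapping unit number -> treatment for a CRD.

    Hypothetical helper: t = len(treatments) treatments, r replicates each.
    """
    rng = random.Random(seed)
    n_units = len(treatments) * r
    # Step 2: one random number for each position in the sequence.
    draws = [rng.random() for _ in range(n_units)]
    # Step 3: rank the random numbers (1 = smallest); ranks are unit numbers.
    order = sorted(range(n_units), key=lambda k: draws[k])
    ranks = [0] * n_units
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank
    # Step 4: the first r positions get the first treatment, the next r
    # positions the second treatment, and so on.
    return {ranks[k]: treatments[k // r] for k in range(n_units)}

plan = randomize_crd(["A", "B", "C", "D"], r=3, seed=1)
for unit in sorted(plan):
    print(unit, plan[unit])
```

Each unit number from 1 to 12 appears exactly once, and each treatment is assigned to exactly three units; changing `seed` gives a different random layout.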
Illustration:
Consider the steak storage example above, in which there are four
treatments (Commercial Plastic Wrap, Vacuum Packaged, Mixed Gas
Atmosphere and CO2), each with three replicates.
Sequence (Exp. Unit)   Random number   Rank (unit number)   Treatment
 1                     448              4                   A
 2                     699              7                   A
 3                     340              3                   A
 4                     733              9                   B
 5                     580              6                   B
 6                     852             12                   B
 7                     514              5                   C
 8                     723              8                   C
 9                     152              2                   C
10                     744             10                   D
11                     828             11                   D
12                     041              1                   D
Model and Assumptions are the same as we discussed in earlier
Lectures (cell means or treatment effects)
[Figure: graphical representation of the cell means model]
[Figure: graphical representation of the effects model]
Least Square Estimators:
The errors can be written as:
$$e_{ij} = Y_{ij} - \mu_i$$
Since $\mu_i$ is unknown, an estimator is of the form:
$$\hat{e}_{ij} = Y_{ij} - \hat{\mu}_i$$
where $\hat{\mu}_i$ represents an as yet to be determined estimator of
$\mu_i$. It is desired that the estimates of $\mu_i$ have some optimal
properties (e.g., unbiased, minimum variance, etc.). The method of least
squares produces such estimates of $\mu_i$ (i = 1, 2, ..., t) and is based
on minimization of the sum of the squared errors:
$$SSE = \sum_{i=1}^{t}\sum_{j=1}^{r} \hat{e}_{ij}^{2} = \sum_{i=1}^{t}\sum_{j=1}^{r} \left( Y_{ij} - \hat{\mu}_i \right)^{2}$$
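As a brief worked step (standard calculus, added here as a sketch rather than taken from the lecture), setting the derivative of the sum of squared errors with respect to each estimator to zero shows that, with r replicates per treatment, the least squares estimator of a treatment mean is the corresponding sample mean:

```latex
\frac{\partial\, SSE}{\partial \hat{\mu}_i}
  = -2 \sum_{j=1}^{r} \left( Y_{ij} - \hat{\mu}_i \right) = 0
\quad\Longrightarrow\quad
\hat{\mu}_i = \frac{1}{r} \sum_{j=1}^{r} Y_{ij} = \bar{Y}_{i.}
```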
ANOVA Table
Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square                         F
Treatments            t-1                  SSTreatments     MSTreatments = SSTreatments/(t-1)   F = MSTreatments/MSError
Error                 N-t                  SSError          MSError = SSError/(N-t)
Total                 N-1                  SSTotal

N = tr if the sample sizes for each treatment are the same;
$N = \sum_{i=1}^{t} r_i$ if the sample sizes differ for the treatments.
Tests of Hypotheses
Recall that if SSTreatments is small, this would indicate that there is no
“important” difference between the various treatment means.
However, small must be interpreted in a statistical or probabilistic
sense. In the ANOVA table we alluded to the F statistic:
$$F = \frac{MS_{Treatments}}{MS_{Error}} = \frac{SS_{Treatments}/(t-1)}{SS_{Error}/(N-t)}$$
Expected mean squares:
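$$E(MS_{Error}) = \sigma^2$$
$$E(MS_{Treatments}) = \sigma^2 + \frac{r \sum_{i=1}^{t} \tau_i^2}{t-1}$$
where $\tau_i$ is the effect of the ith treatment in the treatment effects model.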
Thus, the F statistic, which is the ratio of the observed MSTreatments to
MSError, is approximately 1.0 if there is no treatment effect and tends to
be greater than 1.0 if there is a treatment effect.
This statistic represents a measure of the deviation from the null
hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_t$. The probability
distribution of the F statistic is well known, with most elementary
statistics texts having the probabilities tabulated. The distribution of
the F statistic depends on three parameters: $\nu_1$, $\nu_2$ and
$\lambda$. The parameter $\nu_1$ represents the numerator degrees of
freedom (t - 1), while $\nu_2$ represents the denominator degrees of
freedom (N - t). The parameter $\lambda$ is known as the non-centrality
parameter and is zero when the null hypothesis is true.
[Figure: F distribution with $\nu_1$ and $\nu_2$ degrees of freedom]
Example 3:
The Environmental Protection Agency (EPA) utilizes the services of a
number of analytic laboratories. It is of interest to EPA that these
laboratories produce equivalent results when asked to analyze water
samples for possible contamination. In order to assess the quality of the
analyses from these laboratories the EPA decided to send each of 3
laboratories a set of 6 samples that were known to have a DDT
contamination level of 1000 ppm. Each water sample was constructed
independent of the others and randomly assigned to a laboratory. The
resulting data are given in the following table:
Laboratory   Sample:  1      2      3      4      5      6     Mean
1                  1005   1015   1033   1028   1023   1043     1024.50
2                   995   1008    976   1014   1011    982      997.67
3                   950    975    988   1015   1008    994      988.33
Overall mean                                                   1003.5

SSTotal = 9181.0; SSTreatment = 4230.0; SSError = 4950.0
ANOVA Table
Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square   F0
Treatments            3 - 1 = 2            4230.0           2115.0        6.4091
Error                 18 - 3 = 15          4950.0           330.0
Total                 18 - 1 = 17          9181.0
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a: \mu_i \neq \mu_j$ for at least one $i \neq j$
$\alpha = 0.05$
$F_0 = 6.4091$
Reject $H_0$ if $F_0 > F_{0.05,2,15} = 3.68$.
Conclusion: Since 6.4091 > 3.68, reject $H_0$; there is sufficient
evidence to conclude that the mean estimated concentration of DDT differs
between the three laboratories.
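The ANOVA quantities for the EPA example can be recomputed directly from the data table. The following is a minimal sketch in plain Python; the function name `one_way_anova` and the returned dictionary keys are hypothetical choices, not part of the lecture. Computed without intermediate rounding, the sums of squares come out slightly different from the rounded values in the table (about 4230.3 and 4950.2), and F is about 6.41.

```python
def one_way_anova(groups):
    """One-way ANOVA for a CRD; groups is a list of lists of observations."""
    all_obs = [y for g in groups for y in g]
    n_total, t = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-treatment and within-treatment sums of squares.
    ss_trt = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_err = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_trt = ss_trt / (t - 1)
    ms_err = ss_err / (n_total - t)
    return {
        "df_trt": t - 1, "df_err": n_total - t,
        "ss_trt": ss_trt, "ss_err": ss_err,
        "ms_trt": ms_trt, "ms_err": ms_err,
        "F": ms_trt / ms_err,
    }

# EPA laboratory data from the table above (one list per laboratory).
labs = [
    [1005, 1015, 1033, 1028, 1023, 1043],
    [995, 1008, 976, 1014, 1011, 982],
    [950, 975, 988, 1015, 1008, 994],
]
result = one_way_anova(labs)
for name, value in result.items():
    print(name, round(value, 4))
```

The same function works unchanged when the groups have different sizes, since each treatment mean is weighted by its own number of replicates.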



$$P\text{-value} = P(F_{\nu_1,\nu_2} > F_0) = P(F_{2,15} > 6.4091) = 0.0097$$
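The P-value and critical value in this example can be checked without tables. The sketch below relies on a special case: when the numerator degrees of freedom equal 2, the upper tail of the F distribution has the closed form P(F > f) = (1 + 2f/ν2)^(-ν2/2). This shortcut is an addition, not the method used in the lecture, and it does not apply for other numerator degrees of freedom, where a statistics library or SAS would normally be used. The function names are hypothetical.

```python
def f_upper_tail_num_df2(f_value, df_den):
    """P(F > f_value) for an F distribution with 2 numerator df.

    Valid only when the numerator degrees of freedom equal 2.
    """
    return (1.0 + 2.0 * f_value / df_den) ** (-df_den / 2.0)

def f_critical_num_df2(alpha, df_den):
    """Upper-alpha critical value, obtained by inverting the tail formula."""
    return (df_den / 2.0) * (alpha ** (-2.0 / df_den) - 1.0)

# EPA example: F0 = 6.4091 with 2 and 15 degrees of freedom.
p_value = f_upper_tail_num_df2(6.4091, 15)
f_crit = f_critical_num_df2(0.05, 15)
print("P-value:", round(p_value, 4))
print("Critical value:", round(f_crit, 2))
```

Under these assumptions the results agree with the values quoted in the example, a P-value near 0.0097 and a critical value near 3.68.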
Standard Errors and Confidence Intervals
Under the assumptions of the ANOVA model, the variance for each
treatment is assumed to be the same. This common, or pooled,
variance is estimated by MSError: $S^2 = MS_{Error}$.
As stated earlier, the standard error of a sample mean is the standard
deviation of the sample divided by the square root of the sample size.
For the ANOVA model, we use the square root of MSError as our
estimated standard deviation:
$$SE(\bar{Y}_{i.}) = \frac{S}{\sqrt{r}}$$
Confidence intervals for the treatment means are:
$$\bar{Y}_{i.} \pm t_{\alpha/2,\,N-t}\,\frac{S}{\sqrt{r}}$$
Example 3:
For the EPA laboratory data the treatment means, treatment sample
sizes and MSError are:
MSError = 330.0, $S = \sqrt{330.0} = 18.17$, N = 18
$\bar{Y}_{1.} = 1024.50$, $r_1 = 6$, $SE(\bar{Y}_{1.}) = 18.17/\sqrt{6} = 7.42$
$\bar{Y}_{2.} = 997.67$, $r_2 = 6$, $SE(\bar{Y}_{2.}) = 18.17/\sqrt{6} = 7.42$
$\bar{Y}_{3.} = 988.33$, $r_3 = 6$, $SE(\bar{Y}_{3.}) = 18.17/\sqrt{6} = 7.42$
95% Confidence Intervals for Treatment Means:
df = N - t = 18 - 3 = 15, $t_{0.025,15} = 2.131$
$\bar{Y}_{1.}$: 1024.50 ± 2.131(7.42)   (1008.7 to 1040.3)
$\bar{Y}_{2.}$: 997.67 ± 2.131(7.42)    (981.9 to 1013.5)
$\bar{Y}_{3.}$: 988.33 ± 2.131(7.42)    (972.5 to 1004.1)
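The interval arithmetic for the EPA example can be reproduced with a short script. This is a sketch only: the critical value 2.131 is copied from the notes rather than computed, and the variable names are illustrative.

```python
import math

ms_error = 330.0   # MSError from the ANOVA table
r = 6              # replicates per laboratory
t_crit = 2.131     # t critical value for 15 df, copied from the notes
means = {1: 1024.50, 2: 997.67, 3: 988.33}

s = math.sqrt(ms_error)        # pooled standard deviation
se = s / math.sqrt(r)          # standard error of each treatment mean
margin = t_crit * se           # half-width of the 95% interval

intervals = {lab: (m - margin, m + margin) for lab, m in means.items()}
print(f"S = {s:.2f}, SE = {se:.2f}")
for lab, (low, high) in intervals.items():
    print(f"Lab {lab}: {low:.1f} to {high:.1f}")
```

Because every laboratory has the same number of replicates, all three intervals share the same half-width.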
Unequal Number of Replicates
Suppose that the number of replicates per treatment differs. That is,
suppose that the number of observations for the ith treatment is
$r_i$, $i = 1, 2, \ldots, t$.
The following changes result from this:
$$Y_{ij} = \mu_i + e_{ij}, \quad i = 1, 2, \ldots, t, \quad j = 1, 2, \ldots, r_i$$
where $r_i$ is the number of replications for the ith treatment group.
Decomposition of the Sum of Squares:
$$\sum_{i=1}^{t}\sum_{j=1}^{r_i} \left( Y_{ij} - \bar{Y}_{..} \right)^2 = \sum_{i=1}^{t}\sum_{j=1}^{r_i} \left( \bar{Y}_{i.} - \bar{Y}_{..} \right)^2 + \sum_{i=1}^{t}\sum_{j=1}^{r_i} \left( Y_{ij} - \bar{Y}_{i.} \right)^2$$
ANOVA Table
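Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square    Expected Mean Square       F0
Treatments            t-1                  SSTreatments     MSTreatments   $\sigma^2 + \theta_t^2$    MSTreatments/MSError
Error                 N-t                  SSError          MSError        $\sigma^2$
Total                 N-1                  SSTotal

$$SS_{Treatments} = \sum_{i=1}^{t}\sum_{j=1}^{r_i} \left( \bar{Y}_{i.} - \bar{Y}_{..} \right)^2 = \sum_{i=1}^{t} r_i \left( \bar{Y}_{i.} - \bar{Y}_{..} \right)^2$$
$$SS_{Error} = \sum_{i=1}^{t}\sum_{j=1}^{r_i} \left( Y_{ij} - \bar{Y}_{i.} \right)^2$$
$$SS_{Total} = \sum_{i=1}^{t}\sum_{j=1}^{r_i} \left( Y_{ij} - \bar{Y}_{..} \right)^2$$
where
$$\theta_t^2 = \frac{1}{t-1}\sum_{i=1}^{t} r_i \left( \mu_i - \bar{\mu} \right)^2, \quad \bar{\mu} = \frac{\sum_{i=1}^{t} r_i \mu_i}{N} \quad \text{and} \quad N = \sum_{i=1}^{t} r_i$$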
Standard Error of the Mean
$$SE(\bar{Y}_{i.}) = \frac{S}{\sqrt{r_i}}$$
Confidence Intervals
$$\bar{Y}_{i.} \pm t_{\alpha/2,\,N-t}\,\frac{S}{\sqrt{r_i}}, \quad \text{where } N = \sum_{i=1}^{t} r_i$$
Example: Unequal Replication
An experiment was conducted to compare the yields of five lentil
varieties under rainfall conditions in northern Syria. The experiment
was planned as a completely randomized design with each variety
replicated four times. During the growing season, however, sheep
broke through the fence and heavily grazed four plots along the edge of
the experiment before they were detected. The yields, in kilograms per
hectare, from the remaining plots are given in the first table below,
followed by the analysis of variance for the data. The sums of squares
are computed with the unequal-replication formulas given above.
Variety       1       2       3       4       5    Total
            740     545     325     740     605
            430     440     290     630     505
            760     390             870     430
            640                             540
Sum        2570    1375     615    2240    2080     8880
r_i           4       3       2       3       4       16
Mean      642.5   458.3   307.5   746.7   520.0    555.0
ANOVA Table
Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square   F0
Treatments             4                   296,279          74,070        6.44
Error                 11                   126,421          11,493
Total                 15                   422,700
MINITAB OUTPUT
One-way ANOVA: Yields versus Variiety

Source     DF      SS      MS     F      P
Variiety    4  296279   74070  6.44  0.006
Error      11  126421   11493
Total      15  422700

S = 107.2   R-Sq = 70.09%   R-Sq(adj) = 59.22%

Level  N   Mean  StDev
1      4  642.5  151.1
2      3  458.3   79.1
3      2  307.5   24.7
4      3  746.7  120.1
5      4  520.0   72.9

Pooled StDev = 107.2

[Character plot: individual 95% CIs for the mean of each level, based on
the pooled StDev, on an axis running from 200 to 800]
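The lentil analysis can be reproduced with the unequal-replication formulas. The sketch below is an addition to the notes; the assignment of yields to varieties is read from the data table (it matches the column sums 2570, 1375, 615, 2240 and 2080), and the variable names are illustrative.

```python
import math

# Surviving plot yields (kg/ha) for each lentil variety.
yields = {
    1: [740, 430, 760, 640],
    2: [545, 440, 390],
    3: [325, 290],
    4: [740, 630, 870],
    5: [605, 505, 430, 540],
}

t = len(yields)
n_total = sum(len(v) for v in yields.values())
grand_mean = sum(sum(v) for v in yields.values()) / n_total
means = {k: sum(v) / len(v) for k, v in yields.items()}

# Treatment sum of squares weights each variety by its own r_i.
ss_trt = sum(len(v) * (means[k] - grand_mean) ** 2 for k, v in yields.items())
ss_err = sum((y - means[k]) ** 2 for k, v in yields.items() for y in v)
ms_trt = ss_trt / (t - 1)
ms_err = ss_err / (n_total - t)
f0 = ms_trt / ms_err

# Pooled standard deviation and the standard error of each variety mean.
s = math.sqrt(ms_err)
se = {k: s / math.sqrt(len(v)) for k, v in yields.items()}

print(f"SSTreatments = {ss_trt:.0f}, SSError = {ss_err:.0f}, F0 = {f0:.2f}")
print(f"S = {s:.1f}")
for k in yields:
    print(f"Variety {k}: mean = {means[k]:.1f}, r = {len(yields[k])}, SE = {se[k]:.1f}")
```

Varieties with fewer surviving plots get larger standard errors, which is why confidence intervals for the variety means differ in width when replication is unequal.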
In CRD there are two types of models:
Fixed effects models and Random effects models (discussed earlier)
Examples:
Fixed: A scientist develops three new fungicides. His interest is in these
fungicides only.
Random: A scientist is interested in the way a fungicide works. He
selects, at random, three fungicides from a group of similar fungicides to
study the action.
One-way ANOVA Table for Fixed/ Random Effects (CRD):
Source of Variation   Degrees of Freedom   Mean Square    Expected Mean Square (Fixed)                   Expected Mean Square (Random)
Treatments            t-1                  MSTreatments   $\sigma^2 + r\sum_{i=1}^{t}\tau_i^2/(t-1)$     $\sigma^2 + r\sigma_\tau^2$
Error                 N-t                  MSError        $\sigma^2$                                     $\sigma^2$

N = r*t
How Many Replications?
The required number of replications depends on the variance,
significance level, power of the test, and size of the difference to be
detected.
The power approach involves a trial-and-error approach to solving an
equation relating a noncentrality parameter to the number of
replications, the size of the difference to be detected, the number of
treatments, and the variance. The power resulting from using trial values
for the number of replicates can be determined from power charts. The
number of replicates can be changed until an acceptable power is found.
Your book has a section on this (14.6), but SAS now makes the calculation
easier, and we will use the SAS approach in LAB.
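The trial-and-error idea can be illustrated with a small Monte Carlo simulation, used here in place of power charts or SAS. Everything in this sketch is an assumption made for illustration: the treatment means and sigma are hypothetical trial values, the function names are made up, the noncentrality parameter is computed with one common definition (r times the sum of squared treatment effects divided by the variance), and the critical value uses a closed form that holds only for t = 3 treatments (2 numerator degrees of freedom).

```python
import random

def f_critical_num_df2(alpha, df_den):
    """Upper-alpha F critical value; valid only for 2 numerator df."""
    return (df_den / 2.0) * (alpha ** (-2.0 / df_den) - 1.0)

def f_statistic(groups):
    """One-way ANOVA F statistic for a list of treatment groups."""
    all_obs = [y for g in groups for y in g]
    n_total, t = len(all_obs), len(groups)
    grand = sum(all_obs) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_trt = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_err = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_trt / (t - 1)) / (ss_err / (n_total - t))

def simulated_power(mu, sigma, r, alpha=0.05, n_sim=2000, seed=0):
    """Estimate the power of the F test by simulating normal data (t = 3)."""
    assert len(mu) == 3, "the closed-form critical value assumes 3 treatments"
    rng = random.Random(seed)
    crit = f_critical_num_df2(alpha, 3 * r - 3)
    rejections = 0
    for _ in range(n_sim):
        groups = [[rng.gauss(m, sigma) for _ in range(r)] for m in mu]
        if f_statistic(groups) > crit:
            rejections += 1
    return rejections / n_sim

def noncentrality(mu, sigma, r):
    """r * sum of squared treatment effects / sigma^2 (one common definition)."""
    mean_mu = sum(mu) / len(mu)
    return r * sum((m - mean_mu) ** 2 for m in mu) / sigma ** 2

# Hypothetical trial values: one treatment mean is 1.5 sigma above the others.
mu = [0.0, 0.0, 1.5]
sigma = 1.0
for r in range(2, 11):
    lam = noncentrality(mu, sigma, r)
    print(f"r = {r:2d}  lambda = {lam:5.2f}  power = {simulated_power(mu, sigma, r):.3f}")
```

Reading down the printed table, the number of replicates can be increased until the estimated power reaches an acceptable level, which mirrors the trial-and-error search described above.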