The Completely Randomized Design (§8.3)
• Introduction to the simplest experimental design, the completely randomized design.
• Introduction of a statistical model for the observations in a completely randomized design.
Completely Randomized Design
Two different names for the same design:
• Experimental study: completely randomized design (CRD)
• Sampling study: one-way classification design
Randomization: the t treatments are randomly allocated to the experimental units in such a way that $n_1$ units receive treatment 1, $n_2$ receive treatment 2, and so on.
Assumptions:
• Independent random samples (the response from one experimental unit does not affect the responses from other experimental units).
• Responses follow a normal distribution.
• Common true variance, $\sigma^2$, across all groups/treatments.
• The true mean for population i is $\mu_i$.
• Interest is in comparing means.
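AOV Model of Responses/Effects
Model:
$$y_{ij} = \mu + \tau_i + \varepsilon_{ij} = \mu_i + \varepsilon_{ij}$$
where $\mu$ is the overall mean, $\tau_i$ is the effect due to population i, and $\varepsilon_{ij} \sim N(0, \sigma^2)$ is the random error.
Expected response: $E(y_{ij}) = \mu + \tau_i = \mu_i$
Estimate: $\hat\tau_i = \bar y_{i\cdot} - \hat\mu$
Requirement for $\mu$ to be the overall mean: $\sum_{i=1}^{t} \tau_i = 0$
$$H_0: \tau_1 = \tau_2 = \cdots = \tau_t = 0$$
$$H_a: \text{at least one of the } \tau_i \text{ differs from } 0$$
All $\tau_i = 0$ implies all groups have the same mean ($\mu$).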
Example
A manufacturer of concrete bridge supports is interested in determining the effect of varying the sand content on the strength of the supports. Five supports are made for each of five different amounts of sand in the concrete mix, and each is tested for compression resistance.

Percent Sand   Compression resistance (10,000 psi), five supports
15             7    7    10   15   9
20             17   12   11   18   19
25             14   18   18   19   19
30             20   24   22   19   23
35             7    10   11   15   11
Basic Statistics and AOV Effects
Percent Sand   MEAN   EFFECT
15             9.6    -5.4
20             15.4   0.4
25             17.6   2.6
30             21.6   6.6
35             10.8   -4.2
Overall mean $\hat\mu$ = 15; sum of effects = 0
$\hat\tau_i = \bar y_{i\cdot} - \bar y_{\cdot\cdot}$
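As a cross-check, the MEAN and EFFECT rows can be recomputed from the raw data. This is a minimal Python sketch using only the standard library; the dictionary `data` (percent sand mapped to its five resistances) is an illustrative layout assumed here, not something taken from the original software.

```python
# Compression resistance (10,000 psi) keyed by percent sand; five supports each
data = {
    15: [7, 7, 10, 15, 9],
    20: [17, 12, 11, 18, 19],
    25: [14, 18, 18, 19, 19],
    30: [20, 24, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}

# Group means: ybar_i. for each treatment
group_means = {level: sum(ys) / len(ys) for level, ys in data.items()}

# Overall mean: ybar.. over all 25 observations
all_obs = [y for ys in data.values() for y in ys]
grand_mean = sum(all_obs) / len(all_obs)

# AOV effects: tau_hat_i = ybar_i. - ybar..
effects = {level: m - grand_mean for level, m in group_means.items()}

for level in data:
    print(f"{level}% sand: mean = {group_means[level]:.1f}, effect = {effects[level]:+.1f}")
print(f"overall mean = {grand_mean:.1f}")
```

Because every group has the same size here, the effects sum to zero (up to floating-point rounding), matching the requirement on the $\tau_i$.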
Decomposing the Data
$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$
$\mu$ = overall mean
$\tau_i = \mu_i - \mu$ = group i effect
$\varepsilon_{ij} = y_{ij} - \mu - \tau_i$ = residual
(Note that the residuals sum to zero within each treatment.)

Treatment  Resistance  Overall Mean  Effect  Residual
15         7           15            -5.4    -2.6
15         7           15            -5.4    -2.6
15         10          15            -5.4    0.4
15         15          15            -5.4    5.4
15         9           15            -5.4    -0.6
20         17          15            0.4     1.6
20         12          15            0.4     -3.4
20         11          15            0.4     -4.4
20         18          15            0.4     2.6
20         19          15            0.4     3.6
25         14          15            2.6     -3.6
25         18          15            2.6     0.4
25         18          15            2.6     0.4
25         19          15            2.6     1.4
25         19          15            2.6     1.4
30         20          15            6.6     -1.6
30         24          15            6.6     2.4
30         22          15            6.6     0.4
30         19          15            6.6     -2.6
30         23          15            6.6     1.4
35         7           15            -4.2    -3.8
35         10          15            -4.2    -0.8
35         11          15            -4.2    0.2
35         15          15            -4.2    4.2
35         11          15            -4.2    0.2
SSQ        6275        5625          486.4   163.6
(SSQ = sum of squares of each column.)
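The decomposition table can be rebuilt row by row. This Python sketch assumes the same hypothetical `data` dictionary as before; it splits each observation into overall mean, effect and residual, and then squares and sums each column to reproduce the SSQ row.

```python
data = {
    15: [7, 7, 10, 15, 9],
    20: [17, 12, 11, 18, 19],
    25: [14, 18, 18, 19, 19],
    30: [20, 24, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}

all_obs = [y for ys in data.values() for y in ys]
grand_mean = sum(all_obs) / len(all_obs)

# One row per observation: (treatment, y, overall mean, effect, residual)
rows = []
for level, ys in data.items():
    effect = sum(ys) / len(ys) - grand_mean      # tau_i = mu_i - mu
    for y in ys:
        residual = y - grand_mean - effect       # eps_ij = y_ij - mu - tau_i
        rows.append((level, y, grand_mean, effect, residual))

# Residuals should sum to (numerically) zero within each treatment group
resid_sum = {level: sum(r[4] for r in rows if r[0] == level) for level in data}

# Column sums of squares: resistance, overall mean, effect, residual
ssq = [round(float(sum(r[k] ** 2 for r in rows)), 1) for k in (1, 2, 3, 4)]
print(ssq)  # [6275.0, 5625.0, 486.4, 163.6]
```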
Decomposing Sums of Squares
$$\sum_{i}\sum_{j}\left(y_{ij} - \bar y_{\cdot\cdot}\right)^2 = \sum_{i} n_i\left(\bar y_{i\cdot} - \bar y_{\cdot\cdot}\right)^2 + \sum_{i}\sum_{j}\left(y_{ij} - \bar y_{i\cdot}\right)^2$$
$$\text{TSS} = \text{SSB} + \text{SSW}$$
For the sand data:
6275.0 - 5625.0 = 650.0 (TSS)
650.0 - 486.4 = 163.6 (TSS - SSB)
163.6 - 163.6 = 0.0 (TSS - SSB - SSW)
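The same three sums of squares can be computed directly from their definitions. Here is a minimal Python sketch, again assuming the illustrative `data` dictionary. It computes TSS both from deviations about the grand mean and with the shortcut $\sum y_{ij}^2 - N\bar y_{\cdot\cdot}^2$ used above.

```python
data = {
    15: [7, 7, 10, 15, 9],
    20: [17, 12, 11, 18, 19],
    25: [14, 18, 18, 19, 19],
    30: [20, 24, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}

all_obs = [y for ys in data.values() for y in ys]
n_total = len(all_obs)
grand_mean = sum(all_obs) / n_total

# TSS: total corrected sum of squares
tss = sum((y - grand_mean) ** 2 for y in all_obs)
# Shortcut form: sum of y^2 minus N * ybar^2 (6275 - 5625)
tss_shortcut = sum(y * y for y in all_obs) - n_total * grand_mean ** 2

# SSB: n_i * (ybar_i. - ybar..)^2 summed over groups
ssb = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in data.values())

# SSW: squared deviations from each group's own mean
ssw = sum((y - sum(ys) / len(ys)) ** 2 for ys in data.values() for y in ys)

print(f"TSS = {tss:.1f}, SSB = {ssb:.1f}, SSW = {ssw:.1f}")
# TSS = 650.0, SSB = 486.4, SSW = 163.6
```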
[Figure: Compression Resistance. Resistance (10,000 psi) plotted against percent sand, with the overall mean $\hat\mu$ marked.]
[Figure: Compression Resistance. Resistance (10,000 psi) plotted against percent sand, with the estimated effects $\hat\tau_1$ and $\hat\tau_4$ marked.]
Best Treatment?
Is 30% significantly better than 25%?
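Estimation
$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$
$$\hat\mu = \bar y_{\cdot\cdot} = \frac{\sum_{i=1}^{t}\sum_{j=1}^{n_i} y_{ij}}{\sum_{i=1}^{t} n_i}$$
$$\hat\mu_i = \bar y_{i\cdot} = \hat\mu + \hat\tau_i$$
$$\hat\tau_i = \bar y_{i\cdot} - \bar y_{\cdot\cdot}$$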
Reference Group/Cell Model
Model:
$$y_{tj} = \mu_t + \varepsilon_{tj} \qquad (i = t)$$
$$y_{ij} = \mu_t + \tau_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, t-1$$
where $\mu_t$ is the reference group mean, $\tau_i$ is the effect due to population i, and $\varepsilon_{ij} \sim N(0, \sigma^2)$ is the random error.
The mean for the last group (i = t) is $\mu_t$, and the mean for the first group (i = 1) is $\mu_t + \tau_1$. Thus $\tau_1 = \mu_1 - \mu_t$ is the difference between the target group mean and the mean of the reference group (cell). Any group can be the reference group. This is the model SAS uses.
$$H_0: \tau_1 = \tau_2 = \cdots = \tau_{t-1} = 0$$
$$H_a: \text{at least one of the } \tau_i \text{ differs from } 0$$
All $\tau_i = 0$ implies all groups have the same mean.
Basic Statistics and Reference Cell Effects
Percent Sand   MEAN   EFFECT
15             9.6    -1.2
20             15.4   4.6
25             17.6   6.8
30             21.6   10.8
35             10.8   0
Reference cell mean $\hat\mu_t$ = 10.8; sum of effects = 21 ($\neq 0$)
$\hat\tau_i = \bar y_{i\cdot} - \bar y_{t\cdot}$
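Reference-cell effects are simply differences from a chosen group's mean. This Python sketch uses a hypothetical helper, `reference_cell_effects`. It reproduces the table above with the last level (35%) as the reference, and it also shows the first level (15%) as the reference, which matches the default coefficients R reports later in this section.

```python
data = {
    15: [7, 7, 10, 15, 9],
    20: [17, 12, 11, 18, 19],
    25: [14, 18, 18, 19, 19],
    30: [20, 24, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}


def reference_cell_effects(groups, reference):
    """Return (reference mean, {level: tau_hat_i}), tau_hat_i = ybar_i. - ybar_ref.

    The reference level itself gets an effect of exactly 0.
    """
    ref_mean = sum(groups[reference]) / len(groups[reference])
    effects = {k: sum(ys) / len(ys) - ref_mean for k, ys in groups.items()}
    return ref_mean, effects


# Last level (35%) as the reference cell
mu_t, tau_last = reference_cell_effects(data, reference=35)
print(mu_t, {k: round(v, 1) for k, v in tau_last.items()}, round(sum(tau_last.values()), 1))
# 10.8 {15: -1.2, 20: 4.6, 25: 6.8, 30: 10.8, 35: 0.0} 21.0

# First level (15%) as the reference cell
mu_1, tau_first = reference_cell_effects(data, reference=15)
print(mu_1, {k: round(v, 1) for k, v in tau_first.items()})
# 9.6 {15: 0.0, 20: 5.8, 25: 8.0, 30: 12.0, 35: 1.2}
```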
Reference Cell Decomposition
Note: here the sums of squares do not add up, because the $\tau_i$ do not sum to zero:
6275.0 - 2916.0 = 3359.0
3359.0 - 927.4 = 2431.6
2431.6 - 163.6 = 2268.0
The leftover 2268.0 is the cross-product term $2\mu_t\sum_i n_i\tau_i = 2(10.8)(5)(21) = 2268$, which vanishes only when $\sum_i n_i\tau_i = 0$.

Treatment  Resistance  Group Mean  Reference Cell Mean  Effect  Residual
15         7           9.6         10.8                 -1.2    -2.6
15         7           9.6         10.8                 -1.2    -2.6
15         10          9.6         10.8                 -1.2    0.4
15         15          9.6         10.8                 -1.2    5.4
15         9           9.6         10.8                 -1.2    -0.6
20         17          15.4        10.8                 4.6     1.6
20         12          15.4        10.8                 4.6     -3.4
20         11          15.4        10.8                 4.6     -4.4
20         18          15.4        10.8                 4.6     2.6
20         19          15.4        10.8                 4.6     3.6
25         14          17.6        10.8                 6.8     -3.6
25         18          17.6        10.8                 6.8     0.4
25         18          17.6        10.8                 6.8     0.4
25         19          17.6        10.8                 6.8     1.4
25         19          17.6        10.8                 6.8     1.4
30         20          21.6        10.8                 10.8    -1.6
30         24          21.6        10.8                 10.8    2.4
30         22          21.6        10.8                 10.8    0.4
30         19          21.6        10.8                 10.8    -2.6
30         23          21.6        10.8                 10.8    1.4
35         7           10.8        10.8                 0       -3.8
35         10          10.8        10.8                 0       -0.8
35         11          10.8        10.8                 0       0.2
35         15          10.8        10.8                 0       4.2
35         11          10.8        10.8                 0       0.2
SSQ        6275                    2916                 927.4   163.6
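Decomposing Sums of Squares
$$\sum_{i=1}^{t}\sum_{j=1}^{n_i} y_{ij}^2 = \sum_{i=1}^{t}\sum_{j=1}^{n_i}\left(\mu_t + \tau_i + \varepsilon_{ij}\right)^2, \qquad \text{where } \sum_{j=1}^{n_i}\varepsilon_{ij} = 0 \text{ for all } i \text{ and } \tau_t = 0$$
$$\left(\mu_t + \tau_i + \varepsilon_{ij}\right)^2 = \mu_t^2 + 2\mu_t\tau_i + \tau_i^2 + 2\mu_t\varepsilon_{ij} + 2\tau_i\varepsilon_{ij} + \varepsilon_{ij}^2$$
Summing over j, the terms $2\mu_t\varepsilon_{ij}$ and $2\tau_i\varepsilon_{ij}$ drop out, leaving $\mu_t^2 + 2\mu_t\tau_i + \tau_i^2 + \varepsilon_{ij}^2$. Hence
$$\sum_{i=1}^{t}\sum_{j=1}^{n_i} y_{ij}^2 = \mu_t^2\sum_{i=1}^{t} n_i + \sum_{i=1}^{t} n_i\tau_i^2 + \sum_{i=1}^{t}\sum_{j=1}^{n_i}\varepsilon_{ij}^2 + 2\mu_t\sum_{i=1}^{t} n_i\tau_i$$
$$6275 = 2916.0 + 927.4 + 163.6 + 2268.0$$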
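The four-term identity can be checked numerically. This Python sketch assumes the illustrative `data` dictionary, with the last level (35%) as the reference cell. It computes each term of the reference-cell decomposition, including the cross-product term that makes the sums of squares "not add up".

```python
data = {
    15: [7, 7, 10, 15, 9],
    20: [17, 12, 11, 18, 19],
    25: [14, 18, 18, 19, 19],
    30: [20, 24, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}
ref = 35                                   # last level is the reference cell

mu_t = sum(data[ref]) / len(data[ref])     # reference cell mean
tau = {k: sum(ys) / len(ys) - mu_t for k, ys in data.items()}  # tau[ref] is 0
n = {k: len(ys) for k, ys in data.items()}

raw_ss = sum(y * y for ys in data.values() for y in ys)        # sum of y_ij^2
mean_ss = mu_t ** 2 * sum(n.values())                          # mu_t^2 * sum n_i
effect_ss = sum(n[k] * tau[k] ** 2 for k in data)              # sum n_i tau_i^2
resid_ss = sum((y - mu_t - tau[k]) ** 2 for k, ys in data.items() for y in ys)
cross = 2 * mu_t * sum(n[k] * tau[k] for k in data)            # 2 mu_t sum n_i tau_i

print(raw_ss, round(mean_ss, 1), round(effect_ss, 1), round(resid_ss, 1), round(cross, 1))
# 6275 2916.0 927.4 163.6 2268.0
```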
[Figure: Reference Cell Model. Compression resistance (10,000 psi) plotted against percent sand, with the reference cell mean $\hat\mu_t$ and the effect $\hat\tau_4$ marked.]
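SAS Program
options ls=78 ps=49 nodate;   /* line size, page size, no date on output */

/* Read (sand, resistance) pairs; @@ allows several pairs per data line */
data stress;
  input sand resistance @@;
  datalines;
15 7 15 7 15 10 15 15 15 9
20 17 20 12 20 11 20 18 20 19
25 14 25 18 25 18 25 19 25 19
30 20 30 24 30 22 30 19 30 23
35 7 35 10 35 11 35 15 35 11
;

/* One-way AOV with sand as the classification variable;   */
/* SOLUTION prints the reference-cell parameter estimates  */
proc glm data=stress;
  class sand;
  model resistance = sand / solution;
  title1 'Compression resistance in concrete beams as';
  title2 'a function of percent sand in the mix';
run;
quit;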
SAS Output (1)
Compression resistance in concrete beams as
a function of percent sand in the mix

The GLM Procedure
Dependent Variable: resistance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              4      486.4000000   121.6000000     14.87   <.0001
Error             20      163.6000000     8.1800000
Corrected Total   24      650.0000000

R-Square   Coeff Var   Root MSE   resistance Mean
0.748308   19.06713    2.860070   15.00000
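Every summary number in this AOV table follows from SSB, SSW, the number of groups, and the sample size. Here is a small Python sketch that works from the sums of squares found earlier (the variable names are illustrative). The p-value is omitted because it needs an F-distribution CDF, which is not in the standard library.

```python
# Sums of squares and sizes for the sand example
ssb, ssw = 486.4, 163.6
t, n_total, grand_mean = 5, 25, 15.0

df_model, df_error = t - 1, n_total - t        # 4 and 20
ms_model = ssb / df_model                      # 121.6
mse = ssw / df_error                           # 8.18
f_value = ms_model / mse
r_square = ssb / (ssb + ssw)                   # share of total SS explained
root_mse = mse ** 0.5                          # pooled standard deviation
coeff_var = 100 * root_mse / grand_mean        # Root MSE as a % of the mean

print(f"F = {f_value:.2f}, R-Square = {r_square:.6f}, "
      f"Root MSE = {root_mse:.6f}, Coeff Var = {coeff_var:.5f}")
# F = 14.87, R-Square = 0.748308, Root MSE = 2.860070, Coeff Var = 19.06713
```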
SAS Output (2)
Source   DF   Type I SS     Mean Square   F Value   Pr > F
sand      4   486.4000000   121.6000000     14.87   <.0001

Source   DF   Type III SS   Mean Square   F Value   Pr > F
sand      4   486.4000000   121.6000000     14.87   <.0001

Parameter   Estimate        Standard Error   t Value   Pr > |t|
Intercept   10.80000000 B   1.27906216          8.44   <.0001
sand 15     -1.20000000 B   1.80886705         -0.66   0.5146
sand 20      4.60000000 B   1.80886705          2.54   0.0194
sand 25      6.80000000 B   1.80886705          3.76   0.0012
sand 30     10.80000000 B   1.80886705          5.97   <.0001
sand 35      0.00000000 B   .                   .      .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.
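The standard errors in the parameter table follow from the MSE and the common group size $n = 5$. The intercept is a single cell mean, with SE $\sqrt{\text{MSE}/n}$. Each sand estimate is a difference of two independent cell means, with SE $\sqrt{2\,\text{MSE}/n}$. The Python sketch below takes the group means as given and rebuilds the estimates and t values; p-values are left out because they need a t-distribution CDF.

```python
from math import sqrt

mse, n = 8.18, 5                        # MSE and per-group size from the AOV table
means = {15: 9.6, 20: 15.4, 25: 17.6, 30: 21.6, 35: 10.8}
ref = 35                                # last level plays the role of the intercept

se_mean = sqrt(mse / n)                 # SE of one cell mean (the intercept)
se_diff = sqrt(mse / n + mse / n)       # SE of a difference of two cell means

estimates = {"Intercept": (means[ref], se_mean)}
for level in (15, 20, 25, 30):
    estimates[f"sand {level}"] = (means[level] - means[ref], se_diff)

for name, (est, se) in estimates.items():
    print(f"{name:10s} {est:8.2f} {se:.8f} t = {est / se:.2f}")
```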
Minitab
One-way ANOVA: Resist versus Sand

Analysis of Variance for Resist
Source   DF   SS       MS       F       P
Sand      4   486.40   121.60   14.87   0.000
Error    20   163.60     8.18
Total    24   650.00

Level   N   Mean     StDev
15      5    9.600   3.286
20      5   15.400   3.647
25      5   17.600   2.074
30      5   21.600   2.074
35      5   10.800   2.864
Pooled StDev = 2.860
[Character plot: individual 95% CIs for each mean, based on the pooled StDev, on an axis marked 10.0, 15.0, 20.0.]

Minitab menu: Stat > ANOVA > One-Way
Multiple comparisons (later)

[Figure: Minitab dot plot]
SPSS AOV Table
ANOVA
RESIST
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   486.400           4   121.600       14.866   .000
Within Groups    163.600          20   8.180
Total            650.000          24
SPSS Descriptives
Descriptives
RESIST
Group   N    Mean      Std. Deviation   Std. Error   95% CI for Mean       Minimum   Maximum
15.00    5    9.6000   3.28634          1.46969       5.5195 to 13.6805     7.00     15.00
20.00    5   15.4000   3.64692          1.63095      10.8718 to 19.9282    11.00     19.00
25.00    5   17.6000   2.07364           .92736      15.0252 to 20.1748    14.00     19.00
30.00    5   21.6000   2.07364           .92736      19.0252 to 24.1748    19.00     24.00
35.00    5   10.8000   2.86356          1.28062       7.2444 to 14.3556     7.00     15.00
Total   25   15.0000   5.20416          1.04083      12.8518 to 17.1482     7.00     24.00
Model, Fixed Effects:  Std. Deviation 2.86007; Std. Error .57201; 95% CI 13.8068 to 16.1932
Model, Random Effects: Std. Error 2.20545; 95% CI 8.8767 to 21.1233; Between-Component Variance 22.68400
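Most of the descriptive columns can be reproduced with the standard library. In this Python sketch, the group rows use the sample SD (divisor $n-1$) and the SE of the mean. For the "Model" rows, the fixed-effects SD is the pooled $\sqrt{\text{MSW}}$. The two standard errors are computed as $\sqrt{\text{MSW}}/\sqrt{N}$ and $\sqrt{\text{MSB}/N}$; these formulas are assumptions that happen to match the printed values, not a statement of how SPSS computes them internally. The confidence intervals are omitted because they need t quantiles.

```python
from math import sqrt
from statistics import stdev

# Compression resistance (10,000 psi) keyed by percent sand
data = {
    15: [7, 7, 10, 15, 9],
    20: [17, 12, 11, 18, 19],
    25: [14, 18, 18, 19, 19],
    30: [20, 24, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}

# Per-group rows: (N, mean, sample SD with divisor n - 1, SE of the mean)
summary = {}
for level, ys in data.items():
    sd = stdev(ys)
    summary[level] = (len(ys), sum(ys) / len(ys), sd, sd / sqrt(len(ys)))
    n, m, s, se = summary[level]
    print(f"{level}: N = {n}, mean = {m:.4f}, SD = {s:.5f}, SE = {se:.5f}")

# "Model" rows, built from the AOV mean squares
n_total = sum(len(ys) for ys in data.values())
t = len(data)
msw = 163.6 / (n_total - t)            # within-groups mean square, 8.18
msb = 486.4 / (t - 1)                  # between-groups mean square, 121.6
fixed_sd = sqrt(msw)                   # pooled SD
fixed_se = fixed_sd / sqrt(n_total)    # assumed formula for the fixed-effects SE
random_se = sqrt(msb / n_total)        # assumed formula for the random-effects SE
print(f"fixed SD = {fixed_sd:.5f}, fixed SE = {fixed_se:.5f}, random SE = {random_se:.5f}")
```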
CRD Analysis in R
> resist <- c(7,7,10,15,9,17,12,11,18,19,14, …,19,23,7,10,11,15,11)
> sand <- factor(rep(seq(15,35,5),rep(5,5)))
> myfit <- aov(resist~sand)
> summary(myfit)
            Df Sum Sq Mean Sq F value    Pr(>F)
sand         4 486.40  121.60  14.866 8.655e-06 ***
Residuals   20 163.60    8.18
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> coef(myfit)
(Intercept)   sand20   sand25   sand30   sand35
        9.6      5.8      8.0     12.0      1.2
By default, R's aov() and lm() use the first factor level (here 15% sand) as the reference cell, so the intercept is that cell's mean and each coefficient is a difference from it.
Fixed Effects
Normally, the “effect” of a particular treatment is assumed to be a constant value ($\tau_i$) added to the response of every unit in the group receiving that treatment.
If the treatments are well defined, easily replicable, and expected to produce the same effect on average in each replicate, we have a fixed set of treatments, and the AOV model is said to describe a fixed effects model.
Examples:
• A scientist develops 3 new fungicides. Her interest is in these fungicides only.
• The impact of 4 specific soil types on plant growth is of interest.
• Three particular milling machines are being compared.
• Four particular lakes are of interest for their weed biomass densities.
• Three tests for assessing developmental learning are being compared.
Random Effects
If the treatments cannot be assumed to come from a prespecified or known set of treatments, they are assumed to be a random sample from some larger population of potential treatments. In this case, the AOV model is called a random effects model, and the $\tau_i$ are called random effects.
Examples:
• A scientist is interested in how fungicides work. Ten (10) fungicides are selected (at random) to represent the population of all fungicides in the research (plots as replicates).
• Four soil subgroups are selected for examining plant growth (pots as replicates).
• Three milling machines selected at random from the production line are compared (runs as replicates).
• Sixteen lakes selected at random are measured for their weed biomass densities (water samples as replicates).
• A standard test for development is given to 20 middle school classes selected at random from the more than 200 available among all middle schools in the county (students as replicates).
In each case, we assume the values of the effects would change if our sample had changed. Inference is directed not at answering “which treatment is different from which other treatment?” but at the question “is the variability among treatments significantly greater than the residual variability?”
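In a balanced one-way random effects model with $n$ units per group, $E(\text{MSB}) = \sigma^2 + n\sigma_\tau^2$ and $E(\text{MSW}) = \sigma^2$, where $\sigma_\tau^2$ is the variance of the random effects. Equating each mean square to its expectation gives the method-of-moments (ANOVA) estimate $\hat\sigma_\tau^2 = (\text{MSB} - \text{MSW})/n$. The Python sketch below uses a hypothetical function name and applies the formula to the sand mean squares purely as arithmetic; that example was analysed as fixed. The result agrees with the Between-Component Variance (22.684) in the SPSS Descriptives output.

```python
def anova_variance_component(msb, msw, n):
    """Method-of-moments estimate of the between-treatment variance.

    Assumes a balanced one-way random effects model with n units per group,
    where E(MSB) = sigma^2 + n * sigma_tau^2 and E(MSW) = sigma^2.
    The raw estimate can be negative; truncating it at zero is one common
    convention and is an assumption of this sketch.
    """
    raw = (msb - msw) / n
    return max(raw, 0.0)


# Mean squares from the sand example: between 121.6, within 8.18, n = 5
sigma_tau2 = anova_variance_component(121.6, 8.18, 5)
print(f"{sigma_tau2:.3f}")  # 22.684
```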
Closing Comments on CRD
Even though we have introduced several variations on the same basic model for defining “effects”, the final F-test for the hypothesis of equal group means is the same one developed as part of the analysis of variance. There may be computational advantages to using one formulation of the model over another, but the choice has no effect on the hypothesis test. We will see this in the next section.
Reference cell model:
$$H_0: \tau_1 = \tau_2 = \cdots = \tau_{t-1} = 0 \qquad H_a: \text{at least one of the } \tau_i \text{ differs from } 0$$
AOV effects model:
$$H_0: \tau_1 = \tau_2 = \cdots = \tau_t = 0 \qquad H_a: \text{at least one of the } \tau_i \text{ differs from } 0$$
For simple one-factor designs, the F-test is the same whether the treatment effect is considered random or fixed; only the interpretation differs.
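One way to see why the parameterization cannot change the F-test is that every version produces the same fitted value for each group, namely the group mean: $\hat\mu + \hat\tau_i = \hat\mu_t + \hat\tau_i = \bar y_{i\cdot}$. Since SSB and SSW depend only on those fitted values, F is unchanged. The Python sketch below checks this on the sand data; the helper names are hypothetical.

```python
data = {
    15: [7, 7, 10, 15, 9],
    20: [17, 12, 11, 18, 19],
    25: [14, 18, 18, 19, 19],
    30: [20, 24, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}


def fitted_values_aov(groups):
    """Fitted value mu_hat + tau_hat_i for each group (AOV effects model)."""
    n_total = sum(len(ys) for ys in groups.values())
    mu_hat = sum(sum(ys) for ys in groups.values()) / n_total
    return {k: mu_hat + (sum(ys) / len(ys) - mu_hat) for k, ys in groups.items()}


def fitted_values_reference(groups, reference):
    """Fitted value mu_t + tau_i for each group (reference cell model)."""
    mu_t = sum(groups[reference]) / len(groups[reference])
    return {k: mu_t + (sum(ys) / len(ys) - mu_t) for k, ys in groups.items()}


def f_statistic(groups, fitted):
    """F = MSB / MSW computed from a set of fitted group values."""
    all_obs = [y for ys in groups.values() for y in ys]
    n_total, t = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n_total
    ssw = sum((y - fitted[k]) ** 2 for k, ys in groups.items() for y in ys)
    ssb = sum(len(ys) * (fitted[k] - grand_mean) ** 2 for k, ys in groups.items())
    return (ssb / (t - 1)) / (ssw / (n_total - t))


f_aov = f_statistic(data, fitted_values_aov(data))
f_ref_last = f_statistic(data, fitted_values_reference(data, 35))   # last level as reference
f_ref_first = f_statistic(data, fitted_values_reference(data, 15))  # first level as reference
print(f"{f_aov:.3f} {f_ref_last:.3f} {f_ref_first:.3f}")  # 14.866 14.866 14.866
```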