ST 524
NCSU - Fall 2008
Homework 1
Due: September 11, 2008
1. Description
A plant scientist measured the concentration of a particular virus in plant sap using ELISA
(enzyme-linked immunosorbent assay) (Novy 1992). The study included 13 potato clones: 2
commercial cultivars, 5 somatic hybrids, 5 progeny of the somatic hybrids, and one clone of
Solanum etuberosum (a species related to potato). Of the 5 progeny of the somatic hybrids, two
were classified as susceptible and three as resistant to the virus. The scientist wants to understand
the resistance to the virus among these 13 clones. Plant sap was taken from 5 inoculated plants
of each clone, for a total of 65 measurements of titer. One measurement was lost during
processing of the samples.
Reference: Yandell (2001)
Data
clone  reps  titer  code  type
1      1     1302   a     susc
1      2     1717   a     susc
1      3     1321   a     susc
1      4     1358   a     susc
1      5     1093   a     susc
2      1     32     b     etb
2      2     12     b     etb
2      3     25     b     etb
2      4     61     b     etb
2      5     93     b     etb
3      2     1846   c     cult
3      3     1745   c     cult
3      4     1814   c     cult
3      5     1752   c     cult
4      1     197    d     res
4      2     380    d     res
4      3     280    d     res
4      4     112    d     res
4      5     355    d     res
5      1     529    e     par
5      2     396    e     par
5      3     629    e     par
5      4     261    e     par
5      5     325    e     par
6      1     931    f     par
6      2     791    f     par
6      3     57     f     par
6      4     706    f     par
6      5     742    f     par
7      1     1361   g     cult
7      2     363    g     cult
7      3     418    g     cult
7      4     579    g     cult
7      5     1660   g     cult
8      1     361    h     res
8      2     113    h     res
8      3     338    h     res
8      4     283    h     res
8      5     301    h     res
9      1     594    i     par
9      2     173    i     par
9      3     526    i     par
9      4     58     i     par
9      5     88     i     par
10     1     644    j     odd
10     2     680    j     odd
10     3     663    j     odd
10     4     780    j     odd
10     5     965    j     odd
11     1     549    k     par
11     2     603    k     par
11     3     229    k     par
11     4     398    k     par
11     5     252    k     par
12     1     1185   l     susc
12     2     1105   l     susc
12     3     1196   l     susc
12     4     949    l     susc
12     5     906    l     susc
13     1     214    m     res
13     2     351    m     res
13     3     564    m     res
13     4     98     m     res
13     5     417    m     res
Objective: Check the model assumptions, and decide whether a common residual variance is adequate for the model.
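1) Write the statistical linear model.
$$y_{ij} = \mu + \tau_i + e_{ij}, \qquad e_{ij} \sim \text{iid } N(0, \sigma^2)$$
where $y_{ij}$ is the titer of plant j of clone i ($i = 1, \dots, 13$), $\mu$ is the overall mean and $\tau_i$ is the effect of clone i.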
2) Run an analysis of variance in PROC GLM.
Analysis of Variance Table
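As an independent cross-check of the PROC GLM analysis of variance, the sums of squares for the one-way model in step 1 can be computed directly. This is a minimal Python sketch using only the standard library, assuming the titer values are transcribed from the data table (the names `TITER` and `one_way_anova` are hypothetical); it reports F but not its p-value.

```python
from statistics import mean

# Titer by clone, transcribed from the data table; clone 3 has only 4 values
# because one measurement was lost during processing.
TITER = {
    1: [1302, 1717, 1321, 1358, 1093],
    2: [32, 12, 25, 61, 93],
    3: [1846, 1745, 1814, 1752],
    4: [197, 380, 280, 112, 355],
    5: [529, 396, 629, 261, 325],
    6: [931, 791, 57, 706, 742],
    7: [1361, 363, 418, 579, 1660],
    8: [361, 113, 338, 283, 301],
    9: [594, 173, 526, 58, 88],
    10: [644, 680, 663, 780, 965],
    11: [549, 603, 229, 398, 252],
    12: [1185, 1105, 1196, 949, 906],
    13: [214, 351, 564, 98, 417],
}


def one_way_anova(groups):
    """Sums of squares, degrees of freedom and F for y_ij = mu + tau_i + e_ij."""
    all_obs = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_obs)
    ss_trt = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    ss_err = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    df_trt = len(groups) - 1
    df_err = len(all_obs) - len(groups)
    f_stat = (ss_trt / df_trt) / (ss_err / df_err)
    return ss_trt, ss_err, df_trt, df_err, f_stat


ss_trt, ss_err, df_trt, df_err, f_stat = one_way_anova(TITER)
print(f"clone df = {df_trt}, error df = {df_err}")
print(f"SS(clone) = {ss_trt:.1f}, SSE = {ss_err:.2f}, MSE = {ss_err / df_err:.2f}, F = {f_stat:.2f}")
```

With the data as transcribed, the error line has 64 - 13 = 51 degrees of freedom, because clone 3 contributes only four observations.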
Friday September 5, 2008
3) Graph the standardized residuals (StudentResid) against the predicted values (clone means) for each clone.
Summarize the findings.
Clones 6 and 7 show a larger spread, while clones 2, 3 and 8 show a smaller spread. The remaining clones show
homogeneous variances. One observation, from clone 7, has a studentized residual greater than 3.5.
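The studentized residuals plotted in step 3 can be reproduced with the sketch below. It assumes that StudentResid is SAS's internally studentized residual, r = e / sqrt(MSE (1 - h)), where in this one-way model the leverage of an observation from clone i is h = 1/n_i; the names `TITER` and `studentized_residuals` are hypothetical.

```python
from math import sqrt
from statistics import mean

# Titer by clone, transcribed from the data table (clone 3, rep 1 was lost).
TITER = {
    1: [1302, 1717, 1321, 1358, 1093], 2: [32, 12, 25, 61, 93],
    3: [1846, 1745, 1814, 1752], 4: [197, 380, 280, 112, 355],
    5: [529, 396, 629, 261, 325], 6: [931, 791, 57, 706, 742],
    7: [1361, 363, 418, 579, 1660], 8: [361, 113, 338, 283, 301],
    9: [594, 173, 526, 58, 88], 10: [644, 680, 663, 780, 965],
    11: [549, 603, 229, 398, 252], 12: [1185, 1105, 1196, 949, 906],
    13: [214, 351, 564, 98, 417],
}


def studentized_residuals(groups):
    """(clone, predicted value, studentized residual) for every observation.

    Predicted value = clone mean; r = e / sqrt(MSE * (1 - h)) with leverage h = 1 / n_i.
    """
    n_total = sum(len(ys) for ys in groups.values())
    sse = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    mse = sse / (n_total - len(groups))
    out = []
    for clone, ys in groups.items():
        pred, h = mean(ys), 1 / len(ys)
        for y in ys:
            out.append((clone, pred, (y - pred) / sqrt(mse * (1 - h))))
    return out


resid = studentized_residuals(TITER)
flagged = [(clone, r) for clone, _, r in resid if abs(r) > 3.5]
for clone, r in flagged:
    print(f"clone {clone}: studentized residual {r:.2f}")
```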
4) Run tests of homogeneity of variance using the Brown-Forsythe and Bartlett tests. State the
hypotheses and conclusions. Use α = 0.05.
$$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_{13}^2 = \sigma^2$$
$$H_1: \text{at least one } \sigma_i^2 \text{ is different}$$
Brown-Forsythe test
At the α = 0.05 level of significance we do not reject H0, so we can assume homogeneity of variances.
Bartlett test
At the α = 0.05 level of significance we reject H0 and conclude that the variances are not homogeneous.
Bartlett's test assumes normally distributed errors and is sensitive to even slight departures from normality.
The QQ plot of the residuals shows that their distribution deviates moderately from the normal distribution,
which may explain the highly significant Bartlett test, whereas the Brown-Forsythe test, which is robust to
non-normality, allows us to assume homogeneity of variances.
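Both homogeneity-of-variance statistics in step 4 can be sketched with the standard library. This assumes the Brown-Forsythe test is the ANOVA F on absolute deviations from each clone's median (the HOVTEST=BF variant of Levene's test) and uses the usual Bartlett statistic with its chi-square approximation on 12 df. Only Bartlett's p-value is computed, via the closed-form series for the chi-square upper tail with integer df; the Brown-Forsythe F would still have to be compared with an F(12, 51) critical value. All names are hypothetical.

```python
import math
from statistics import mean, median, variance

# Titer by clone, transcribed from the data table (clone 3, rep 1 was lost).
TITER = {
    1: [1302, 1717, 1321, 1358, 1093], 2: [32, 12, 25, 61, 93],
    3: [1846, 1745, 1814, 1752], 4: [197, 380, 280, 112, 355],
    5: [529, 396, 629, 261, 325], 6: [931, 791, 57, 706, 742],
    7: [1361, 363, 418, 579, 1660], 8: [361, 113, 338, 283, 301],
    9: [594, 173, 526, 58, 88], 10: [644, 680, 663, 780, 965],
    11: [549, 603, 229, 398, 252], 12: [1185, 1105, 1196, 949, 906],
    13: [214, 351, 564, 98, 417],
}


def anova_f(groups):
    """One-way ANOVA F statistic and its (numerator, denominator) df."""
    all_obs = [y for ys in groups.values() for y in ys]
    grand = mean(all_obs)
    ss_trt = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    ss_err = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    df1, df2 = len(groups) - 1, len(all_obs) - len(groups)
    return (ss_trt / df1) / (ss_err / df2), (df1, df2)


def brown_forsythe(groups):
    """ANOVA F on absolute deviations from each clone's median."""
    dev = {g: [abs(y - median(ys)) for y in ys] for g, ys in groups.items()}
    return anova_f(dev)


def chi2_sf(x, df):
    """P(chi-square with integer df > x), from the closed-form finite series."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        q, a, k = math.exp(-half), 1.0, 2             # exact tail for df = 2
    else:
        q, a, k = math.erfc(math.sqrt(half)), 0.5, 1  # exact tail for df = 1
    while k < df:  # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1))
        a += 1
        k += 2
    return q


def bartlett(groups):
    """Bartlett's chi-square statistic for equal variances, its df and p-value."""
    n = [len(ys) for ys in groups.values()]
    s2 = [variance(ys) for ys in groups.values()]
    k, df_err = len(n), sum(n) - len(n)
    sp2 = sum((ni - 1) * v for ni, v in zip(n, s2)) / df_err
    num = df_err * math.log(sp2) - sum((ni - 1) * math.log(v) for ni, v in zip(n, s2))
    corr = 1 + (sum(1 / (ni - 1) for ni in n) - 1 / df_err) / (3 * (k - 1))
    stat = num / corr
    return stat, k - 1, chi2_sf(stat, k - 1)


bf_f, bf_df = brown_forsythe(TITER)
b_stat, b_df, b_p = bartlett(TITER)
print(f"Brown-Forsythe F = {bf_f:.3f} on {bf_df} df")
print(f"Bartlett chi-square = {b_stat:.2f} on {b_df} df, p = {b_p:.2g}")
```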
5) Study the need to fit a separate variance for each type level.
There are 6 types: cultivar, etb, odd, parent (par), resistant and susceptible. Type = cultivar includes clone 7,
which has the largest variance, and clone 3, which has a very small variance; clones 5, 6, 9 and 11 are of type = parent,
with moderate variability. Resistant clones 4, 8 and 13 show smaller variability than type = parent.
Susceptible clones 1 and 12 show small variability. The two remaining types, etb (very low variability) and
odd (small variability), contain only one clone each.
A plot of the response for each observation against the clone mean does not show variability
increasing with the mean. Heterogeneity is present, mostly due to clones 2, 3 and 7, and the
question is whether separate variances would improve the fit.
A graph of mean vs. SD (or variance) does not show any trend, except that the variance for type = cultivar is
higher than for the other types, and type = etb has a very low variance and mean response. The CV tends to lie in
the range 30-70%, with lower CVs associated with the higher mean responses of susceptible clones 1 and 12
and cultivar clone 3.
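The comparison of types in step 5 rests on within-type pooled variances and on clone CVs. A minimal sketch, assuming the pooled variance of a type is sum((n_i - 1) s_i^2) / sum(n_i - 1) over its clones and CV = 100 SD / mean; `TITER`, `TYPE` and the function name are hypothetical.

```python
from statistics import mean, stdev, variance

# Titer by clone and type of each clone, transcribed from the data table.
TITER = {
    1: [1302, 1717, 1321, 1358, 1093], 2: [32, 12, 25, 61, 93],
    3: [1846, 1745, 1814, 1752], 4: [197, 380, 280, 112, 355],
    5: [529, 396, 629, 261, 325], 6: [931, 791, 57, 706, 742],
    7: [1361, 363, 418, 579, 1660], 8: [361, 113, 338, 283, 301],
    9: [594, 173, 526, 58, 88], 10: [644, 680, 663, 780, 965],
    11: [549, 603, 229, 398, 252], 12: [1185, 1105, 1196, 949, 906],
    13: [214, 351, 564, 98, 417],
}
TYPE = {1: "susc", 2: "etb", 3: "cult", 4: "res", 5: "par", 6: "par", 7: "cult",
        8: "res", 9: "par", 10: "odd", 11: "par", 12: "susc", 13: "res"}


def pooled_variance_by_type(groups, types):
    """Pooled within-clone variance per type: sum((n_i - 1) * s_i^2) / sum(n_i - 1)."""
    ss, df = {}, {}
    for clone, ys in groups.items():
        t = types[clone]
        ss[t] = ss.get(t, 0.0) + (len(ys) - 1) * variance(ys)
        df[t] = df.get(t, 0) + len(ys) - 1
    return {t: ss[t] / df[t] for t in ss}


pooled = pooled_variance_by_type(TITER, TYPE)
cv = {clone: 100 * stdev(ys) / mean(ys) for clone, ys in TITER.items()}

for t, v in sorted(pooled.items(), key=lambda kv: kv[1]):
    print(f"{t:5s} pooled variance = {v:10.1f}")
for clone in sorted(cv, key=cv.get)[:3]:
    print(f"clone {clone} ({TYPE[clone]}): CV = {cv[clone]:.1f}%")
```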
6) Select the best model to be fitted.
a. Test whether a model with a common residual variance is preferred to a model with separate
variances. Use a likelihood ratio test and α = 0.05.
- Fit the model with a common residual variance.
Fit Statistics
- Fit the model with a separate residual variance for each type.
$$H_0: \operatorname{Var}(e_{i(k)j}) = \sigma_e^2$$
where $\sigma_e^2$ is the common residual variance for all groups.
$$H_1: \operatorname{Var}(e_{i(k)j}) = \sigma_{\text{type}_k,e}^2$$
where $\sigma_{\text{type}_k,e}^2$ is the residual variance for the kth group, $k = 1, \dots, 6$.
$$\chi^2_{\text{calc}} = -2\,\text{ResLogL}(H_0) - \left[-2\,\text{ResLogL}(H_1)\right] = 724.0 - 694.6 = 29.4$$
Under H0, $P\left(\chi^2_{5\,\text{df}} > 29.4\right) < 0.0001$.
DF = 6 - 1 = 5 (number of variance components under H1 minus number of variance components under H0).
Conclusion: Reject H0; there is statistical evidence for
$$H_1: e_{i(k)j} \sim \text{iid } N\left(0, \sigma_{\text{type}_k,e}^2\right), \quad k = 1, \dots, 6$$
This result leads us to favor a model with a separate variance for each type group.
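Because every clone belongs to exactly one type and the clone effects are fixed, the REML fits in step 6a should have a closed form: the residual variance of type k is estimated by the pooled within-clone variance s_k^2 on df_k error degrees of freedom, and the -2 Res Log Likelihood difference reduces to df_E ln(MSE) - sum_k df_k ln(s_k^2). This derivation is an assumption of the sketch, not part of the original solution, so the code recomputes the statistic from the data table as a cross-check against 29.4 and evaluates the chi-square tail on 5 df; all names are hypothetical.

```python
import math
from statistics import variance

# Titer by clone and type of each clone, transcribed from the data table.
TITER = {
    1: [1302, 1717, 1321, 1358, 1093], 2: [32, 12, 25, 61, 93],
    3: [1846, 1745, 1814, 1752], 4: [197, 380, 280, 112, 355],
    5: [529, 396, 629, 261, 325], 6: [931, 791, 57, 706, 742],
    7: [1361, 363, 418, 579, 1660], 8: [361, 113, 338, 283, 301],
    9: [594, 173, 526, 58, 88], 10: [644, 680, 663, 780, 965],
    11: [549, 603, 229, 398, 252], 12: [1185, 1105, 1196, 949, 906],
    13: [214, 351, 564, 98, 417],
}
TYPE = {1: "susc", 2: "etb", 3: "cult", 4: "res", 5: "par", 6: "par", 7: "cult",
        8: "res", 9: "par", 10: "odd", 11: "par", 12: "susc", 13: "res"}


def lrt_common_vs_type_variances(groups, types):
    """-2ResLogL(H0) - (-2ResLogL(H1)) for common vs type-specific residual variances.

    Assumes every clone belongs to exactly one type, so that
        LRT = df_E * ln(MSE) - sum_k df_k * ln(s_k^2).
    """
    ss, df = {}, {}
    for clone, ys in groups.items():
        t = types[clone]
        ss[t] = ss.get(t, 0.0) + (len(ys) - 1) * variance(ys)
        df[t] = df.get(t, 0) + len(ys) - 1
    df_e = sum(df.values())
    mse = sum(ss.values()) / df_e
    return df_e * math.log(mse) - sum(df[t] * math.log(ss[t] / df[t]) for t in ss)


def chi2_sf(x, df):
    """P(chi-square with integer df > x), from the closed-form finite series."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        q, a, k = math.exp(-half), 1.0, 2
    else:
        q, a, k = math.erfc(math.sqrt(half)), 0.5, 1
    while k < df:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1))
        a += 1
        k += 2
    return q


stat = lrt_common_vs_type_variances(TITER, TYPE)
lrt_df = len(set(TYPE.values())) - 1  # 6 residual variances under H1, 1 under H0
p_value = chi2_sf(stat, lrt_df)
print(f"chi-square = {stat:.2f} on {lrt_df} df, p = {p_value:.1e}")
```

The closed form relies on no clone being split across variance groups; for other grouping structures the REML fit has to come from PROC MIXED itself.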
b. Check the residual plot and the normality tests for the standardized residuals of the selected model.
Normality test
Null hypothesis H0: $e_{i(k)j} \sim \text{iid } N\left(0, \sigma_{\text{type}_k,e}^2\right)$, i.e., the residual random effects are normally
distributed with mean 0 and variance $\sigma_{\text{type}_k,e}^2$, the residual variance for the kth type, $k = 1, \dots, 6$.
None of the normality tests rejects H0: the studentized residuals are consistent with a Normal distribution with mean 0
and variance 1.006, which is close to the theoretical value of 1 for studentized residuals.
c. Write down the final model, test the hypothesis and state the conclusions. Use α = 0.05.
$$y_{i(k)j} = \mu + \tau_{i(k)} + e_{i(k)j}, \quad e_{i(k)j} \sim \text{iid } N\left(0, \sigma_{\text{type}_k,e}^2\right), \quad k = 1, \dots, 6$$
Test of hypothesis for fixed effects
$$H_0: \tau_1 = \tau_2 = \cdots = \tau_{13}$$
$$H_1: \text{at least one } \tau_i \text{ is different}, \quad i = 1, \dots, 13$$
7) Explain any limitations of the final selected model.
The final model, selected with the likelihood ratio test, may not be the most adequate, since type does not
provide a good classification of clone variability. Type = cultivar includes two clones at opposite extremes
of variability (clones 3 and 7), while type = parent also includes one clone (6) with a larger variance than
the others in the group.
Titer summary by clone
Obs  clone  n  Mean      var        SD       type
1    1      5  1358.20   50902.70   225.616  susc
2    2      5  44.60     1054.30    32.470   etb
3    3      4  1789.25   2392.92    48.917   cult
4    4      5  264.80    12395.70   111.336  res
5    5      5  428.00    22531.00   150.103  par
6    6      5  645.40    115496.30  339.847  par
7    7      5  876.20    352755.70  593.932  cult
8    8      5  279.20    9565.20    97.802   res
9    9      5  287.80    64101.20   253.182  par
10   10     5  746.40    17691.30   133.009  odd
11   11     5  406.20    28591.70   169.091  par
12   12     5  1068.20   17961.70   134.021  susc
13   13     5  328.80    32509.70   180.304  res
Note that the likelihood ratio test assumes multivariate normality and is robust to non-normality
only asymptotically, i.e., when the sample size is large. The normality tests for the residuals
calculated separately for each clone (residual = observed value - clone mean) give:
Goodness-of-Fit Tests for Normal Distribution
Test                  ----Statistic-----     DF    ------p Value------
Kolmogorov-Smirnov    D       0.1078156            Pr > D        0.065
Cramer-von Mises      W-Sq    0.1337584            Pr > W-Sq     0.040
Anderson-Darling      A-Sq    0.7980371            Pr > A-Sq     0.038
Chi-Square            Chi-Sq  11.7996671     4     Pr > Chi-Sq   0.019
The p-value of 0.038 for the Anderson-Darling test of normality of these residuals (the Cramer-von Mises
and Chi-Square tests are also below 0.05) may indicate that the results of the likelihood ratio test should be
taken with caution, since the probability of a Type I error may be higher than the nominal 0.05.
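The Anderson-Darling statistic in the table can be sketched as follows. It assumes the residuals are observed value minus clone mean, pooled over all clones, and that the normal mean and standard deviation are estimated from those residuals; whether this matches SAS's exact computation is an assumption, so the result may differ from 0.798, and no p-value is computed. The function names are hypothetical.

```python
import math
from statistics import mean, stdev

# Titer by clone, transcribed from the data table (clone 3, rep 1 was lost).
TITER = {
    1: [1302, 1717, 1321, 1358, 1093], 2: [32, 12, 25, 61, 93],
    3: [1846, 1745, 1814, 1752], 4: [197, 380, 280, 112, 355],
    5: [529, 396, 629, 261, 325], 6: [931, 791, 57, 706, 742],
    7: [1361, 363, 418, 579, 1660], 8: [361, 113, 338, 283, 301],
    9: [594, 173, 526, 58, 88], 10: [644, 680, 663, 780, 965],
    11: [549, 603, 229, 398, 252], 12: [1185, 1105, 1196, 949, 906],
    13: [214, 351, 564, 98, 417],
}


def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def anderson_darling_a2(sample):
    """A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))],
    with F the normal CDF using the sample mean and standard deviation."""
    x = sorted(sample)
    n = len(x)
    mu, sd = mean(x), stdev(x)
    f = [normal_cdf(v, mu, sd) for v in x]
    s = sum((2 * i + 1) * (math.log(f[i]) + math.log(1 - f[n - 1 - i])) for i in range(n))
    return -n - s / n


# Residual = observed value - clone mean, pooled over all clones.
residuals = [y - mean(ys) for ys in TITER.values() for y in ys]
a2 = anderson_darling_a2(residuals)
print(f"A-Sq = {a2:.4f} (n = {len(residuals)})")
```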
It would be important to find out why clone 7 shows such high variability. The experimental unit appears
to be a single plant; a smaller variance might be attained by using several plants per experimental unit
and taking their average as the response.
2. A study will be carried out in the greenhouse on the effects of 2 methods of obtaining cuttings (M1,
M2) on the growth of 5 cultivars of an ornamental shrub (V1, V2, V3, V4, V5). Label each
method-by-cultivar combination uniquely, T1 through T10. Use PROC PLAN to obtain:
1) a randomization plan for a CRD with 4 pots per cutting method and cultivar combination,
with three cuttings per pot. The pots for the 10 method-by-cultivar treatment combinations
will be randomly distributed along four selected benches in the greenhouse. Sketch a plan of
the layout in the greenhouse indicating the position of each treatment combination. Assume
each bench will contain a single row of 10 pots.
Cutting  Cultivar  Treatment
M1       V1        T1
M1       V2        T2
M1       V3        T3
M1       V4        T4
M1       V5        T5
M2       V1        T6
M2       V2        T7
M2       V3        T8
M2       V4        T9
M2       V5        T10
Number of replications per treatment: 4
Total number of experimental units (pots): 40, numbered 1, 2, 3, 4, 5, 6, 7, 8, …, 38, 39, 40
The PLAN Procedure

Plot Factors
Factor  Select  Levels  Order
unit    40      40      Random

Treatment Factors
Factor  Select  Levels  Order
treat   40      40      Cyclic

Initial Block / Increment
(1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6 7 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10) / 1

unit   20  6 18 22 29 38 19 31  7 34 30 25  2 14  5 39 12 10 24 27
treat   1  1  1  1  2  2  2  2  3  3  3  3  4  4  4  4  5  5  5  5

unit   32  4 16 26 37  1 28  3 40 35 36  9 11 15 21 17 23 13  8 33
treat   6  6  6  6  7  7  7  7  8  8  8  8  9  9  9  9 10 10 10 10

Layout in the greenhouse (pot number and treatment; each bench is a single row of 10 pots)
Bench 1:  1 T7 | 8 T10 | 9 T8  | 16 T6 | 17 T9 | 24 T5 | 25 T3 | 32 T6 | 33 T10 | 40 T8
Bench 2:  2 T4 | 7 T3  | 10 T5 | 15 T9 | 18 T1 | 23 T10 | 26 T6 | 31 T2 | 34 T3 | 39 T4
Bench 3:  3 T7 | 6 T1  | 11 T9 | 14 T4 | 19 T2 | 22 T1 | 27 T5 | 30 T3 | 35 T8 | 38 T2
Bench 4:  4 T6 | 5 T4  | 12 T5 | 13 T10 | 20 T1 | 21 T9 | 28 T7 | 29 T2 | 36 T8 | 37 T7
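The CRD output can be read as follows: the k-th pot in the random unit permutation receives treatment ceil(k/4), because the cyclic treatment list is 1 1 1 1 2 2 2 2 and so on. A minimal Python sketch of that pairing, assuming Python's `random` module in place of PROC PLAN's generator (so a seed gives a different, equally valid plan); `PRINTED_UNITS` holds the unit permutation printed by PROC PLAN, and all other names are hypothetical.

```python
import random

REPS, N_TRT = 4, 10

# Hypothetical helper: T1..T5 = M1 x V1..V5, T6..T10 = M2 x V1..V5.
LABELS = {t: f"M{(t - 1) // 5 + 1}V{(t - 1) % 5 + 1}" for t in range(1, N_TRT + 1)}


def assign_cyclic(units, reps=REPS):
    """Pair the k-th unit of a permutation (k = 1, 2, ...) with treatment ceil(k / reps)."""
    return {u: k // reps + 1 for k, u in enumerate(units)}


def crd_plan(seed=None):
    """Random permutation of pots 1..40 followed by the cyclic treatment assignment."""
    units = list(range(1, N_TRT * REPS + 1))
    random.Random(seed).shuffle(units)
    return assign_cyclic(units)


# Unit permutation printed by PROC PLAN.
PRINTED_UNITS = [20, 6, 18, 22, 29, 38, 19, 31, 7, 34, 30, 25, 2, 14, 5, 39, 12, 10, 24, 27,
                 32, 4, 16, 26, 37, 1, 28, 3, 40, 35, 36, 9, 11, 15, 21, 17, 23, 13, 8, 33]
printed_plan = assign_cyclic(PRINTED_UNITS)
for pot in sorted(printed_plan):
    t = printed_plan[pot]
    print(f"pot {pot:2d}: T{t} ({LABELS[t]})")
```

Every treatment receives exactly 4 pots by construction; randomness enters only through the order of the units.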
2) a randomization plan for an RCBD with 4 blocks corresponding to 4 benches in the
greenhouse, with all 10 method-by-cultivar treatment combinations represented once
in each block. Sketch a plan of the layout in the greenhouse indicating the position of
each treatment combination in each block. Assume each block will contain a single
row of 'plots'.
Plot Factors
Factor  Select  Levels  Order
blocks  4       4       Ordered
plot    10      10      Ordered

Treatment Factors
Factor  Select  Levels  Order
t       10      10      Random

blocks  --------------plot--------------  ----------------t----------------
1       1  2  3  4  5  6  7  8  9  10     6   2  1  5  4  8  9  7  10  3
2       1  2  3  4  5  6  7  8  9  10     3   5  9  6  2  7  1  4   8 10
3       1  2  3  4  5  6  7  8  9  10     4   7  2  6  3  8  1  5   9 10
4       1  2  3  4  5  6  7  8  9  10     6  10  7  3  9  4  5  1   8  2

Layout in the greenhouse (plot number and treatment; each block is one bench)
Block 1:  1 T6 | 2 T2  | 3 T1 | 4 T5 | 5 T4 | 6 T8 | 7 T9 | 8 T7 | 9 T10 | 10 T3
Block 2:  1 T3 | 2 T5  | 3 T9 | 4 T6 | 5 T2 | 6 T7 | 7 T1 | 8 T4 | 9 T8  | 10 T10
Block 3:  1 T4 | 2 T7  | 3 T2 | 4 T6 | 5 T3 | 6 T8 | 7 T1 | 8 T5 | 9 T9  | 10 T10
Block 4:  1 T6 | 2 T10 | 3 T7 | 4 T3 | 5 T9 | 6 T4 | 7 T5 | 8 T1 | 9 T8  | 10 T2
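The RCBD plan only requires an independent random permutation of T1 to T10 within each bench. A minimal sketch, assuming Python's `random` module in place of PROC PLAN's generator (so the seed shown gives a different but equally valid plan); `PRINTED` holds the treatment columns of the SAS output, and all names are hypothetical.

```python
import random

N_BLOCKS, N_TRT = 4, 10


def rcbd_plan(seed=None):
    """Independent random order of treatments 1..10 within each block (bench).
    plan[b][p - 1] is the treatment in plot p of block b."""
    rng = random.Random(seed)
    plan = {}
    for block in range(1, N_BLOCKS + 1):
        order = list(range(1, N_TRT + 1))
        rng.shuffle(order)
        plan[block] = order
    return plan


def is_valid_rcbd(plan):
    """Every block must contain each treatment exactly once."""
    return all(sorted(order) == list(range(1, N_TRT + 1)) for order in plan.values())


# Treatment columns printed by PROC PLAN, block by block.
PRINTED = {
    1: [6, 2, 1, 5, 4, 8, 9, 7, 10, 3],
    2: [3, 5, 9, 6, 2, 7, 1, 4, 8, 10],
    3: [4, 7, 2, 6, 3, 8, 1, 5, 9, 10],
    4: [6, 10, 7, 3, 9, 4, 5, 1, 8, 2],
}
for block, order in rcbd_plan(seed=2008).items():
    print(f"Block {block}: " + " ".join(f"T{t}" for t in order))
```

Unlike the CRD, randomization is restricted within each bench, so every bench holds one replicate of all ten treatments.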