ST 524 NCSU - Fall 2008
Homework 1
Due: September 11, 2008

1. Description

A plant scientist measured the concentration of a particular virus in plant sap using ELISA (enzyme-linked immunosorbent assay) (Novy 1992). The study included 13 potato clones: 2 commercial cultivars, 5 somatic hybrids, 5 progeny of the somatic hybrids, and one clone of Solanum etuberosum (a species related to potato). Of the 5 progeny of the somatic hybrids, two were classified as susceptible and three as resistant to the virus. The scientist wants to understand the resistance to the virus among these 13 clones. Plant sap was taken from 5 inoculated plants of each clone, for a total of 65 measurements of titer. One measurement was lost during processing of the samples.

Reference: Yandell (2001)

Data

clone  reps  titer  code  type
    1     1   1302  a     susc
    1     2   1717  a     susc
    1     3   1321  a     susc
    1     4   1358  a     susc
    1     5   1093  a     susc
    2     1     32  b     etb
    2     2     12  b     etb
    2     3     25  b     etb
    2     4     61  b     etb
    2     5     93  b     etb
    3     2   1846  c     cult
    3     3   1745  c     cult
    3     4   1814  c     cult
    3     5   1752  c     cult
    4     1    197  d     res
    4     2    380  d     res
    4     3    280  d     res
    4     4    112  d     res
    4     5    355  d     res
    5     1    529  e     par
    5     2    396  e     par
    5     3    629  e     par
    5     4    261  e     par
    5     5    325  e     par
    6     1    931  f     par
    6     2    791  f     par
    6     3     57  f     par
    6     4    706  f     par
    6     5    742  f     par
    7     1   1361  g     cult
    7     2    363  g     cult
    7     3    418  g     cult
    7     4    579  g     cult
    7     5   1660  g     cult
    8     1    361  h     res
    8     2    113  h     res
    8     3    338  h     res
    8     4    283  h     res
    8     5    301  h     res
    9     1    594  i     par
    9     2    173  i     par
    9     3    526  i     par
    9     4     58  i     par
    9     5     88  i     par
   10     1    644  j     odd
   10     2    680  j     odd
   10     3    663  j     odd
   10     4    780  j     odd
   10     5    965  j     odd
   11     1    549  k     par
   11     2    603  k     par
   11     3    229  k     par
   11     4    398  k     par
   11     5    252  k     par
   12     1   1185  l     susc
   12     2   1105  l     susc
   12     3   1196  l     susc
   12     4    949  l     susc
   12     5    906  l     susc
   13     1    214  m     res
   13     2    351  m     res
   13     3    564  m     res
   13     4     98  m     res
   13     5    417  m     res

Objective: Checking assumptions. Decide whether a common residual variance is a good fit in the model.

1) Write the statistical linear model.

$y_{ij} = \mu_i + e_{ij}, \quad e_{ij} \sim \text{iid } N(0, \sigma^2)$

2) Run an analysis of variance in Proc GLM.
Analysis of Variance Table.

Friday September 5, 2008

3) Graph standardized residuals (StudentResid) vs predicted (mean value) for each clone. Summarize findings.

Clones 6 and 7 show larger spread, while clones 2, 3, and 8 show smaller spread. The remaining clones show homogeneous variance. One observation, belonging to clone 7, has a studentized residual greater than 3.5.

4) Run homogeneity of variance test. Use Brown-Forsythe's and Bartlett's tests. Indicate hypotheses and conclusions. Use $\alpha = 0.05$.

$H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_{13}^2 = \sigma^2$
$H_1$: at least one $\sigma_i^2$ is different

Brown-Forsythe test: At the 0.05 level of significance we do not reject $H_0$, and we can assume homogeneity of variances.

Bartlett test: At the 0.05 level of significance we reject $H_0$, and we conclude that the variances are not homogeneous. Bartlett's test assumes normality and is sensitive to even slight non-normality.

The QQ plot of residuals shows that the distribution of residuals deviates moderately from the normal distribution, which may explain the high significance of the Bartlett test, although the Brown-Forsythe test allows us to assume homogeneity of variances.

5) Study the need to fit separate variances for each type level.

There are 6 types: cultivar, etb, odd, par, resistant, susceptible. Type = cultivar includes clone 7, which has the largest variance, and clone 3, which has a very small variance. Clones 5, 6, 9, and 11 are in type = parent, with moderate variability. Resistant clones 4, 8, and 13 show smaller variability than type = parent. Susceptible clones 1 and 12 show small variability. Type = etb, with very low variability, and type = odd, with small variability, are the two remaining types, each with only one clone.

A plot of the response for each observation against the clone mean value does not show a trend of variability increasing with the mean value.
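The per-clone spread discussed in this item can be tabulated directly from the data. The following is a minimal sketch in Python, used here only as a stand-in for the SAS steps of the assignment; the names `TITER` and `clone_summary` are hypothetical and not part of the original solution. It computes n, mean, sample SD (divisor n - 1), and CV (%) for each clone.

```python
from statistics import mean, stdev

# Titer values by clone, copied from the data table (clone 3 has only 4 values).
TITER = {
    1: [1302, 1717, 1321, 1358, 1093],
    2: [32, 12, 25, 61, 93],
    3: [1846, 1745, 1814, 1752],
    4: [197, 380, 280, 112, 355],
    5: [529, 396, 629, 261, 325],
    6: [931, 791, 57, 706, 742],
    7: [1361, 363, 418, 579, 1660],
    8: [361, 113, 338, 283, 301],
    9: [594, 173, 526, 58, 88],
    10: [644, 680, 663, 780, 965],
    11: [549, 603, 229, 398, 252],
    12: [1185, 1105, 1196, 949, 906],
    13: [214, 351, 564, 98, 417],
}


def clone_summary(data):
    """Return {clone: (n, mean, sd, cv_percent)} using the sample SD (n - 1)."""
    summary = {}
    for clone, values in data.items():
        m = mean(values)
        s = stdev(values)
        summary[clone] = (len(values), m, s, 100 * s / m)
    return summary


if __name__ == "__main__":
    print("clone  n      mean       sd    cv%")
    for clone, (n, m, s, cv) in clone_summary(TITER).items():
        print(f"{clone:>5}  {n}  {m:8.2f}  {s:7.3f}  {cv:5.1f}")
```

Sorting the result by SD or by CV gives a quick check of which clones drive the heterogeneity before any formal test is run.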
There is heterogeneity present, mostly due to clones 2, 3, and 7, and the question is whether fitting separate variances may improve the fit. The graph of mean vs SD (or variance) does not show any trend, except that the variance for type = cultivar is higher than the rest and type = etb has a very low variance and mean response. CV tends to be in the range 30-70, with lower CV associated with higher mean response for susceptible clones 1 and 12 and for cultivar clone 3.

6) Select the best model to be fitted.

a. Test whether a model with a common residual variance is preferred to a model with separate variances. Use a likelihood ratio test and $\alpha = 0.05$.

- Fit model with common residual variance.

Fit Statistics

- Fit model with separate residual variance for each type.

$H_0: \mathrm{Var}(e_{i(k)j}) = \sigma_e^2$, where $\sigma_e^2$ is the common residual variance for all groups
$H_1: \mathrm{Var}(e_{i(k)j}) = \sigma^2_{\mathrm{type}_k,e}$, where $\sigma^2_{\mathrm{type}_k,e}$ is the residual variance for the $k$th group, $k = 1, \dots, 6$

$\chi^2_{calc} = (-2\,\mathrm{ResLogL})_{H_0} - (-2\,\mathrm{ResLogL})_{H_1} = 724.9 - 694.6 = 29.4$

(The statistic is reported as 29.4; the two fit statistics as printed differ by 30.3. Either value leads to the same conclusion.)

Under $H_0$, $P(\chi^2_{5\,df} > 29.4) < 0.0001$

DF = 6 - 1 = 5 = number of variance components under $H_1$ - number of variance components under $H_0$

Conclusion: Reject $H_0$; there is statistical evidence for $H_1: e_{i(k)j} \sim \text{iid } N(0, \sigma^2_{\mathrm{type}_k,e})$, $k = 1, \dots, 6$.

This result leads us to favor a model with separate variances for each type group.

b. Check residual plot and normality test for standardized residuals in the selected model.

Normality test, null hypothesis $H_0: e_{i(k)j} \sim \text{iid } N(0, \sigma^2_{\mathrm{type}_k,e})$, i.e., residual random effects are normally distributed with mean 0 and variance $\sigma^2_{\mathrm{type}_k,e}$, the residual variance for the $k$th type, $k = 1, \dots, 6$.

All normality tests indicate that the studentized residuals follow a normal distribution with mean 0 and variance 1.006, which is close to the theoretical value of 1 for the studentized residual.

c. Write down the final model. Test of hypothesis. Conclusions. Use $\alpha = 0.05$.
$y_{i(k)j} = \mu_{i(k)} + e_{i(k)j}$, where $e_{i(k)j} \sim \text{iid } N(0, \sigma^2_{\mathrm{type}_k,e})$, $k = 1, \dots, 6$

Test of hypothesis for fixed effects

$H_0: \mu_1 = \mu_2 = \dots = \mu_{13}$
$H_1$: at least one $\mu_i$ is different, $i = 1, \dots, 13$

7) Explain any limitations to the final selected model.

The final model selected on the basis of the likelihood ratio test may not be the most adequate, since type does not provide a good classification of clone variability. Type = cultivar includes two cultivars with extreme variability, while type = parent also includes one clone (6) with a larger variance than the others in the group.

Obs  clone  n     Mean        var       SD  type
  1      1  5  1358.20   50902.70  225.616  susc
  2      2  5    44.60    1054.30   32.470  etb
  3      3  4  1789.25    2392.92   48.917  cult
  4      4  5   264.80   12395.70  111.336  res
  5      5  5   428.00   22531.00  150.103  par
  6      6  5   645.40  115496.30  339.847  par
  7      7  5   876.20  352755.70  593.932  cult
  8      8  5   279.20    9565.20   97.802  res
  9      9  5   287.80   64101.20  253.182  par
 10     10  5   746.40   17691.30  133.009  odd
 11     11  5   406.20   28591.70  169.091  par
 12     12  5  1068.20   17961.70  134.021  susc
 13     13  5   328.80   32509.70  180.304  res

It is important to indicate that the likelihood ratio test assumes multivariate normality and is only asymptotically robust to non-normality, i.e., when the sample size is large. Consider the normality tests for residuals calculated separately for each clone (residual = observed value - clone mean):

Goodness-of-Fit Tests for Normal Distribution

Test                 ----Statistic-----   DF   ------p Value------
Kolmogorov-Smirnov   D         0.1078156        Pr > D        0.065
Cramer-von Mises     W-Sq      0.1337584        Pr > W-Sq     0.040
Anderson-Darling     A-Sq      0.7980371        Pr > A-Sq     0.038
Chi-Square           Chi-Sq   11.7996671    4   Pr > Chi-Sq   0.019

A p-value of 0.038 for the Anderson-Darling test of normality of the residuals may indicate that results from the likelihood ratio test should be taken with reserve, since the probability of a Type I error may be higher than the nominal 0.05. It is necessary to know why clone 7 shows such high variability.
The number of plants per experimental unit seems to be 1; a smaller variance may be attained by using several plants per experimental unit and taking their average as the response.

2. A study will be carried out in the greenhouse on the effects of 2 methods of obtaining cuttings (M1, M2) on the growth of 5 cultivars of an ornamental shrub (V1, V2, V3, V4, V5). Identify uniquely each method by cultivar combination from T1 through T10. Use PROC PLAN to obtain:

1) a randomization plan for a CRD with 4 pots per cutting method and cultivar combination, with three cuttings per plot. These pots for the 10 method by cultivar treatment combinations will be randomly distributed along four selected benches in the greenhouse. Sketch a plan for the layout in the greenhouse indicating the position of each treatment combination. Assume each bench will contain a single row of 10 pots.

Cutting  Cultivar  Treatment
M1       V1        T1
M1       V2        T2
M1       V3        T3
M1       V4        T4
M1       V5        T5
M2       V1        T6
M2       V2        T7
M2       V3        T8
M2       V4        T9
M2       V5        T10

Number of repetitions per treatment is 4.
Total number of repetitions is 40: 1, 2, 3, 4, 5, 6, 7, 8, …, 38, 39, 40

The PLAN Procedure

Plot Factors
Factor  Select  Levels  Order
unit        40      40  Random

Treatment Factors
Factor  Select  Levels  Order   Initial Block / Increment
treat       40      40  Cyclic  (1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6 7 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10) / 1

unit   20  6 18 22 29 38 19 31  7 34 30 25  2 14  5 39 12 10 24 27
treat   1  1  1  1  2  2  2  2  3  3  3  3  4  4  4  4  5  5  5  5

unit   32  4 16 26 37  1 28  3 40 35 36  9 11 15 21 17 23 13  8 33
treat   6  6  6  6  7  7  7  7  8  8  8  8  9  9  9  9 10 10 10 10

Layout in the greenhouse (unit number followed by treatment; units are numbered down the first column of pots, up the second, and so on across the four benches):

Bench 1:   1 T7    8 T10   9 T8   16 T6   17 T9   24 T5   25 T3   32 T6   33 T10  40 T8
Bench 2:   2 T4    7 T3   10 T5   15 T9   18 T1   23 T10  26 T6   31 T2   34 T3   39 T4
Bench 3:   3 T7    6 T1   11 T9   14 T4   19 T2   22 T1   27 T5   30 T3   35 T8   38 T2
Bench 4:   4 T6    5 T4   12 T5   13 T10  20 T1   21 T9   28 T7   29 T2   36 T8   37 T7
2) a randomization plan for an RCBD with 4 blocks corresponding to 4 benches in the greenhouse, and all 10 method by cultivar treatment combinations represented once in each block. Sketch a plan for the layout in the greenhouse indicating the position of each treatment combination in each block. Assume each block will contain a single row of 'plots'.

Plot Factors
Factor  Select  Levels  Order
blocks       4       4  Ordered
plot        10      10  Ordered

Treatment Factors
Factor  Select  Levels  Order
t           10      10  Random

blocks  -------------plot------------  --------------t--------------
     1  1  2  3  4  5  6  7  8  9 10   6  2  1  5  4  8  9  7 10  3
     2  1  2  3  4  5  6  7  8  9 10   3  5  9  6  2  7  1  4  8 10
     3  1  2  3  4  5  6  7  8  9 10   4  7  2  6  3  8  1  5  9 10
     4  1  2  3  4  5  6  7  8  9 10   6 10  7  3  9  4  5  1  8  2

Layout in the greenhouse (plot number followed by treatment; plots are numbered left to right in blocks 1 and 3 and right to left in blocks 2 and 4):

Block 1:   1 T6    2 T2    3 T1    4 T5    5 T4    6 T8    7 T9    8 T7    9 T10  10 T3
Block 2:  10 T10   9 T8    8 T4    7 T1    6 T7    5 T2    4 T6    3 T9    2 T5    1 T3
Block 3:   1 T4    2 T7    3 T2    4 T6    5 T3    6 T8    7 T1    8 T5    9 T9   10 T10
Block 4:  10 T2    9 T8    8 T1    7 T5    6 T4    5 T9    4 T3    3 T7    2 T10   1 T6
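The two randomization schemes in this problem differ only in where the shuffling happens: the CRD shuffles all 40 treatment labels over the 40 units at once, while the RCBD shuffles the 10 treatments separately within each block. A minimal sketch in Python follows, offered as an illustration rather than a replacement for PROC PLAN; the function names `crd_plan` and `rcbd_plan` and the seed value are assumptions made for the example, and the resulting plans will not reproduce the SAS output above.

```python
import random

# Treatment labels T1..T10, following the method-by-cultivar table in the text.
TREATMENTS = [f"T{i}" for i in range(1, 11)]


def crd_plan(treatments, reps, seed=None):
    """CRD: shuffle all copies of the treatment labels over units 1..N."""
    rng = random.Random(seed)
    labels = [t for t in treatments for _ in range(reps)]
    rng.shuffle(labels)
    # Units are numbered from 1, like the 40 pots in the text.
    return {unit: trt for unit, trt in enumerate(labels, start=1)}


def rcbd_plan(treatments, blocks, seed=None):
    """RCBD: an independent random permutation of the treatments in each block."""
    rng = random.Random(seed)
    plan = {}
    for block in range(1, blocks + 1):
        order = list(treatments)
        rng.shuffle(order)
        plan[block] = order  # list position j holds the treatment of plot j + 1
    return plan


if __name__ == "__main__":
    crd = crd_plan(TREATMENTS, reps=4, seed=524)  # seed chosen arbitrarily
    print("CRD:", " ".join(f"{u}:{t}" for u, t in crd.items()))
    for block, order in rcbd_plan(TREATMENTS, blocks=4, seed=524).items():
        print("Block", block, " ".join(order))
```

In the CRD a treatment can land several times on the same bench, whereas the RCBD guarantees each bench holds every treatment exactly once, which is what allows bench-to-bench differences to be removed from the error term.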