ST 524 NCSU - Fall 2008 Due: 12/08/08
Final TAKE HOME EXAM – FALL 08

Analysis - Disease Progress Curve: Controlling papaya ring spot virus

Data kindly provided by Pedro Torres, graduate student in Statistics and TA for our course. The data were used for a study that is summarized below.

Title: Nonlinear models for analyzing disease progress
Authors: Raúl Macchiavelli, Wilfredo Robles, Edwin Abreu and Alberto Pantoja. College of Agricultural Sciences, Univ. of Puerto Rico – Mayagüez

Monitoring plant diseases
Diseases are normally monitored over time by assessing the amount of disease present in a population of plants. The result is the "Disease Progress Curve", which represents an interpretation of all host, pathogen and environmental effects occurring during an epidemic (Campbell and Madden, 1990).

Disease Progress Curve - Models
Y: amount of disease (here, the proportion of diseased trees out of 20)
t: time
dY/dt: absolute rate of disease increase (or decrease)
Quantitative description of epidemics: dY/dt vs. Y, dY/dt vs. t

Logistic Disease Progress Curve
$\frac{dY}{dt} = r_L\, Y (1 - Y)$
$Y = \frac{1}{1 + \exp\{-(B + r_L t)\}}$, where $B = \log\frac{Y_0}{1 - Y_0}$
$\log\frac{Y}{1 - Y} = \log\frac{Y_0}{1 - Y_0} + r_L t$

Gompertz Disease Progress Curve
$\frac{dY}{dt} = r_G\, Y (-\log Y)$
$Y = \exp\{-B \exp(-r_G t)\}$, where $B = -\log Y_0$
$-\log(-\log Y) = -\log(-\log Y_0) + r_G t$

Treatment is a factor (a categorical variable), so 0/1 variables are needed to identify each treatment level. Day is a quantitative explanatory variable.
o Effect of each treatment level: differences in the intercept of the curve
o Day effect: the slope of the curve is not zero
o Treatment*Day: changes in slope for some treatments
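The Python sketch below is not part of the original course material. It checks the two curve definitions numerically and illustrates the 0/1 coding.

- It evaluates both curves for hypothetical values of $Y_0$ and of the rate.
- It confirms that the logit and gompit transforms grow linearly in t with slope $r_L$ or $r_G$.
- It builds one row of a cell-means design with a treatment-specific intercept and slope.

The function names, the treatment order (T, PC, PP, M) and the default centering at day 90 (borrowed from the NLIN models in part a) are illustrative assumptions. None of this is SAS syntax.

```python
import math

# Treatment codes, in the order they are listed for the papaya experiment (assumed order).
TREATMENTS = ("T", "PC", "PP", "M")


def logit(y):
    """log(Y / (1 - Y)): the linearizing transform of the logistic curve."""
    return math.log(y / (1.0 - y))


def gompit(y):
    """-log(-log Y): the linearizing transform of the Gompertz curve."""
    return -math.log(-math.log(y))


def logistic_curve(t, y0, r_l):
    """Y(t) = 1 / (1 + exp(-(B + r_l * t))) with B = logit(y0), so Y(0) = y0."""
    b = logit(y0)
    return 1.0 / (1.0 + math.exp(-(b + r_l * t)))


def gompertz_curve(t, y0, r_g):
    """Y(t) = exp(-B * exp(-r_g * t)) with B = -log(y0), so Y(0) = y0."""
    b = -math.log(y0)
    return math.exp(-b * math.exp(-r_g * t))


def design_row(treatment, day, center=90.0):
    """0/1 treatment indicators (intercepts) followed by indicator * (day - center) (slopes).

    With coefficients (b0_T, b0_PC, b0_PP, b0_M, b1_T, b1_PC, b1_PP, b1_M), the dot
    product of this row with the coefficients is b0_i + b1_i * (day - center)
    for the plot's treatment i.
    """
    indicators = [1.0 if treatment == level else 0.0 for level in TREATMENTS]
    return indicators + [d * (day - center) for d in indicators]


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration (not estimates from the study).
    y0, r = 0.05, 0.1
    for t in (0, 14, 28, 42):
        yl = logistic_curve(t, y0, r)
        yg = gompertz_curve(t, y0, r)
        # Both differences should equal r * t, i.e. the transforms are linear in t.
        print(t, round(yl, 3), round(yg, 3),
              round(logit(yl) - logit(y0), 3), round(gompit(yg) - gompit(y0), 3))
    print(design_row("PP", 97))
```

The cell-means coding (one indicator per level, no overall intercept) was chosen because each coefficient then reads directly as a treatment-specific intercept or slope.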
Controlling papaya ring spot virus
Twenty plots were planted, each with 20 papaya plants. There were 4 different treatments for the control of certain insects (aphids) that are vectors of the virus:
o Control (no weeds) T
o Plastic (black) PC
o Plastic (silver) PP
o Weeds M
Each treatment was randomly assigned to 5 plots (CRD).
The experiment was monitored weekly (8 weeks). Each week, every plant was checked for symptoms. Once a plant showed symptoms, it was classified as diseased for the rest of the experiment.

Analyzing the Data
The variable of interest is the disease index for treatment i, time j and plot k:
$Y_{ijk} = \frac{\text{number of plants with symptoms}}{20}$

Effect of Plant Species and Environmental Conditions on Epiphytic Population Sizes of Pseudomonas syringae and other Bacteria. R. D. O’Brien and S. E. Lindow. 1989. the American Phytopathological Society. V. 79, No. 5, 1989

Problems with the traditional approach
o Non-normal distribution
o Nonlinear models
o Non-constant variances
o Observations of the same tree in different weeks are dependent
o Observations on the same plot may not be independent (contagion)

Traditional analysis
o Fit separate curves for each plot using linear regression
o Compare slopes for different treatments

Generalized linear models
o The linear component is defined as in traditional linear models: $\eta_i = x_i'\beta$
o A monotonic differentiable link function g describes how the expected value of $y_i$, $\mu_i = E(Y_i)$, is related to the linear predictor: $g(\mu_i) = x_i'\beta$
o The response variables $y_i$ are independent and have a probability distribution from an exponential family.
This implies that the variance of the response depends on the mean through a variance function V: $\mathrm{var}(Y_i) = \phi V(\mu_i)$
o The dispersion parameter $\phi$ is either assumed known (for example, $\phi = 1$ for the binomial distribution) or it must be estimated to account for overdispersion.

a) Nonlinear fitting of the observed proportion of diseased plants (out of 20 plants per plot) using PROC NLIN in SAS
o Dummy variables must be set up for treatments
o Initial values of the parameters are needed

Logistic Fit
$Y_{ijk} = \frac{\text{number of trees diseased}}{20}$
$\log\frac{E(Y_{ijk})}{1 - E(Y_{ijk})} = \eta_{ij} = \beta_0 + \tau_i + \beta_1(\mathrm{day}_j - 90) + \beta_{2i}(\mathrm{day}_j - 90) = \beta_{0i} + \beta_{1i}(\mathrm{day}_j - 90)$
$Y_{ijk} = \frac{\exp(\eta_{ij})}{1 + \exp(\eta_{ij})} + e_{ijk}$, $e_{ijk} \sim \mathrm{Normal}(0, \sigma^2)$

Gompertz Fit
$Y_{ijk} = \frac{\text{number of trees diseased}}{20}$
$-\log\{-\log E(Y_{ijk})\} = \eta_{ij} = \beta_0 + \tau_i + \beta_1(\mathrm{day}_j - 90) + \beta_{2i}(\mathrm{day}_j - 90) = \beta_{0i} + \beta_{1i}(\mathrm{day}_j - 90)$
$Y_{ijk} = \exp\{-\exp(-\eta_{ij})\} + e_{ijk}$, $e_{ijk} \sim \mathrm{Normal}(0, \sigma^2)$

In both fits, $\beta_{0i} = \beta_0 + \tau_i$ is the intercept and $\beta_{1i} = \beta_1 + \beta_{2i}$ is the slope for treatment i. The term $\beta_{2i}$ is the Treatment*Day interaction, that is, the change in slope for treatment i.

b) Estimation of the parameters and treatment effects of a Generalized Linear Model with PROC GENMOD in SAS
o No need for initial estimates
o The CLASS statement sets up the dummy variables (treatment effect) directly
o The CONTRAST statement allows comparing treatment effects, testing equality of slopes, etc.
o The ESTIMATE statement allows predictions and confidence intervals

$Y_{ijk}$ = number of diseased trees out of 20 per plot, $Y_{ijk} \sim \mathrm{binomial}(20, \pi_{ij})$
$\log\frac{\pi_{ij}}{1 - \pi_{ij}} = \tau_i + \beta_1\,\mathrm{day}_j + \beta_{2i}\,\mathrm{day}_j$
$E(Y_{ijk}) = 20\pi_{ij}$, $\mathrm{Var}(Y_{ijk}) = 20\pi_{ij}(1 - \pi_{ij})$
i = 1, 2, 3, 4 treatments; j = 1, 2, 3, 4, 5, 6, 7, 8 timepoints; k = 1, 2, 3, 4, 5 blocks

c) Note
Nonlinear models with normal residuals (NLIN) take into account neither the actual distribution of the response nor the longitudinal nature of the data. Because of contagion, the number of diseased trees (out of 20) is not a binomial random variable, and its variance may not correspond to that of a binomial random variable.
$E(Y_{ijk}) = 20\pi_{ij}$, $\mathrm{Var}(Y_{ijk}) = \phi\, 20\pi_{ij}(1 - \pi_{ij})$, where $\phi$ is the overdispersion parameter.
Nonlinear models fitting a binomial distribution, possibly with overdispersion, do not take into account the longitudinal nature of the data.
The scale parameter may be estimated as the square root of the deviance divided by its degrees of freedom. A ratio deviance/df noticeably greater than 1 indicates that overdispersion is present. To fit a binomial with overdispersion, use the SCALE=DEVIANCE option in the MODEL statement of PROC GENMOD. Standard errors and tests are then adjusted to account for the extra variation.

d) Estimation of the parameters of a Generalized Linear Mixed Model with PROC NLMIXED in SAS
Repeated observations from the same plot are correlated because they share the same random plot effect.
$Y_{ijk}$ = number of diseased trees out of 20 per plot
$Y_{ijk} \mid u_{ik} \sim \mathrm{binomial}(20, \pi_{ijk})$
$\log\frac{\pi_{ijk}}{1 - \pi_{ijk}} = \mathrm{int} + \tau_i + \beta_1\,\mathrm{day}_j + \beta_{2i}\,\mathrm{day}_j + u_{ik}$, $u_{ik} \sim \mathrm{Normal}(0, \sigma_u^2)$
$E(Y_{ijk} \mid u_{ik}) = 20\pi_{ijk}$, $\mathrm{Var}(Y_{ijk} \mid u_{ik}) = 20\pi_{ijk}(1 - \pi_{ijk})$
$\pi_{ijk} = \frac{\exp(\mathrm{int} + \tau_i + \beta_1\,\mathrm{day}_j + \beta_{2i}\,\mathrm{day}_j + u_{ik})}{1 + \exp(\mathrm{int} + \tau_i + \beta_1\,\mathrm{day}_j + \beta_{2i}\,\mathrm{day}_j + u_{ik})}$
i = 1, 2, 3, 4 treatments; j = 1, 2, 3, 4, 5, 6, 7, 8 timepoints; k = 1, 2, 3, 4, 5 blocks
This model does not account for correlation between measurements on the same tree.

Questions
PROC NLIN is used to fit two disease progress curves.
Q1.a. Write down both estimated equations.
Q1.b. Calculate R², a measure of goodness of fit.
Q1.c. Which model, Gompertz or logistic, shows a better fit?
PROC GENMOD is used to fit the number of diseased trees within each plot as a binomial random variable with n = 20 (trees). The probability of a tree being diseased is modeled as a function of Treatment and Day. The full model fits four slopes for the linear time effect (one for each treatment) and four separate intercepts (treatment effects).
Q1.d. Would you recommend adjusting for overdispersion?
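The deviance/df check described in part c) can be sketched in Python as follows. This is a hedged illustration, not GENMOD's internal code.

- The helper names are hypothetical.
- The counts and probabilities in the usage block are made up.
- Fitted probabilities are assumed to lie strictly between 0 and 1.

For the full GENMOD model, 20 plots observed for 8 weeks give 160 counts. Subtracting the model's 8 parameters (4 intercepts, 4 slopes) would leave 160 - 8 = 152 residual df, assuming no missing observations.

```python
import math


def _xlogx_ratio(obs, expected):
    """obs * log(obs / expected), using the convention 0 * log(0) = 0."""
    return obs * math.log(obs / expected) if obs > 0 else 0.0


def binomial_deviance(counts, fitted_probs, n=20):
    """Residual deviance of a binomial(n, pi) fit to counts of diseased trees.

    D = 2 * sum[ y log(y / mu) + (n - y) log((n - y) / (n - mu)) ],  mu = n * pi.
    Assumes every fitted probability satisfies 0 < pi < 1.
    """
    total = 0.0
    for y, p in zip(counts, fitted_probs):
        mu = n * p
        total += _xlogx_ratio(y, mu) + _xlogx_ratio(n - y, n - mu)
    return 2.0 * total


def dispersion_summary(deviance, df):
    """Return (deviance/df, sqrt(deviance/df)).

    The ratio estimates the variance multiplier phi. Its square root is the scale
    estimate that a deviance-based scale adjustment uses to inflate standard errors.
    """
    ratio = deviance / df
    return ratio, math.sqrt(ratio)


if __name__ == "__main__":
    # Hypothetical plot-week counts (out of 20) and fitted probabilities, for illustration only.
    counts = [2, 9, 15, 4]
    probs = [0.15, 0.40, 0.70, 0.25]
    dev = binomial_deviance(counts, probs)
    ratio, scale = dispersion_summary(dev, df=len(counts) - 2)
    print(round(dev, 3), round(ratio, 3), round(scale, 3))
```

A ratio close to 1 is consistent with binomial variation. A ratio well above 1 is the situation in which the scale adjustment changes the standard errors and tests.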
Contrasts test
☼ whether the slopes for the two plastic covers are the same
☼ whether the slopes for the control and weedy conditions are the same
☼ whether the average slope for "plastic" is equal to the average slope for "nonplastic"
☼ whether the effects of the two plastic covers are the same
☼ whether the effects of the control and weedy treatments are the same
☼ whether the average effect for "plastic" treatments is equal to the average effect for "nonplastic" treatments
Q1.e. Which model do you select, based on the above results (PROC GENMOD)? Make reference to the contrasts. Write down conclusions and indicate limitations.
PROC NLMIXED is used to fit a model that takes into account the distribution of the number of diseased trees within a plot. This count is treated as a binomial random variable with parameter $\pi_{ij}$ depending on the treatment and time of measurement. Random block effects are also included in the model. The full model fits separate slopes and intercepts for each treatment group. The second model fits one common intercept and slope for treatments T and M, and another common intercept and slope for treatments PP and PC.
Q1.f. Which model do you select? Make reference to the contrasts and indicate limitations.
Q1.g. Write down the model for the proportion of diseased trees in a plot receiving a silver plastic cover at day t.
Q1.h. Interpret the coefficients of the model.
Q1.i. Write down the equation for the prediction of the response in a plot with a silver plastic cover at day 85, and similarly for a weedy plot at day 85.

Question 2
Scientific paper: Effect of Plant Species and Environmental Conditions on Epiphytic Population Sizes of Pseudomonas syringae and other Bacteria. R. D. O’Brien and S. E. Lindow. 1989. the American Phytopathological Society. V. 79, No. 5, 1989.
Question 2 refers only to Experiment 1 in the above paper. Please answer the following.

Description
Q2.1. Objective
Q2.2. Response Variable
Q2.3. Indicate The Different Experimental Units:
a. Main-Unit
b. Sub-Unit
c. Sub-Sub-Unit
Q2.4. How Is A Block Defined?
Q2.5. What Are The Factors And Their Type: Random, Fixed, Counting?
Q2.6. Number Of Blocks
Q2.7. Total Number Of Main-Units
Q2.8. Total Number Of Sub-Units
Q2.9. Total Number Of Sub-Sub-Units
Q2.10. How Many Main-Units Within Each Block?
Q2.11. How Many Sub-Units Within Each Main-Unit?
Q2.12. How Many Sub-Sub-Units Within Each Sub-Unit?

Statistical Analysis
Q2.13. Linear Model, based on the above information.
Q2.14. Present The ANOVA Table: Sources Of Variation, Df, And MS If Possible.
Q2.15. The Sub-Sub-Plot Factor And Its Interactions Were Analyzed As A Randomized Complete Block Design. Indicate The Number Of Blocks That Should Be Considered.
Q2.16. Compare Your ANOVA Table With Table 2, Experiment 1. If any discrepancies are observed, please explain them.
Q2.17. Describe An Alternative Plan Of Statistical Analysis For The Described Model.

Q3. This question asks you to write down a description of your research project, indicating:
Q3.1. Objective
Q3.2. Response Variable
Q3.3. Experimental design. Detailed description:
a. Indicate The Different Experimental Units:
i. Main-Unit
ii. Sub-Unit (if any)
iii. Sub-Sub-Unit (if any)
b. How Is A Block Defined?
c. What Are The Factors And Their Type: Random, Fixed?
Q3.4. Present the Analysis of Variance table:
a. Sources of Variation (SOV)
b. Degrees of Freedom
c. Expected Mean Squares
d. F test for each SOV
Q3.5. What type of statistical tests do you plan to carry out on the results to answer your research questions: pairwise mean comparisons, contrasts, orthogonal polynomial contrasts, curve fitting, etc.?
Q3.6. Do you have repeated measures? If so, how do you plan to analyze them?

References
https://www.crops.org/publications/pdfs/CESGuide.pdf
https://www.crops.org/publications/pdfs/cinstauthmans.pdf
https://www.crops.org/publications/pdfs/jpr-instructions.pdf

From North Dakota Agricultural Exp Station - Research Project Guidelines
Procedures: This section is to provide a general design of the project. To begin, re-state each of the objective statements, followed by a description of the procedures/methods for that objective. The procedure statements should show that the research needs and plans have been considered carefully and that the proposed work has the potential to provide data and information that will permit accomplishing the objectives. While the details of the experimental design do not need to be specified, provide sufficient information to indicate that an appropriate design is planned.