Final_Exam_Q1-3_solution.doc

ST 524
NCSU - Fall 2008
Due: 12/08/08
Final TAKE HOME EXAM – FALL 08
Analysis - Disease Progress Curve :
Controlling papaya ring spot virus
Data kindly provided by Pedro Torres, graduate student in Statistics and TA for our course.
Data was used for a study that is summarized below.
o Title: Nonlinear models for analyzing disease progress
o Authors: Raúl Macchiavelli, Wilfredo Robles, Edwin Abreu and Alberto Pantoja.
College of Agricultural Sciences, Univ. of Puerto Rico – Mayagüez
Monitoring plant diseases
o Diseases are normally monitored over time, assessing the amount of disease present in a population of plants: the “Disease Progress Curve”.
o The curve represents an interpretation of all host, pathogen and environmental effects occurring during an epidemic (Campbell and Madden, 1990).
Disease Progress Curve - Models
o Y: amount of disease, here the proportion of diseased trees (out of 20)
o t: time
o dY/dt: absolute rate of disease increase (or decrease)
o Quantitative description of epidemics: dY/dt vs. Y, and dY/dt vs. t
Logistic Disease Progress Curve
$$\frac{dY}{dt} = r_l\, Y\,(1 - Y)$$
$$Y = \frac{1}{1 + \exp(-B - r_l t)}$$
$$\log\left(\frac{Y}{1 - Y}\right) = \log\left(\frac{Y_0}{1 - Y_0}\right) + r_l t$$
Gompertz Disease Progress Curve
$$\frac{dY}{dt} = r_g\, Y\,(-\log Y)$$
$$Y = \exp\left(-B \exp(-r_g t)\right)$$
$$-\log(-\log Y) = -\log(-\log Y_0) + r_g t$$
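As a quick numerical check of the two curves, the following Python sketch evaluates both closed forms and prints the transformed values, which should be linear in t with slope r. It assumes B = log(Y0/(1 - Y0)) for the logistic curve and B = -log(Y0) for the Gompertz curve, which is what the linearized forms imply; the function names and the values of Y0 and r are hypothetical and chosen only for illustration.

```python
import math


def logistic_curve(t, y0, r):
    """Logistic progress curve Y = 1 / (1 + exp(-B - r t)), with B = logit(y0) (assumed)."""
    b = math.log(y0 / (1.0 - y0))
    return 1.0 / (1.0 + math.exp(-b - r * t))


def gompertz_curve(t, y0, r):
    """Gompertz progress curve Y = exp(-B exp(-r t)), with B = -log(y0) (assumed)."""
    b = -math.log(y0)
    return math.exp(-b * math.exp(-r * t))


def logit(y):
    """Linearizing transform for the logistic curve."""
    return math.log(y / (1.0 - y))


def gompit(y):
    """Linearizing transform for the Gompertz curve: -log(-log Y)."""
    return -math.log(-math.log(y))


# Hypothetical initial disease level and rate, for illustration only.
Y0, RATE = 0.05, 0.1
for t in (0.0, 10.0, 30.0):
    y_logistic = logistic_curve(t, Y0, RATE)
    y_gompertz = gompertz_curve(t, Y0, RATE)
    # Each transformed value should equal its value at t = 0 plus RATE * t.
    print(t, round(logit(y_logistic), 6), round(gompit(y_gompertz), 6))
```

On the transformed scale both curves are straight lines in t, which is why the traditional analysis fits them by linear regression.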
Treatment is a factor (categorical variable).
o Need to create 0/1 variables to identify each treatment level.
Day is a quantitative explanatory variable.
o Effect of each Treatment level: differences in the intercept of the curve
o Day effect: slope of the curve is not zero
o Treatment*Day: changes in slope for some treatments.
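The 0/1 coding can be sketched in Python as follows. This is only an illustration of the idea, not the SAS setup used in the course: the helper names are hypothetical, the treatment codes (T, PC, PP, M) are the ones listed in the experiment description, and the coefficient values in the usage lines are made up.

```python
TREATMENTS = ["T", "PC", "PP", "M"]  # treatment codes from the experiment description


def design_row(treatment, day):
    """Return four 0/1 treatment indicators followed by four day-by-treatment columns."""
    indicators = [1.0 if treatment == level else 0.0 for level in TREATMENTS]
    day_by_treatment = [day * indicator for indicator in indicators]
    return indicators + day_by_treatment


def linear_predictor(row, intercepts, slopes):
    """Combine a design row with one intercept and one slope per treatment."""
    coefficients = list(intercepts) + list(slopes)
    return sum(x * b for x, b in zip(row, coefficients))


# Hypothetical coefficients: only the PP intercept and PP slope contribute here.
row = design_row("PP", 85)
print(row)
print(linear_predictor(row, [-1.0, -2.0, -3.0, -4.0], [0.5, 0.5, 0.5, 0.5]))
```

With this coding, a difference between indicator coefficients is a difference in intercepts, and a difference between day-by-treatment coefficients is a difference in slopes.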
Controlling papaya ring spot virus
Twenty plots were planted, each with 20 papaya plants

There were 4 different treatments for the control of certain insects (aphids) which are vectors of the virus:
o Control (no weeds): T
o Plastic (black): PC
o Plastic (silver): PP
o Weeds: M
Each treatment was randomly assigned to 5 plots (CRD).
The experiment was monitored weekly (8 weeks).
Each week, every plant was checked to see whether it showed symptoms.
Once a plant showed symptoms, it was classified as diseased for the rest of the experiment.
Analyzing the Data

The variable of interest is the disease index for treatment i, time j and plot k:
$$Y_{ijk} = \frac{\text{number of plants with symptoms}}{20}$$
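A minimal Python sketch of the disease index for one plot follows. Because a plant stays classified as diseased once it shows symptoms, the weekly counts are cumulative. The function names and the weekly numbers of new cases are hypothetical, not data from the study.

```python
PLANTS_PER_PLOT = 20


def disease_index(n_symptomatic, n_plants=PLANTS_PER_PLOT):
    """Proportion of plants with symptoms in one plot at one time point."""
    if not 0 <= n_symptomatic <= n_plants:
        raise ValueError("count must lie between 0 and the number of plants")
    return n_symptomatic / n_plants


def weekly_index(new_cases_per_week, n_plants=PLANTS_PER_PLOT):
    """Accumulate new cases week by week and return the disease index for each week."""
    total = 0
    indices = []
    for new_cases in new_cases_per_week:
        total += new_cases  # diseased plants remain diseased
        indices.append(disease_index(total, n_plants))
    return indices


# Hypothetical new cases over the 8 weekly checks of a single plot.
new_cases = [0, 1, 0, 2, 3, 1, 4, 2]
print(weekly_index(new_cases))
```

The resulting sequence never decreases, which is one reason observations from the same plot in different weeks are dependent.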
Effect of Plant Species and Environmental Conditions on Epiphytic Population Sizes of Pseudomonas syringae and
other Bacteria. R. D. O’Brien and S. E. Lindow. 1989. the American Phytopathological Society. V. 79, No. 5,
1989,
Problems with traditional approach
o Non-normal distribution
o Nonlinear models
o Non-constant variances
o Observations of the same tree in different weeks are dependent
o Observations on the same plot may not be independent (contagion)
Traditional analysis
o Fit separate curves for each plot using linear regression
o Compare slopes for different treatments
Generalized linear models
o The linear component is defined as in traditional linear models:
$$\eta_i = x_i'\beta$$
o A monotonic differentiable link function g describes how the expected value of Y, $E(Y_i) = \mu_i$, is related to the linear predictor:
$$g(\mu_i) = x_i'\beta$$
o The response variables y are independent and have a probability distribution from an exponential family. This implies that the variance of the response depends on the mean through a variance function V:
$$\operatorname{var}(Y_i) = \phi\, V(\mu_i)$$
o The dispersion parameter $\phi$ is either assumed known (for example, for the binomial distribution, $\phi = 1$) or it must be estimated to account for overdispersion.
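For the binomial case used in this exam, the three components can be sketched in Python as below. This assumes the logit link and a count out of n = 20 trees, so that the variance is phi * n * pi * (1 - pi); the function names are hypothetical.

```python
import math


def logit_link(mu):
    """Link function g: maps a probability in (0, 1) to the linear predictor scale."""
    return math.log(mu / (1.0 - mu))


def inverse_logit(eta):
    """Inverse link: maps the linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def binomial_count_variance(pi, n=20, phi=1.0):
    """Variance of a binomial count, inflated by the dispersion parameter phi."""
    return phi * n * pi * (1.0 - pi)


print(inverse_logit(logit_link(0.3)))             # round trip through the link
print(binomial_count_variance(0.5))               # pure binomial variance, phi = 1
print(binomial_count_variance(0.5, phi=2.0))      # same mean, doubled variance
```

Changing phi leaves the mean untouched and only rescales the variance, which is why overdispersion affects standard errors and tests rather than the fitted curve.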
a) Nonlinear fitting of the observed proportion of diseased plants (out of 20 plants per plot) using PROC NLIN in SAS
o Need to set up dummy variables for treatments
o Needs initial values of parameters
Logistic Fit
$$Y_{ijk} = \frac{\text{number of trees diseased}}{20}$$
$$\log\left(\frac{E(Y_{ijk})}{1 - E(Y_{ijk})}\right) = \eta_{ij}$$
$$\eta_{ij} = \beta_o + \tau_i + \beta_1(\text{day}_j - 90) + \beta_{2i}(\text{day}_j - 90) * \tau_i = \beta_{oi} + \beta_{1i}(\text{day}_{ij} - 90)$$
$$Y_{ijk} = \frac{\exp(\eta_{ij})}{1 + \exp(\eta_{ij})} + e_{ijk}, \qquad e_{ijk} \sim \text{Normal}(0, \sigma^2)$$

Gompertz Fit
$$Y_{ijk} = \frac{\text{number of trees diseased}}{20}$$
$$-\log\left(-\log E(Y_{ijk})\right) = \eta_{ij}$$
$$\eta_{ij} = \beta_o + \tau_i + \beta_1(\text{day}_j - 90) + \beta_{2i}(\text{day}_j - 90) * \tau_i = \beta_{oi} + \beta_{1i}(\text{day}_{ij} - 90)$$
$$Y_{ijk} = \exp\left(-\exp(-\eta_{ij})\right) + e_{ijk}, \qquad e_{ijk} \sim \text{Normal}(0, \sigma^2)$$
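The two mean functions and the least-squares criterion behind a nonlinear fit with normal errors can be sketched in Python as follows. This is not PROC NLIN itself: the function names are hypothetical, the code works with the collapsed per-treatment form (one intercept and one slope, day centered at 90), and the observation days and parameter values are made up.

```python
import math


def eta(day, intercept_i, slope_i, center=90.0):
    """Linear predictor for one treatment, with day centered at 90."""
    return intercept_i + slope_i * (day - center)


def logistic_mean(e):
    """Logistic mean: exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-e))


def gompertz_mean(e):
    """Gompertz mean: exp(-exp(-eta))."""
    return math.exp(-math.exp(-e))


def residual_ss(days, observed, mean_fn, intercept_i, slope_i):
    """Residual sum of squares, the quantity a least-squares fit minimizes."""
    return sum(
        (y - mean_fn(eta(d, intercept_i, slope_i))) ** 2
        for d, y in zip(days, observed)
    )


# Hypothetical observation days and proportions for one plot.
days = [70, 77, 84, 91, 98]
observed = [0.05, 0.15, 0.30, 0.50, 0.70]
print(residual_ss(days, observed, logistic_mean, 0.0, 0.1))
print(residual_ss(days, observed, gompertz_mean, 0.0, 0.1))
```

Because day is centered at 90, the intercept is the value of the linear predictor at day 90, so `logistic_mean(intercept_i)` or `gompertz_mean(intercept_i)` is the expected proportion diseased on that day.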
b) Estimation of parameters and treatment effects of the Generalized Linear Model with PROC GENMOD in SAS
o No need for initial estimates
o Use of the CLASS statement sets up dummy variables (treatment effect) directly
o Use of the CONTRAST statement allows comparing treatment effects, equality of slopes, etc.
o Use of the ESTIMATE statement allows predictions and confidence intervals.
Y_ijk = number of diseased trees out of 20 per plot
$$Y_{ijk} \sim \text{binomial}(20, \pi_{ij})$$
$$\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \mu + \tau_i + \beta_1\,\text{day}_j + \beta_{2i}\,(\text{day}_j\, \tau_i)$$
$$E(Y_{ijk}) = 20\,\pi_{ij}, \qquad \operatorname{Var}(Y_{ijk}) = 20\,\pi_{ij}(1 - \pi_{ij})$$
i = 1, 2, 3, 4 treatments
j = 1, 2, 3, 4, 5, 6, 7, 8 timepoints
k = 1, 2, 3, 4, 5 plots
c) Note
o Nonlinear models with normal residuals (NLIN) do not take into account the actual distribution or the longitudinal nature of the data.
o Because of contagion, the number of diseased trees (out of 20) is not a binomial random variable, so its variance may not correspond to that of a binomial random variable:
$$E(Y_{ijk}) = 20\,\pi_{ij}, \qquad \operatorname{Var}(Y_{ijk}) = 20\,\pi_{ij}(1 - \pi_{ij})\,\phi, \qquad \phi = \text{overdispersion parameter}$$
o Nonlinear models fitting a binomial distribution with possible overdispersion do not take into account the longitudinal nature of the data.
o The overdispersion parameter $\phi$ may be estimated as the deviance divided by its degrees of freedom; the corresponding scale is the square root of that ratio. A ratio (deviance/d.f.) greater than 1 indicates that overdispersion is present. Use the SCALE=DEVIANCE option in the MODEL statement of PROC GENMOD to fit a binomial with overdispersion. Standard errors and tests are then adjusted to account for the extra variation.
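The deviance-based check for overdispersion can be sketched in Python as below. This is a hand calculation for illustration, not PROC GENMOD output: the function names, the counts and the fitted probabilities are hypothetical, and the code assumes the usual convention that the scale is the square root of deviance/d.f.

```python
import math


def binomial_deviance(counts, fitted_probs, n=20):
    """Binomial deviance: 2 * sum[y log(y/mu) + (n - y) log((n - y)/(n - mu))], mu = n p."""
    total = 0.0
    for y, p in zip(counts, fitted_probs):
        mu = n * p
        if y > 0:  # the y log(y/mu) term is taken as 0 when y = 0
            total += y * math.log(y / mu)
        if y < n:  # likewise for the (n - y) term when y = n
            total += (n - y) * math.log((n - y) / (n - mu))
    return 2.0 * total


def dispersion_estimate(deviance, n_obs, n_params):
    """Return (deviance / d.f., its square root) with d.f. = n_obs - n_params."""
    df = n_obs - n_params
    phi = deviance / df
    return phi, math.sqrt(phi)


# Hypothetical diseased-tree counts (out of 20) and fitted probabilities.
counts = [2, 9, 4, 15, 11, 19]
fitted = [0.20, 0.30, 0.40, 0.55, 0.70, 0.80]
dev = binomial_deviance(counts, fitted)
print(dev, dispersion_estimate(dev, n_obs=len(counts), n_params=2))
```

A ratio well above 1 suggests extra-binomial variation; standard errors are then multiplied by the square root of the ratio.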
d) Estimation of the parameters of the Generalized Linear Model with PROC NLMIXED in SAS
o Repeated observations from the same plot are correlated because they share the same random plot effect.
Y_ijk = number of diseased trees out of 20 per plot
$$Y_{ijk} \mid u_{ik} \sim \text{binomial}(20, \pi_{ij})$$
$$\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \mu + \tau_i + \beta_1\,\text{day}_j + \beta_{2i}\,(\text{day}_j\, \tau_i) + u_{ik}, \qquad u_{ik} \sim \text{Normal}(0, \sigma^2)$$
$$E(Y_{ijk} \mid u_{ik}) = 20\,\pi_{ij}, \qquad \operatorname{Var}(Y_{ijk} \mid u_{ik}) = 20\,\pi_{ij}(1 - \pi_{ij})$$
$$\pi_{ij} = \frac{\exp\left(\mu + \tau_i + \beta_1\,\text{day}_j + \beta_{2i}\,(\text{day}_j\, \tau_i) + u_{ik}\right)}{1 + \exp\left(\mu + \tau_i + \beta_1\,\text{day}_j + \beta_{2i}\,(\text{day}_j\, \tau_i) + u_{ik}\right)}$$
i = 1, 2, 3, 4 treatments
j = 1, 2, 3, 4, 5, 6, 7, 8 timepoints
k = 1, 2, 3, 4, 5 plots
o No accounting for correlation between measurements within the same tree.
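The role of the random plot effect can be sketched with a small Python simulation. This is only an illustration under stated assumptions, not the NLMIXED fit: the function names, the weekly fixed-effect values and sigma are hypothetical, and each week's count is drawn independently given the plot effect, so the simulation shares the limitation noted above (it ignores that a diseased tree stays diseased).

```python
import math
import random


def conditional_pi(eta_fixed, u):
    """Probability of disease given the fixed-effect predictor and the plot effect u."""
    return 1.0 / (1.0 + math.exp(-(eta_fixed + u)))


def simulate_plot(eta_by_week, sigma, rng, n=20):
    """Draw one plot effect u ~ Normal(0, sigma^2), then a binomial count for each week."""
    u = rng.gauss(0.0, sigma)
    counts = []
    for eta_fixed in eta_by_week:
        p = conditional_pi(eta_fixed, u)
        # A binomial(n, p) draw built from n independent Bernoulli trials.
        counts.append(sum(1 for _ in range(n) if rng.random() < p))
    return u, counts


# Hypothetical fixed-effect predictor over 8 weeks and plot-effect standard deviation.
rng = random.Random(2008)
eta_by_week = [-3.0, -2.0, -1.0, 0.0, 1.0, 1.5, 2.0, 2.5]
for _ in range(3):
    u, counts = simulate_plot(eta_by_week, sigma=0.8, rng=rng)
    print(round(u, 3), counts)
```

A plot with a large positive u tends to sit above the treatment curve in every week, which is the within-plot correlation the random effect is meant to capture.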
Questions
PROC NLIN is used to fit two disease curves.
Q1.a. Write down both estimated equations.
Q1.b. Calculate R², a measure of goodness of fit.
Q1.c. Which model, Gompertz or logistic, shows the better fit?
PROC GENMOD is used to fit the number of diseased trees within each plot as a binomial random variable with n = 20 (trees) and the probability of a tree being diseased as a function of Treatment and Day. The full model fits four slopes (for the linear time effect), one for each treatment, and four separate intercepts (treatment effects).
Q1.d. Would you recommend adjusting for overdispersion?
Contrasts test
☼ whether the slopes for the two plastic covers are the same
☼ whether the slopes for control and weedy condition are the same
☼ whether the average slope for “plastic” is equal to the average slope for “nonplastic”
☼ whether the effects of the two plastic covers are the same
☼ whether the effects of the control and weedy treatments are the same
☼ whether the average effect for “plastic” treatments is equal to the average effect for “nonplastic” treatments
Q1.e. Which model do you select, based on the above results (PROC GENMOD)? Make reference to contrasts. Write down conclusions. Indicate limitations.
PROC NLMIXED is used to fit a model taking into account the distribution of the number of diseased trees within a plot as a binomial random variable with parameter $\pi_{ij}$ depending on the treatment and time of measurement. Random block effects are also included in the model. The full model fits separate slopes and intercepts for each treatment group, while the second model fits two curves: a common intercept and slope for treatments T and M, and a separate common intercept and slope for treatments PP and PC.
Q1.f. Which model do you select? Make reference to contrasts. Indicate limitations.
Q1.g. Write down the model for the proportion of diseased trees in a plot receiving a silver plastic cover at day t.
Q1.h. Interpret the coefficients of the model.
Q1.i. Write down the equation for the prediction of the response in a plot with a silver plastic cover at day 85, and similarly for a weedy plot at day 85.
Question 2
Scientific paper:
Effect of Plant Species and Environmental Conditions on Epiphytic Population Sizes of
Pseudomonas syringae and other Bacteria. R. D. O’Brien and S. E. Lindow. 1989. the
American Phytopathological Society. V. 79, No. 5, 1989,
Question 2 will refer only to Experiment 1 in the above paper.
Please answer the following:
Description
Q2.1. Objective
Q2.2. Response variable
Q2.3. Indicate what the different experimental units are:
a. Main-unit
b. Sub-unit
c. Sub-sub-unit
Q2.4. How is a block defined?
Q2.5. What are the factors and their type (random or fixed)?
Counting
Q2.6. Number of blocks
Q2.7. Total number of main-units
Q2.8. Total number of sub-units
Q2.9. Total number of sub-sub-units
Q2.10. How many main-units within each block?
Q2.11. How many sub-units within each main-unit?
Q2.12. How many sub-sub-units within each sub-unit?
Statistical Analysis
Q2.13. Linear model, based on the above information.
Q2.14. Present the ANOVA table: sources of variation, df, and MS if possible.
Q2.15. The sub-sub-plot factor and its interactions were analyzed as a randomized complete block design. Indicate the number of blocks that should be considered.
Q2.16. Compare your ANOVA table with Table 2, Experiment 1. If any discrepancies are observed, please explain them.
Q2.17. Describe an alternative plan of statistical analysis for the described model.
Q3. This question asks you to write down a description of your research project, indicating:
Q3.1. Objective
Q3.2. Response variable
Q3.3. Experimental design. Detailed description:
a. Indicate what the different experimental units are:
i. Main-unit
ii. Sub-unit (if any)
iii. Sub-sub-unit (if any)
b. How is a block defined?
c. What are the factors and their type (random or fixed)?
Q3.4. Present the Analysis of Variance table:
a. Sources of Variation (SOV)
b. Degrees of Freedom
c. Expected Mean Squares
d. F test for each SOV
Q3.5. What type of statistical tests do you plan to carry out on the results to answer your research questions: pairwise mean comparisons, contrasts, orthogonal polynomial contrasts, curve fitting, etc.?
Q3.6. Do you have repeated measures? If so, how do you plan to analyze them?
References
https://www.crops.org/publications/pdfs/CESGuide.pdf
https://www.crops.org/publications/pdfs/cinstauthmans.pdf
https://www.crops.org/publications/pdfs/jpr-instructions.pdf
From North Dakota Agricultural Experiment Station, Research Project Guidelines
Procedures:
This section is to provide a general design of the project. To begin, re-state each of
the objective statements followed by a description of the procedures/methods for that
objective. The procedure statements should show that the research needs and plans have
been considered carefully and the proposed work has the potential to provide data and
information which will permit accomplishing the objectives.
While the details of the experimental design do not need to be specified, provide
sufficient information to indicate that an appropriate design is planned.