
NAME _______________________
PLS 205 Final [Total Points in Exam = 100]
March 13, 2014
Due Date: Tuesday, March 18, by 5:00 pm; 122 Robbins Hall
Include your SAS programs, include only the critical parts of the SAS output, and discuss each result.
NO POINTS WILL BE AWARDED TO OUTPUTS WITHOUT A SENTENCE EXPLAINING
THE CONCLUSION.
Clarification questions should be directed to Whitney by e-mail only. No consultation with other students
is allowed during the exam period (including SAS programming questions). Exams with more than one
unlikely identical mistake will receive zeroes, and the incident will be referred to the Office of
Student Judicial Affairs.
Question 1
[25 points]
Researchers at the UC Davis Veterinary School are trying to come up with an alternative diet for piglets
that cannot nurse.
They randomly pick two farms to carry out their research. They are interested in working with three
particular breeds of pigs, to understand how the different diets affect these three specific breeds. First,
they randomly pick six newborn piglets from each breed at each farm. Next, they randomly assign two
diets over the six piglets, per breed (three piglets per Diet-Breed combination). Independent
randomizations are performed within each breed in each farm. They take weekly weights (in ounces) of
each piglet for three weeks.
Weekly weights (ounces) by Farm, Breed, Replication, Diet and Time:

                         Diet 1                     Diet 2
Farm  Breed  Rep   1 week  2 week  3 week     1 week  2 week  3 week
 1      1     1      75      89      91         59      74      72
 1      1     2      77      74      88         76      75      72
 1      1     3      73      83      89         68      76      76
 1      2     1      81      91      97         81      78      86
 1      2     2      82      89      88         76      84      82
 1      2     3      77      82      86         74      82      80
 1      3     1      92     102     105         83      96      95
 1      3     2      91     100     103         81      89     103
 1      3     3      93      89     107         83      97      93
 2      1     1      59      79      83         56      62      65
 2      1     2      50      76      71         71      53      68
 2      1     3      59      74      79         61      74      71
 2      2     1      69      73      77         65      77      73
 2      2     2      59      71      78         66      59      71
 2      2     3      71      72      82         68      62      78
 2      3     1      77      89      96         71      80      84
 2      3     2      84      91     100         71      95      89
 2      3     3      80      82      87         77      77      81
1.1 [2 points] Describe in detail the design of this experiment [see appendix].
1.2 [3 points] Show that the data meet the assumptions of normality of residuals and homogeneity of
variances among Diets, among Breeds and among Times.
1.3 [7 points] Run the appropriate ANOVA and answer the following questions (with p-values):
a. Is there a significant effect of breed on weight gain?
b. Is there a significant effect of diet on weight gain?
c. Is there a significant effect of time (weeks) on weight gain?
d. Are there any significant interactions between Diet and Breeds?
1.4 [3 points] If appropriate, use conservative degrees of freedom to test Time and all two- and
three-way interactions of Diet and Breed with Time. State whether or not you should continue the
analysis past this step. If so, carry out the appropriate next analysis and run any other tests of
assumptions. Do your conclusions about Time or Time interactions differ between regular and
conservative degrees of freedom?
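One common reading of "conservative degrees of freedom" in a repeated-measures analysis is the lower-bound correction, in which the numerator and denominator degrees of freedom of Time and of every interaction with Time are both divided by (t - 1), where t is the number of time points. The sketch below only illustrates that arithmetic; the function name and the degrees of freedom passed to it are hypothetical and are not derived from the exam data.

```python
from fractions import Fraction

def conservative_df(df_num, df_den, n_times):
    """Divide both degrees of freedom by (n_times - 1).

    This is the lower-bound correction; whether it is the correction
    intended by the exam is an assumption of this sketch.
    """
    divisor = n_times - 1
    return Fraction(df_num, divisor), Fraction(df_den, divisor)

# Hypothetical df for illustration only: an effect with 2 numerator df
# tested against an error term with 48 df, measured at 3 time points.
num, den = conservative_df(2, 48, n_times=3)
print(num, den)  # prints: 1 24
```

The F statistic itself is unchanged; only the reference distribution used to obtain the p-value changes.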
1.5 [3 points] Carry out the appropriate means separation tests to compare all pairs of Breeds
(controlling the maximum experimentwise error rate). Which breed weighed the most (on
average)?
1.6 [3 points] Use contrasts to determine if the response over time is linear.
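For three equally spaced time points, the standard orthogonal polynomial coefficients are (-1, 0, 1) for the linear trend and (1, -2, 1) for the quadratic trend. The sketch below checks that these are valid, mutually orthogonal contrasts and applies them to a set of week means. The means are hypothetical, chosen only to show a perfectly linear response, and the helper names are illustrative.

```python
linear = [-1, 0, 1]      # linear trend across 3 equally spaced levels
quadratic = [1, -2, 1]   # quadratic trend (departure from linearity)

def is_contrast(coeffs):
    """A contrast's coefficients must sum to zero."""
    return sum(coeffs) == 0

def are_orthogonal(a, b):
    """Two contrasts are orthogonal (equal replication) if their dot product is zero."""
    return sum(x * y for x, y in zip(a, b)) == 0

def contrast_estimate(coeffs, means):
    """Linear combination of the level means defined by the contrast."""
    return sum(c * m for c, m in zip(coeffs, means))

# Hypothetical week means (not the exam data): a perfectly linear response.
week_means = [70.0, 80.0, 90.0]
print(contrast_estimate(linear, week_means))     # 20.0
print(contrast_estimate(quadratic, week_means))  # 0.0
```

A response is considered linear when the linear contrast is significant and the quadratic contrast is not; the significance tests themselves come from the ANOVA, not from this arithmetic.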
1.7 [3 points] Graph the main effects or the appropriate interactions if significant and comment.
1.8 [1 point] What other measurement could the researchers have taken before starting the experiment
that could have been used as a covariable to reduce the variability of the experiment?
Question 2
[25 points]
The invasive Asian citrus psyllid (Diaphorina citri) first appeared in California in 2008. It is a vector for
the citrus disease Huanglongbing, which causes fruit to be deformed and green.
Researchers are testing a new pesticide, but want to be able to use the pesticide inside of greenhouses,
where the trees are overwintering when small. They are focused on Southern California, where the
problem has been increasing and want to be able to generalize their results over the region.
They randomly select 10 farmers/locations, which all grow lemons in greenhouses. At each location, the
farmers randomly select two greenhouses. They want to test a new pesticide on two new lemon varieties
(Varieties 1 and 2) grown in Southern California. They also select an older cultivar to use as a control
(Variety 3). They select 3 rates of the pesticide (0, 2, and 4 qt./A). They randomize combinations of
variety and rate inside each greenhouse. After spraying the pesticide, each tree is enclosed in a screen, and
the same number of psyllids is put into the area around each screened tree.
At the end of the season, the number of healthy fruit is counted per tree. Results are in the table below:
                  Pesticide                              Locations
                  Rate
Greenhouse        (kg/hamo)   Variety     1    2    3    4    5    6    7    8    9   10
    1                 0          1       74   65   55   64   57   56   64   81   56   65
    1                 0          2       56   67   76   56   56   54   76   62   71   57
    1                 0          3       61   55   50   61   67   59   50   50   51   57
    1                 2          1       82   68   57   71   84   72   81   81   67   69
    1                 2          2       80   64   65   69   51   71   68   72   56   66
    1                 2          3       64   59   51   63   70   68   52   68   66   62
    1                 4          1       78   77   60   83   66   74   72   79   87   66
    1                 4          2       73   50   67   63   67   68   64   85   72   67
    1                 4          3       80   85   64   82   62   78   58   81   58   69
    2                 0          1       61   66   50   52   60   56   61   77   57   60
    2                 0          2       56   50   50   50   50   56   75   50   58   66
    2                 0          3       66   57   57   55   50   50   66   71   51   60
    2                 2          1       63   60   55   64   64   64   75   61   62   79
    2                 2          2       67   60   58   77   50   65   59   57   55   63
    2                 2          3       56   63   62   72   50   59   76   81   64   51
    2                 4          1       80   73   73   81   58   67   75   68   57   68
    2                 4          2       73   67   74   64   64   56   70   83   65   58
    2                 4          3       73   66   73   67   60   62   55   74   73   60
2.1 [2 points] Describe in detail the design of this experiment [see appendix].
2.2 [3 points] Show that the data meet the assumptions of normality of residuals and homogeneity of
variances among treatment groups.
2.3 [6 points] Using a single treatment factor with 9 levels representing the 9 combinations of pesticide
rate and variety, carry out the appropriate Proc GLM for this experiment and answer the
following questions based on the overall ANOVA (report p-values):
a. Does treatment significantly affect fruit number?
b. Is the effect of treatment location specific?
c. Is there significant variation in fruit number between greenhouses?
d. Is there significant variation in fruit number among locations?
2.4 [6 points] Use contrasts to partition the sums of squares associated with 2.3 in order to help you
answer the following questions (report p-values):
a. Is there a difference between the new varieties and the old variety?
b. Is there a difference between the two new varieties?
c. Characterize the response of fruit number to pesticide rate.
d. Are the dosage responses different between old and new varieties?
e. Are the dosage responses different between the two new cultivars?
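When a 3 x 3 factorial is coded as a single factor with 9 levels, each contrast over the 9 levels can be built as the product of a variety coefficient and a rate coefficient. The sketch below constructs the 8 single-degree-of-freedom contrasts this way and checks that they are mutually orthogonal. It assumes the 9 levels are ordered with variety varying slowest (V1 at rates 0, 2, 4, then V2, then V3); that ordering, and all names in the code, are assumptions of this illustration and must match however the treatment factor is actually coded.

```python
from itertools import combinations

def expand(variety_coefs, rate_coefs):
    """Coefficients for the 9 treatments, variety varying slowest."""
    return [v * r for v in variety_coefs for r in rate_coefs]

ones = [1, 1, 1]
new_vs_old = [1, 1, -2]      # varieties 1 and 2 versus variety 3
new1_vs_new2 = [1, -1, 0]    # variety 1 versus variety 2
rate_linear = [-1, 0, 1]     # rates 0, 2, 4 are equally spaced
rate_quadratic = [1, -2, 1]

contrasts = {
    "new vs old": expand(new_vs_old, ones),
    "new1 vs new2": expand(new1_vs_new2, ones),
    "rate linear": expand(ones, rate_linear),
    "rate quadratic": expand(ones, rate_quadratic),
    "new vs old x linear": expand(new_vs_old, rate_linear),
    "new vs old x quadratic": expand(new_vs_old, rate_quadratic),
    "new1 vs new2 x linear": expand(new1_vs_new2, rate_linear),
    "new1 vs new2 x quadratic": expand(new1_vs_new2, rate_quadratic),
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Every contrast sums to zero and every pair is orthogonal, so together
# the 8 contrasts partition the 8 treatment degrees of freedom.
all_sum_to_zero = all(sum(c) == 0 for c in contrasts.values())
all_orthogonal = all(dot(a, b) == 0 for a, b in combinations(contrasts.values(), 2))

for name, coefs in contrasts.items():
    print(f"{name:26s} {coefs}")
```

The printed coefficient rows can be checked against the treatment ordering before they are transcribed into contrast statements.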
2.5 [3 points] Rank the 9 treatment combinations in terms of mean fruit number and assign them to
significance groups using Tukey’s method of means separation.
2.6 [1 point] Among the group of varieties with the highest yield (no significant differences in yield
among them according to Tukey), which variety would you recommend to minimize the use of pesticide? Justify.
2.7 [1 point] Would you feel comfortable in extending your recommendation to all greenhouse
locations in Southern California?
2.8 [3 points] What is the major source of variation in this experiment (use Proc VarComp)?
Question 3
[20 points]
An investigator would like to study the effect of a gene on tolerance to soil salinity in corn. The
investigator is also interested in how the effect of the gene may differ in different genetic backgrounds of
4 different cultivars that have low tolerance to soil salinity. Cultivars 1 and 2 are widely grown in farm
ecosystems in Africa while cultivars 3 and 4 are grown in farm ecosystems of Central America. The
investigator introduced a new favorable allele of the target gene from a donor parent that is tolerant to soil
salinity into the 4 different recipient cultivars by six generations of backcrossing followed by selection of
lines fixed (homozygous) for the tolerant (T) or susceptible (S) alleles resulting in a total of 8 new corn
lines (two sister lines per cultivar).
This is a side note to explain the difference between genes and alleles. A gene is a coding
sequence in DNA that is transcribed and translated into a protein that carries out a
function. Alleles are different variants of the same gene. In this example each gene has a
susceptible (S) and a tolerant (T) allele to salinity. Except for this one gene the two lines
within a specific variety are almost identical (=isogenic or sister lines).
An experimental field with high soil salinity was divided into four 240 m² areas that were considered
relatively homogeneous. Each of these four areas was divided into eight 30 m² plots. The
eight lines were randomly assigned to the 30 m² plots within each 240 m² area, and each of
the four 240 m² areas was randomized independently. In summary, each line was replicated four times, once
in each of the 240 m² areas. The investigator evaluated the lines for tolerance to soil salinity
by measuring their yield. Yield is reported in bushels per acre in the table below (note that one plot was
lost):
Cultivar                  Gene 1     B1     B2     B3     B4
African Cultivar 1          T       138    114     89     73
African Cultivar 1          S        97    115     48     49
African Cultivar 2          T       124     99     74     49
African Cultivar 2          S         .     86     46     46
C. American Cultivar 3      T        88     67     70     47
C. American Cultivar 3      S        58     74     44     44
C. American Cultivar 4      T        60     62     60     43
C. American Cultivar 4      S        50     55     41     40
3.1 [2 points] Use the table in the appendix to describe the design of the experiment.
3.2 [3 points] Check all of the assumptions of your model.
3.3 [6 points] If necessary, transform the data using a power transformation and repeat the analysis
with the transformed data. Show that all the assumptions are met in the transformed data. Run the
appropriate ANOVA model to test for effects of the gene and cultivar genetic background and the
interactions between them. Report the results of your ANOVA and describe which effects are
significant.
3.4 [6 points] Before starting the experiment the investigator formulated questions of interest about
the significant genetic background effects and interactions that may arise. Please test the following
questions using contrasts:
3.4.1 Is there a difference in yield between the genetic background of the African and C.
American cultivars?
3.4.2 Is there a difference in yield between the two African cultivars?
3.4.3 Is there a difference in yield between the two C. American cultivars?
3.4.4 Is the effect of the gene on yield different between African and C. American cultivars?
3.4.5 Is the effect of the gene on yield different between the two African cultivars (i.e. between
cultivars 1 and 2)?
3.4.6 Is the effect of the gene on yield different between the two C. American cultivars (i.e.
between cultivars 3 and 4)?
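For the power transformation mentioned in 3.3, one widely used empirical rule is to regress the log of each treatment group's standard deviation on the log of its mean and take the power as 1 minus the slope (a power near 0 points to a log transformation). The sketch below implements that rule with the standard library only. Whether this is the rule the course intends is an assumption, and the data are hypothetical values constructed so that the standard deviation is proportional to the mean.

```python
import math
from statistics import mean, stdev

def suggested_power(groups):
    """Return 1 - slope of log10(sd) regressed on log10(mean) across groups."""
    xs = [math.log10(mean(g)) for g in groups]
    ys = [math.log10(stdev(g)) for g in groups]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return 1 - sxy / sxx

# Hypothetical groups (not the exam data): sd grows in proportion to the mean,
# so the slope is about 1 and the suggested power is about 0 (log transform).
groups = [[9, 10, 11], [90, 100, 110], [900, 1000, 1100]]
power = suggested_power(groups)
print(round(power, 6))
```

The suggested power is only a starting point; the assumptions still need to be rechecked on the transformed data.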
3.5 [3 points] Please present line plots of the Gene by Cultivar interaction for both the original and
transformed data. Based on these graphs, and on the significance of the ANOVAs and contrasts for the
transformed and untransformed data, indicate which of the following statements reflect reality
and why:
• The Tolerant allele increases yield relative to the susceptible allele in at least some cultivars
• The Tolerant allele decreases yield relative to the susceptible allele in at least some cultivars
• The Tolerant allele increases yield relative to the susceptible allele in all cultivars
• The Tolerant allele decreases yield relative to the susceptible allele in all cultivars
Question 4
[30 points]
A researcher is interested in testing the effect of two forms of gibberellins (GA3 and GA4) and different
photoperiods (8h, 10h and 12 h of light) on total seed weight of Arabidopsis plants. The researcher has
access to 9 growth chambers from three different brands designated hereafter “1”, “2” and “3” (three
chambers per brand) and decides to randomize the experiment within each chamber brand to reduce
variability. He first randomly assigns the three chambers from each brand to the three different
photoperiods, and then within each chamber randomly assigns four complete trays, each sown with 100
Arabidopsis plants, to four GA treatments: a control with no G3 or G4, a tray sprayed
with G3, a tray sprayed with G4, and a tray sprayed with a combination of G3+G4. During the experiment
(after the treatments are applied) the researcher notices that some plants are infected by a fungal disease.
To try to account for the effect of the disease, he makes a visual assessment of the percentage of
diseased plants per tray. At the end of the experiment the researcher determines the weight of all the
seeds produced by each tray.
Chamber   GA      Light   SeedW    %Dis
   1      C         8     111.28    15
   2      C         8     111.12    14
   3      C         8     115.47    13
   1      C        10     115.89    14
   2      C        10     112.04    15
   3      C        10     115.57    14
   1      C        12     118.89    13
   2      C        12     115.17    14
   3      C        12     113.99    15
   1      G3        8     102.18    19
   2      G3        8      98.35    20
   3      G3        8      97.41    20
   1      G3       10     103.71    20
   2      G3       10     101.21    19
   3      G3       10     102.88    20
   1      G3       12      99.16    21
   2      G3       12      99.47    20
   3      G3       12     103.37    19
   1      G4        8     117.09    13
   2      G4        8     109.19    15
   3      G4        8     111.10    14
   1      G4       10     119.23    13
   2      G4       10     115.30    14
   3      G4       10     118.30    13
   1      G4       12     113.31    15
   2      G4       12     110.86    15
   3      G4       12     117.75    13
   1      G4&3      8      92.40    23
   2      G4&3      8      91.89    22
   3      G4&3      8      96.94    21
   1      G4&3     10      98.45    22
   2      G4&3     10      92.23    23
   3      G4&3     10      95.46    22
   1      G4&3     12      94.50    23
   2      G4&3     12      94.76    22
   3      G4&3     12      97.72    22
1) [2 points] Describe in detail the design of this experiment [see appendix].
2) [5 points] Run the ANOVA and ANCOVA designs and compare the results. Use the correct
experimental design and correct error terms in both of them!
3) Interpret the results of the ANOVA and ANCOVA. Explain any difference in the two analyses.
Based on the ANCOVA results answer the following questions (report correct P value for each
answer).
3.1. [2 points] Are there significant differences in seed-weight among different GA levels?
3.2. [2 points] Are there significant interactions for seed-weight between the GA3 and GA4
effects?
3.3. [2 points] Are there significant differences in seed-weight among different photoperiods?
3.4. [2 points] Are the differences in photoperiod linear?
3.5. [2 points] Are the differences in photoperiod different at the different GA treatments?
4) Answer the following questions:
4.1. [2 points] Are the slopes homogeneous within GA treatments and within photoperiods?
4.2. [2 points] Are the residuals from the ANCOVA model normally distributed?
4.3. [2 points] Are the variances for photoperiod and for GA treatments homogeneous?
4.4. [2 points] Is the regression between seed-weight and disease significant? What is the average
slope? Explain what that slope and its sign mean in terms of seed-weight and disease
levels.
4.5. [2 points] Is the disease independent of the treatments?
4.6. [3 points] What factors significantly affect the level of disease? Is there any significant
interaction? How can this result help you to understand the differences between the
ANOVA and ANCOVA results?
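As background for comparing the ANOVA and the ANCOVA, the covariate adjustment can be written as Y_adj = Y - b(X - X̄), where b is the common within-treatment slope and X̄ is the overall mean of the covariable. The sketch below applies that formula to a single tray. The slope, the covariable mean, and the tray values are hypothetical numbers chosen for illustration and are not estimates from the exam data.

```python
def adjust(y, x, slope, x_mean):
    """Adjust a response y to the value expected at the mean covariable level."""
    return y - slope * (x - x_mean)

# Hypothetical values for illustration only:
slope = -2.0    # assumed change in seed weight per 1% increase in disease
x_mean = 17.0   # assumed overall mean disease percentage

# A tray with 20% disease and a seed weight of 100.0 is adjusted upward,
# because it carried more disease than the average tray.
print(adjust(100.0, 20, slope, x_mean))  # 106.0
```

With a negative slope, trays above the mean disease level are adjusted upward and trays below it are adjusted downward, which is why treatment differences can shrink or grow once the covariable enters the model.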
Appendix
When you are asked to "describe in detail the design of this experiment," please do so by completing the
following template:
Design:
Response Variable:
Experimental Unit:
Class Variable    Block or Treatment    Number of Levels    Fixed or Random    Description
1
2
↓
n
Subsamples?       YES / NO
Covariable?       YES / NO
NOTE: There is a new section in the above table ("Covariable?"). Declare whether or not the analysis includes a
covariable; if it does, provide a description.