NAME _______________________
BIOL933 Midterm
(due Oct 15, 11:10 AM at lecture)
Include your R programs, include only the relevant parts of the output, and discuss each result. NO
POINTS WILL BE AWARDED TO OUTPUT WITHOUT A SENTENCE EXPLAINING THE
CONCLUSION.
Clarification questions should be directed only to Iago by e-mail (iago.hale@unh.edu). No consultation
with other students is allowed during the exam period, including R programming questions. Exams with
more than one unlikely identical mistake will receive zeroes, and the incident will be referred to the
Associate Dean of the Graduate School.
Problem 1
[19 points]
Short Questions – Long Thinkings
1.1 [3 points] For a given sample size, fixed variance, and fixed Type II error rate, (i) Which of the
two null hypotheses below has the greater Type I error rate?
a. Ho: μ1 < μ2
b. Ho: μ1 = μ2
(ii) In either case, as the distance between the means decreases, does the Type I error rate increase or
decrease?
1.2 [4 points] You would like to carry out a study to see if there is a difference in overall neural
activity between 5-year-olds who do not watch television and 5-year-olds who watch at least 15 hours of
television per week. Controlling for as many confounding factors as possible, a previous pilot study
working with two groups of 15 children each suggests that the difference is small, possibly on the order of
0.4s2.
Including the salaries of your assistants, facilities costs, and compensation to the children's families, the
experiment will cost you approximately $180 per child. A foundation interested in your work is willing
to give you a $25,000 grant to carry it out. Is this enough money to meet your desired levels of α = 0.05
and a Type II error rate of less than 20%? If it's not enough, how much more grant money do you need?
If it's more than enough, by how much?
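The budget side of question 1.2 is plain arithmetic, and the sample-size side can be sketched with base R's `power.t.test`. This is only an illustration: it assumes the pilot difference is read as 0.4 standard deviations and that a two-sided, two-sample t-test is the intended analysis, neither of which the problem states outright. Variable names are illustrative.

```r
# Budget arithmetic from the problem statement
grant          <- 25000   # dollars offered by the foundation
cost_per_child <- 180     # dollars per child

n_affordable <- floor(grant / cost_per_child)  # total children the grant covers
n_per_group  <- n_affordable %/% 2             # two equal groups

# ASSUMPTION: the pilot difference is treated as 0.4 standard deviations
# (delta = 0.4 with sd = 1); adjust if the estimate is read differently.
# A Type II error rate below 20% corresponds to power of at least 0.80.
req <- power.t.test(delta = 0.4, sd = 1, sig.level = 0.05, power = 0.80,
                    type = "two.sample", alternative = "two.sided")

n_required    <- ceiling(req$n)                 # children needed per group
cost_required <- 2 * n_required * cost_per_child

cost_required - grant   # positive: shortfall; negative: surplus
```

Comparing `n_required` with `n_per_group` shows whether the grant is sufficient under these assumptions.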
1.3 [4 points] Even if an overall ANOVA is nonsignificant, it is still possible to find significant
differences among treatments using an LSD mean separation. It is also possible to find significant
contrasts. Explain how such things are possible and what your interpretation would be in each case.
1.4 [4 points] You've completed an experiment and wish to present your data graphically to support
your conclusions regarding differences among groups at the 95% confidence level. Specifically, you
decide to make a bar chart presenting the four treatments in your study, where the height of each bar is the
mean of each treatment group; for example:
[Figure: example bar chart with y-axis "Response (units)" from 0.0 to 25.0 and x-axis "Treatment Levels" 1 to 4]
Of the options below, which would be the most appropriate error bars to use in such a chart?
a) ± 1 standard deviation
b) ± 1 standard error
c) ± 2 standard deviations
d) ± 2 standard errors
Explain why (1-2 sentences). Which of the four options above would give you the smallest error bars
(and therefore make your results look the best)?
1.5 [2 points] A team of researchers is testing three new varieties of jasmine to see if they yield more
aromatic molecules (%w/w) than the cosmetics industry standard. [Solvent extraction of the oil from
flowers of the standard jasmine variety yields only about 0.2% aromatic molecules]. They divide their
heterogeneous research field into four large plots and plant one row of each variety (the three test
varieties and the standard) within each plot, arranging the four varieties within each plot at random.
Knowing that the researchers are able to process (i.e. harvest, extract the oil, and characterize it via HPLC)
only four rows simultaneously, that the %w/w of aromatic molecules falls rapidly once a flower is picked,
and that the %w/w of aromatic molecules is highly dependent on the time of harvest, which of the
following processing strategies would yield the most statistically-sensitive and accurate results? Explain
why.
A. Simultaneously process the four rows of the same variety.
B. Simultaneously process the four varieties from the same plot.
C. Simultaneously process the four varieties, choosing one from each plot.
D. Simultaneously process four randomly-selected rows from the field, disregarding variety.
1.6 [2 points] If the following statement is true, state that it is true. If it is false, state that it is false
and explain why it is false:
"A blocking variable can increase the power of an experiment by accounting for variation
that would otherwise contribute to the experimental error (MSE)."
Problem 2
[16 points]
Scenario: Territorial aggression and defense are common themes in the animal kingdom, and being able
to claim and defend a territory is often the only way to produce offspring. The males of many migratory
songbird species arrive in their territories in early spring and vigorously defend these
locations against all intruders. Song is known to be important in territory defense, with
higher song rates indicating increased aggression and vigilance in the territory owner.
Though songbirds cannot spend all their time singing (they have to eat too!), they must
be ready to mount a strong defense when an intruder enters their territory. You have
decided to conduct a pilot study to investigate what cues or signals Common
Yellowthroat warblers use to identify territorial intrusions. To study this, you will
present three experimental treatments to different individuals:
1. A stuffed model bird perched on a branch and placed in the territory
2. A broadcast of a recorded song from an unfamiliar male
3. Both a stuffed model and a song playback together
Your plan is to count the number of songs given in the 15 minutes immediately following each introduced
treatment and compare this number to a control. For the control, you will enter the territory as before but
will not actually present a model or broadcast a song.
Based on previous research with similar species, you suspect that territory size may have an impact on the
strength of response (e.g. in smaller territories with fewer resources, males may be more willing to defend
them). To control for this influence so that you may more accurately determine the effects of your
treatments, you divide your twelve test subjects (male warblers) into three groups, based on the sizes of
their territories (small, medium, and large); you then randomize the treatments within each of those
categories [see data below].
                  Territory Size
Treatment    Small    Medium    Large
Control        25       24        28
Song           29       31        35
Model          30       36        33
Both           33       39        38

2.1 [5 points] Describe in detail the design of this experiment [see appendix].
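For the analyses that follow, the table above can be entered into R in long format, one row per bird. The sketch below is one possible layout; the object and column names (`warbler`, `treatment`, `territory`, `songs`) are illustrative and not prescribed by the exam.

```r
# Songs counted in the 15 minutes after each treatment, entered by treatment
# (values within each treatment are ordered Small, Medium, Large)
warbler <- data.frame(
  treatment = factor(rep(c("Control", "Song", "Model", "Both"), each = 3),
                     levels = c("Control", "Song", "Model", "Both")),
  territory = factor(rep(c("Small", "Medium", "Large"), times = 4),
                     levels = c("Small", "Medium", "Large")),
  songs = c(25, 24, 28,   # Control
            29, 31, 35,   # Song
            30, 36, 33,   # Model
            33, 39, 38)   # Both
)

# Cross-tabulate to check the entry against the printed table
xtabs(songs ~ treatment + territory, data = warbler)
```

The cross-tabulation should reproduce the printed table cell for cell, which is a quick guard against transcription errors.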
2.2 [4 points] Verify that the data meet all assumptions of the ANOVA, provide a plot of residual vs.
predicted values, and comment.
2.3 [4 points] Present the complete ANOVA table and a box plot of treatment means. Provide
interpretation.
2.4 [5 points] Answer the following set of questions using the most sensitive test that controls MEER
at 5%:
a. Do broadcasted songs increase aggression in territory holders?
b. Does the introduction of model birds increase aggression in territory holders?
c. Does the introduction of model birds affect the response of Common Yellowthroats to
broadcasted songs?
Based on your results, make a conclusion about how Common Yellowthroats recognize intruders.
2.5 [4 points] Does territory size have an effect on the defense response of Common Yellowthroats?
Is there any evidence that the treatment effects differ depending on territory size? Keeping the results of
this pilot study in mind when you perform a larger follow-up experiment, should treatments be
randomized with territory size in mind?
Problem 3
[25 points]
As part of its investment in agricultural research related to water-use efficiency, the
E.U. recently funded a team of researchers to investigate the effect of deficit
irrigation strategies on olive oil quality. An olive grower in Spain agreed to provide
1 ha of mature olive grove to the researchers for the experiment. To reduce
heterogeneity within the farm, the researchers divided the 1 ha plot into four blocks
of equal area. [Hint: get a piece of paper and start sketching!] Each block was then
divided into four equal areas to which the four deficit irrigation strategies (see below)
were randomly assigned. As news of the study spread, three other farmers (two in
Italy and one in Greece) offered their lands as well, enabling the experiment to be
replicated a total of four times. Independent randomizations were performed for each farm, as shown in
the data table on the next page [Tip: Block 1 on one farm has no relationship to Block 1 on another farm].
The Treatments
Control       Irrigation rate equal to 100% evapotranspiration (ET) rate
Strategy 1    Irrigation rate equal to 90% ET rate
Strategy 2    Irrigation rate equal to 80% ET rate
Strategy 3    Irrigation rate equal to 70% ET rate
To control error as much as possible, the researchers decided to
process all the olives at the same facility in central Italy. While
this facility is state-of-the-art, its industrial-scale equipment is
not designed for tightly-controlled experiments with small
batches of olives. Specifically, it takes a full hour to clean all
the equipment thoroughly between batches, meaning that 30
hours of continual work are required to process all the olives
from a single farm. Fearing that the quality of the olive oil
could be affected by the amount of time that elapses between
harvest and processing, the researchers decided to:
I. Process the sixteen batches from each farm as a group, one immediately after the
other over an intense working period of 30 hours. [Note: The term "batch" means all
the olives harvested from one treatment in a block.]
Now, even though the researchers can clean out the system after each batch, it is not possible in terms of
time or money to replace the expensive filters in the system that often. At best, they can replace them
once every four batches. Fearing that the quality of the olive oil could be affected by different sets of
filters as well as by the deteriorating condition of those filters over the course of four batches, the
researchers decided to:
II. Process the four batches from each block as a group, one immediately after the other,
using the same filters.
III. For each farm, shuffle the processing order of the treatments so that each treatment
has one chance to be processed first, second, third, and fourth after a filter change.
The data for the experiment are shown on the following page. The average titratable acidity, measured in
ml of 0.1 M NaOH and based on 10 subsamples, was recorded for each batch as a measure of olive oil
quality.
                                         Treatment
Farm                     Block   Control      90% ET       80% ET       70% ET
FARM 1 Moura, Spain        1     1st 12.67    3rd 20.80    2nd  6.56    4th 32.31
                           2     4th  7.37    1st  6.78    3rd  9.65    2nd 24.69
                           3     2nd  6.07    4th  9.42    1st 11.94    3rd 20.20
                           4     3rd 19.23    2nd  1.95    4th  8.96    1st 19.54
FARM 2 Oliena, Italy       1     3rd 15.95    1st  4.13    4th 14.11    2nd 21.76
                           2     2nd 15.22    3rd  9.69    1st 11.37    4th 20.10
                           3     4th  2.73    2nd  2.35    3rd  0.81    1st 21.79
                           4     1st 11.45    4th  5.56    2nd 13.17    3rd 17.72
FARM 3 Volos, Greece       1     2nd 11.81    4th  1.62    1st  1.93    3rd 27.75
                           2     3rd  7.43    2nd  3.19    4th  7.32    1st 21.95
                           3     1st  0.54    3rd  5.30    2nd 10.29    4th  6.64
                           4     4th 13.33    1st  6.95    3rd  9.56    2nd 12.19
FARM 4 Bari, Italy         1     4th 20.24    2nd 19.06    3rd 16.60    1st 37.42
                           2     1st 13.23    4th 17.11    2nd 20.10    3rd 32.63
                           3     3rd 15.95    1st  4.10    4th 22.81    2nd 24.73
                           4     2nd 20.20    3rd  8.59    1st  6.46    4th 19.06
In the above table, "1st" means the corresponding batch was the first to be processed
after a filter change. "2nd" means it was the second to be processed, and so forth.
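One way to enter the table above into R is row by row, with treatment varying fastest. The sketch below is illustrative only: the object and column names (`olive`, `farm`, `block`, `et`, `order`, `acidity`) are assumptions, and the closing checks simply confirm the layout described in Points II and III.

```r
# Within each farm/block, treatments run 100%, 90%, 80%, 70% ET
olive <- expand.grid(
  et    = c(100, 90, 80, 70),  # first factor varies fastest
  block = 1:4,
  farm  = 1:4
)

# Processing order after a filter change (1 = 1st, ..., 4 = 4th)
olive$order <- c(
  1,3,2,4,  4,1,3,2,  2,4,1,3,  3,2,4,1,   # Farm 1
  3,1,4,2,  2,3,1,4,  4,2,3,1,  1,4,2,3,   # Farm 2
  2,4,1,3,  3,2,4,1,  1,3,2,4,  4,1,3,2,   # Farm 3
  4,2,3,1,  1,4,2,3,  3,1,4,2,  2,3,1,4    # Farm 4
)

# Titratable acidity (ml of 0.1 M NaOH), same row order as above
olive$acidity <- c(
  12.67, 20.80,  6.56, 32.31,   7.37,  6.78,  9.65, 24.69,   # Farm 1
   6.07,  9.42, 11.94, 20.20,  19.23,  1.95,  8.96, 19.54,
  15.95,  4.13, 14.11, 21.76,  15.22,  9.69, 11.37, 20.10,   # Farm 2
   2.73,  2.35,  0.81, 21.79,  11.45,  5.56, 13.17, 17.72,
  11.81,  1.62,  1.93, 27.75,   7.43,  3.19,  7.32, 21.95,   # Farm 3
   0.54,  5.30, 10.29,  6.64,  13.33,  6.95,  9.56, 12.19,
  20.24, 19.06, 16.60, 37.42,  13.23, 17.11, 20.10, 32.63,   # Farm 4
  15.95,  4.10, 22.81, 24.73,  20.20,  8.59,  6.46, 19.06
)

# Classification variables as factors; block numbers are only
# meaningful within a farm
olive$farm  <- factor(olive$farm)
olive$block <- factor(olive$block)

# Each treatment should occupy each processing position once per farm
all(table(olive$farm, olive$et, olive$order) == 1)
```

If the final check returns `TRUE`, the entered orders match the shuffling rule in Point III.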
3.1 [5 points] Describe in detail the design of this experiment [see appendix].
3.2 [4 points] Using a reduced model (i.e. a model that includes no block interactions), show that the
data meet the assumptions of normality of residuals and homogeneity of variances among treatments.
(You are not asked to test non-additivity here.)
3.3 [3 points] Perform the appropriate ANOVA for this experiment and present the full ANOVA table.
Is there an effect of irrigation rate on olive oil quality?
Answer question 3.4, assuming no other analysis will be performed on the data.
3.4 [4 points] Maintaining MEER at 5%, use the most sensitive method to characterize the functional
relationship between %ET rate and titratable acidity, indicating which components of the relationship are
significant (i.e. linear, quadratic, etc.). Given that lower acidity denotes higher quality olive oil, do the
data suggest that the quality of olive oil can be improved through a deficit irrigation strategy? To help
you interpret this relationship, present a boxplot (or scatterplot) of acidity versus %ET rate.
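Because the four rates (70, 80, 90, and 100% ET) are equally spaced, the linear, quadratic, and cubic components can be expressed with standard orthogonal polynomial coefficients. The sketch below shows only how base R generates them; how they are attached to the model is left open, and the name `poly_coefs` is illustrative.

```r
# Orthogonal polynomial coefficients for four equally spaced levels;
# columns are the linear (.L), quadratic (.Q), and cubic (.C) components
poly_coefs <- contr.poly(4)
poly_coefs

# Each column sums to zero and the columns are mutually orthogonal
round(colSums(poly_coefs), 10)
round(crossprod(poly_coefs), 10)
```

The columns are scaled versions of the familiar integer coefficients (-3, -1, 1, 3), (1, -1, -1, 1), and (-1, 3, -3, 1).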
Answer question 3.5, assuming no other analysis will be performed on the data.
3.5 [3 points] Using the most sensitive method while maintaining MEER at 5%, do any of the deficit
irrigation strategies in the study significantly improve olive oil quality relative to the non-deficit strategy
(i.e. 100% ET)? Of the rates under investigation here, what is the minimum %ET rate that can be applied
without significantly reducing olive oil quality, relative to the non-deficit strategy?
3.6 [2 points] Assign a p-value to the following statement:
"There is not sufficient evidence to reject the null hypothesis of homogeneous
treatment effects across farms (p = xxx)."
3.7 [2 points] Based on the information given, provide a brief explanation of the rationale behind
Points I (i.e. quickly processing all the batches from a farm over a 30-hour period) and II (i.e. processing
all the batches from one block using the same filters) above.
3.8 [2 points] There is no true replication in this experiment (i.e. more than one experimental unit
treated alike); so what is being used to estimate the experimental error? [To answer this question,
consider the linear model you used in your analysis; what terms are missing?]
Problem 4
[14 points]
Background (for the interested): In response to the criticism that growing crops commercially
for biofuels necessarily threatens food security by competing for prime agricultural land, biofuels
advocates point to a plant like Jatropha curcas. Long grown in sub-Saharan Africa as a
hedgerow, J. curcas also produces seeds high in oil content; and once extracted, this oil can be
used as feedstock for biodiesel production. Proponents of J. curcas claim its production will not
threaten food security since the plant can grow in marginal areas with low soil fertility and long
dry periods. These are the claims; but as of now, little formal research has been carried out on J.
curcas, an undomesticated species with notoriously variable yields.
You have been asked by the International Biofuels Journal to review a recently-submitted study on the oil
yield of Jatropha curcas grown on lands of various agricultural qualities. The manuscript contains the
following description of the experiment:
"Using soil maps, agricultural production maps, and weather maps, we identified two separate sites
in each of the five following land categories in Malawi:
1. Prime irrigated agricultural (PIA)
2. Prime rainfed agricultural (PRA)
3. Moderately suitable agricultural (MSA)
4. Poor agricultural (PA)
5. Nonarable (NA)
We established 200 J. curcas trees on each site and maintained them for a period of three years. In
the third year, we randomly harvested 50 trees from each site. The seeds from each tree were then
processed and the individual oil yields for each tree measured."
In their conclusions, the authors state:
"The ranked mean oil yields (kg/tree), by land type, are shown below:
Land Type    kg Oil/Tree
PIA          2.420
PRA          1.305
MSA          0.520
NA           0.375
PA           0.335
At a confidence level of 95%, however, we found no significant differences in oil yield per tree among
the sites. This finding supports the claim that J. curcas can be grown commercially on marginal
lands and, as such, is a suitable species for investment by the biofuels industry."
Answer the following questions, knowing that the estimated components of variance in the study are as
follows:
Variance among land types........................... 0.55913
Variance among sites within a given land type....... 0.46587
Variance among trees within a given site............ 0.17174
4.1 [8 points] What was the power of this study? Was it enough to justify the authors' conclusion?
4.2 [6 points] What is the level of replication in this study? Assuming that 50 trees is the optimum
number to harvest from each site, how many sites would you test for each land type if you were to repeat
the experiment?
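In a nested design of this kind, the variance of a land-type mean depends on both the number of sites and the number of trees per site. The sketch below uses the standard nested-design expression with the variance components given above; the function name `var_type_mean` and the alternative designs compared are illustrative assumptions, not part of the manuscript.

```r
# Variance components given in the problem
var_site <- 0.46587   # among sites within a given land type
var_tree <- 0.17174   # among trees within a given site

# Variance of a land-type mean with `sites` sites and `trees` trees per site
var_type_mean <- function(sites, trees) {
  var_site / sites + var_tree / (sites * trees)
}

var_type_mean(sites = 2, trees = 50)    # design described in the manuscript
var_type_mean(sites = 2, trees = 500)   # hypothetical: ten times more trees
var_type_mean(sites = 6, trees = 50)    # hypothetical: three times more sites
```

Comparing the three values shows which level of the design dominates the uncertainty in a land-type mean.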
Appendix
When you are asked to "describe in detail the design of this experiment," please do so by completing the
following template:
Design:
Response Variable:
Experimental Unit:

Class Variable    Block or Treatment    No. of Levels    Description
1
2
↓
n

Subsamples? YES / NO
For Example, the correct table for Problem 2 from HW 4 (Topics 6-7) is:
Design: 4x4 Latin Square, replicated 3 times, with independent rows (days) and shared
columns (farms)
Response Variable: PM-10 concentrations in the air (ug/m3)
Experimental Unit: Plot within a farm

Class Variable    Block or Treatment    No. of Levels    Description
1                 Block                 3                Season
2                 Block                 4                Farm
3                 Block                 12               Day (4 levels per season; 12 total in the experiment)
4                 Treatment             4                Plow design

Subsamples? NO
(If you put "YES" in the cell to the left, you would describe the
subsamples here; for this particular experiment, you could've said
"YES" here, because technically each plot emission number given to
you is the mean value over multiple traps.)