NAME _______________________

BIOL933 Midterm (due Oct 15, 11:10 AM at lecture)

Include your R programs, include only the relevant parts of the output, and discuss each result. NO POINTS WILL BE AWARDED TO OUTPUT WITHOUT A SENTENCE EXPLAINING THE CONCLUSION.

Clarification questions should be directed only to Iago by e-mail (iago.hale@unh.edu). No consultation with other students is allowed during the exam period, including R programming questions. Exams with more than one unlikely identical mistake will receive zeroes, and the incident will be referred to the Associate Dean of the Graduate School.

Problem 1 [19 points] Short Questions – Long Thinkings

1.1 [3 points] For a given sample size, fixed variance, and fixed Type II error rate:
(i) Which of the two null hypotheses below has the greater Type I error rate?
    a. Ho: μ1 < μ2
    b. Ho: μ1 = μ2
(ii) In either case, as the distance between the means decreases, does the Type I error rate increase or decrease?

1.2 [4 points] You would like to carry out a study to see if there is a difference in overall neural activity between 5-year-olds who do not watch television and 5-year-olds who watch at least 15 hours of television per week. A previous pilot study, which controlled for as many confounding factors as possible and worked with two groups of 15 children each, suggests that the difference is small, possibly on the order of 0.4s². Including the salaries of your assistants, facilities costs, and compensation to the children's families, the experiment will cost you approximately $180 per child. A foundation interested in your work is willing to give you a $25,000 grant to carry it out. Is this enough money to meet your desired levels of α = 0.05 and a Type II error rate of less than 20%? If it's not enough, how much more grant money do you need? If it's more than enough, by how much?
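As a starting point for question 1.2, the budget side of the problem reduces to simple arithmetic: how many children the grant can fund and how many that leaves per group. The following is a minimal Python sketch (the exam itself asks for R); the constant and function names are illustrative only, and it assumes equal allocation to the two groups. It deliberately stops short of the power calculation.

```python
import math

# Values taken from Problem 1.2
GRANT_USD = 25_000        # foundation grant
COST_PER_CHILD_USD = 180  # all-in cost per enrolled child
N_GROUPS = 2              # no-TV group vs. >= 15 h/week TV group


def children_affordable(grant: float, cost_per_child: float) -> int:
    """Whole number of children the grant can fund."""
    return math.floor(grant / cost_per_child)


def per_group(total_children: int, groups: int) -> int:
    """Balanced allocation (assumed): equal number of children per group."""
    return total_children // groups


total = children_affordable(GRANT_USD, COST_PER_CHILD_USD)
r = per_group(total, N_GROUPS)
leftover = GRANT_USD - r * N_GROUPS * COST_PER_CHILD_USD  # unspent dollars
print(total, r, leftover)
```

The number of children per group that the grant supports still has to be compared with the number required for α = 0.05 and power of at least 80%, which comes from a separate power calculation (for example, R's `power.t.test`).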
1.3 [4 points] Even if an overall ANOVA is nonsignificant, it is still possible to find significant differences among treatments using an LSD mean separation. It is also possible to find significant contrasts. Explain how such things are possible and what your interpretation would be in each case.

1.4 [4 points] You've completed an experiment and wish to present your data graphically to support your conclusions regarding differences among groups at the 95% confidence level. Specifically, you decide to make a bar chart presenting the four treatments in your study, where the height of each bar is the mean of each treatment group; for example:

[Figure: example bar chart of mean Response (units), axis 0.0 to 25.0, for Treatment Levels 1 to 4]

Of the options below, which would be the most appropriate error bars to use in such a chart?
    a) ± 1 standard deviation
    b) ± 1 standard error
    c) ± 2 standard deviations
    d) ± 2 standard errors
Explain why (1-2 sentences). Which of the four options above would give you the smallest error bars (and therefore make your results look the best)?

1.5 [2 points] A team of researchers is testing three new varieties of jasmine to see if they yield more aromatic molecules (%w/w) than the cosmetics industry standard. [Solvent extraction of the oil from flowers of the standard jasmine variety yields only about 0.2% aromatic molecules.] They divide their heterogeneous research field into four large plots and plant one row of each variety (the three test varieties and the standard) within each plot, arranging the four varieties within each plot at random. Knowing that the researchers are able to process (i.e. harvest, extract the oil, and characterize it via HPLC) only four rows simultaneously, that the %w/w of aromatic molecules falls rapidly once a flower is picked, and that the %w/w of aromatic molecules is highly dependent on the time of harvest, which of the following processing strategies would yield the most statistically sensitive and accurate results? Explain why.
    A. Simultaneously process the four rows of the same variety.
    B. Simultaneously process the four varieties from the same plot.
    C. Simultaneously process the four varieties, choosing one from each plot.
    D. Simultaneously process four randomly selected rows from the field, disregarding variety.

1.6 [2 points] If the following statement is true, state that it is true. If it is false, state that it is false and explain why it is false: "A blocking variable can increase the power of an experiment by accounting for variation that would otherwise contribute to the experimental error (MSE)."

Problem 2 [16 points]

Scenario: Territorial aggression and defense are common themes in the animal kingdom, and being able to claim and defend a territory is often the only way to produce offspring. The males of many migratory songbird species arrive in their territories in early spring and vigorously defend these locations against all intruders. Song is known to be important in territory defense, with higher song rates indicating increased aggression and vigilance in the territory owner. Though songbirds cannot spend all their time singing (they have to eat too!), they must be ready to mount a strong defense when an intruder enters their territory.

You have decided to conduct a pilot study to investigate what cues or signals Common Yellowthroat warblers use to identify territorial intrusions. To study this, you will present three experimental treatments to different individuals:
    1. A stuffed model bird perched on a branch and placed in the territory
    2. A broadcast of a recorded song from an unfamiliar male
    3. Both a stuffed model and a song playback together
Your plan is to count the number of songs given in the 15 minutes immediately following each introduced treatment and compare this number to a control. For the control, you will enter the territory as before but will not actually present a model or broadcast a song.
Based on previous research with similar species, you suspect that territory size may have an impact on the strength of response (e.g. in smaller territories with fewer resources, males may be more willing to defend them). To control for this influence so that you may more accurately determine the effects of your treatments, you divide your twelve test subjects (male warblers) into three groups based on the sizes of their territories (small, medium, and large); you then randomize the treatments within each of those categories [see data below].

                            Treatment
    Territory Size    Control   Song   Model   Both
    Small                25      29     30      33
    Medium               24      31     36      39
    Large                28      35     33      38

2.1 [5 points] Describe in detail the design of this experiment [see appendix].

2.2 [4 points] Verify that the data meet all assumptions of the ANOVA, provide a plot of residuals vs. predicted values, and comment.

2.3 [4 points] Present the complete ANOVA table and a box plot of treatment means. Provide interpretation.

2.4 [5 points] Answer the following set of questions using the most sensitive test that controls MEER at 5%:
    a. Do broadcasted songs increase aggression in territory holders?
    b. Does the introduction of model birds increase aggression in territory holders?
    c. Does the introduction of model birds affect the response of Common Yellowthroats to broadcasted songs?
Based on your results, draw a conclusion about how Common Yellowthroats recognize intruders.

2.5 [4 points] Does territory size have an effect on the defense response of Common Yellowthroats? Is there any evidence that the treatment effects differ depending on territory size? Keeping the results of this pilot study in mind, should treatments be randomized with territory size in mind when you perform a larger follow-up experiment?

Problem 3 [25 points]

As part of its investment in agricultural research related to water-use efficiency, the E.U. recently funded a team of researchers to investigate the effect of deficit irrigation strategies on olive oil quality. An olive grower in Spain agreed to provide 1 ha of mature olive grove to the researchers for the experiment. To reduce heterogeneity within the farm, the researchers divided the 1 ha plot into four blocks of equal area. [Hint: get a piece of paper and start sketching!] Each block was then divided into four equal areas to which the four irrigation strategies (see below) were randomly assigned. As news of the study spread, three other farmers (two in Italy and one in Greece) offered their lands as well, enabling the experiment to be replicated a total of four times. Independent randomizations were performed for each farm, as shown in the data table below [Tip: Block 1 on one farm has no relationship to Block 1 on another farm].

The Treatments
    Control      Irrigation rate equal to 100% evapotranspiration (ET) rate
    Strategy 1   Irrigation rate equal to 90% ET rate
    Strategy 2   Irrigation rate equal to 80% ET rate
    Strategy 3   Irrigation rate equal to 70% ET rate

To control error as much as possible, the researchers decided to process all the olives at the same facility in central Italy. While this facility is state-of-the-art, its industrial-scale equipment is not designed for tightly controlled experiments with small batches of olives. Specifically, it takes a full hour to clean all the equipment thoroughly between batches, meaning that 30 hours of continuous work are required to process all the olives from a single farm. Fearing that the quality of the olive oil could be affected by the amount of time that elapses between harvest and processing, the researchers decided to:

    I. Process the sixteen batches from each farm as a group, one immediately after the other over an intense working period of 30 hours. [Note: The term "batch" means all the olives harvested from one treatment in a block.]
Now, even though the researchers can clean out the system after each batch, it is not possible in terms of time or money to replace the expensive filters in the system that often. At best, they can replace them once every four batches. Fearing that the quality of the olive oil could be affected by different sets of filters as well as by the deteriorating condition of those filters over the course of four batches, the researchers decided to:

    II. Process the four batches from each block as a group, one immediately after the other, using the same filters.
    III. For each farm, shuffle the processing order of the treatments so that each treatment has one chance to be processed first, second, third, and fourth after a filter change.

The data for the experiment are shown below. The average titratable acidity, measured in ml of 0.1 M NaOH and based on 10 subsamples, was recorded for each batch as a measure of olive oil quality. Each cell gives the processing order of the batch after a filter change, followed by its titratable acidity.

                                                  Treatment
    Farm                    Block   Control     90% ET      80% ET      70% ET
    FARM 1 Moura, Spain       1     1st 12.67   3rd 20.80   2nd  6.56   4th 32.31
                              2     4th  7.37   1st  6.78   3rd  9.65   2nd 24.69
                              3     2nd  6.07   4th  9.42   1st 11.94   3rd 20.20
                              4     3rd 19.23   2nd  1.95   4th  8.96   1st 19.54
    FARM 2 Oliena, Italy      1     3rd 15.95   1st  4.13   4th 14.11   2nd 21.76
                              2     2nd 15.22   3rd  9.69   1st 11.37   4th 20.10
                              3     4th  2.73   2nd  2.35   3rd  0.81   1st 21.79
                              4     1st 11.45   4th  5.56   2nd 13.17   3rd 17.72
    FARM 3 Volos, Greece      1     2nd 11.81   4th  1.62   1st  1.93   3rd 27.75
                              2     3rd  7.43   2nd  3.19   4th  7.32   1st 21.95
                              3     1st  0.54   3rd  5.30   2nd 10.29   4th  6.64
                              4     4th 13.33   1st  6.95   3rd  9.56   2nd 12.19
    FARM 4 Bari, Italy        1     4th 20.24   2nd 19.06   3rd 16.60   1st 37.42
                              2     1st 13.23   4th 17.11   2nd 20.10   3rd 32.63
                              3     3rd 15.95   1st  4.10   4th 22.81   2nd 24.73
                              4     2nd 20.20   3rd  8.59   1st  6.46   4th 19.06

In the above table, "1st" means the corresponding batch was the first to be processed after a filter change, "2nd" means it was the second to be processed, and so forth.

3.1 [5 points] Describe in detail the design of this experiment [see appendix].
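Because a table like the one above is easy to mis-key, here is a minimal Python sketch (the exam itself asks for R) that enters it in long format, one record per batch, and checks that the processing orders follow Points II and III: within each block, and for each treatment within a farm, the orders 1st to 4th each occur exactly once. The farm labels, field names, and function names are illustrative choices, not part of the exam.

```python
from collections import defaultdict

# Column order of the table: Control (100% ET), 90% ET, 80% ET, 70% ET
ET_RATES = (100, 90, 80, 70)

# DATA[farm] = one list per block (1-4); each entry is
# (processing order after a filter change, titratable acidity in ml 0.1 M NaOH)
DATA = {
    "Moura": [
        [(1, 12.67), (3, 20.80), (2, 6.56), (4, 32.31)],
        [(4, 7.37), (1, 6.78), (3, 9.65), (2, 24.69)],
        [(2, 6.07), (4, 9.42), (1, 11.94), (3, 20.20)],
        [(3, 19.23), (2, 1.95), (4, 8.96), (1, 19.54)],
    ],
    "Oliena": [
        [(3, 15.95), (1, 4.13), (4, 14.11), (2, 21.76)],
        [(2, 15.22), (3, 9.69), (1, 11.37), (4, 20.10)],
        [(4, 2.73), (2, 2.35), (3, 0.81), (1, 21.79)],
        [(1, 11.45), (4, 5.56), (2, 13.17), (3, 17.72)],
    ],
    "Volos": [
        [(2, 11.81), (4, 1.62), (1, 1.93), (3, 27.75)],
        [(3, 7.43), (2, 3.19), (4, 7.32), (1, 21.95)],
        [(1, 0.54), (3, 5.30), (2, 10.29), (4, 6.64)],
        [(4, 13.33), (1, 6.95), (3, 9.56), (2, 12.19)],
    ],
    "Bari": [
        [(4, 20.24), (2, 19.06), (3, 16.60), (1, 37.42)],
        [(1, 13.23), (4, 17.11), (2, 20.10), (3, 32.63)],
        [(3, 15.95), (1, 4.10), (4, 22.81), (2, 24.73)],
        [(2, 20.20), (3, 8.59), (1, 6.46), (4, 19.06)],
    ],
}


def to_long(data):
    """Flatten the table to one record per batch (64 records in total)."""
    rows = []
    for farm, blocks in data.items():
        for block, batches in enumerate(blocks, start=1):
            for et, (order, acidity) in zip(ET_RATES, batches):
                rows.append({"farm": farm, "block": block, "et": et,
                             "order": order, "acidity": acidity})
    return rows


def order_is_latin_square(rows, farm):
    """True if, within a farm, every block and every treatment uses orders 1-4 exactly once."""
    by_block, by_et = defaultdict(list), defaultdict(list)
    for r in rows:
        if r["farm"] == farm:
            by_block[r["block"]].append(r["order"])
            by_et[r["et"]].append(r["order"])
    groups = list(by_block.values()) + list(by_et.values())
    return all(sorted(g) == [1, 2, 3, 4] for g in groups)


ROWS = to_long(DATA)
print(len(ROWS), all(order_is_latin_square(ROWS, f) for f in DATA))
```

The resulting list of dictionaries can be written to a CSV file with Python's `csv.DictWriter` and then read into R with `read.csv` for the analyses below.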
3.2 [4 points] Using a reduced model (i.e. a model that includes no block interactions), show that the data meet the assumptions of normality of residuals and homogeneity of variances among treatments. (You are not asked to test non-additivity here.)

3.3 [3 points] Perform the appropriate ANOVA for this experiment and present the full ANOVA table. Is there an effect of irrigation rate on olive oil quality?

Answer question 3.4, assuming no other analysis will be performed on the data.

3.4 [4 points] Maintaining MEER at 5%, use the most sensitive method to characterize the functional relationship between %ET rate and titratable acidity, indicating which components of the relationship are significant (i.e. linear, quadratic, etc.). Given that lower acidity denotes higher quality olive oil, do the data suggest that the quality of olive oil can be improved through a deficit irrigation strategy? To help you interpret this relationship, present a boxplot (or scatterplot) of acidity versus %ET rate.

Answer question 3.5, assuming no other analysis will be performed on the data.

3.5 [3 points] Using the most sensitive method while maintaining MEER at 5%, do any of the deficit irrigation strategies in the study significantly improve olive oil quality relative to the non-deficit strategy (i.e. 100% ET)? Of the rates under investigation here, what is the minimum %ET rate that can be applied without significantly reducing olive oil quality, relative to the non-deficit strategy?

3.6 [2 points] Assign a p-value to the following statement: "There is not sufficient evidence to reject the null hypothesis of homogeneous treatment effects across farms (p = xxx)."

3.7 [2 points] Based on the information given, provide a brief explanation of the rationale behind Points I (i.e. quickly processing all the batches from a farm over a 30-hour period) and II (i.e. processing all the batches from one block using the same filters) above.
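For question 3.4, the four %ET rates (70, 80, 90, 100) are equally spaced, so the linear, quadratic, and cubic components of the response can be examined with orthogonal polynomial contrasts. The sketch below, in Python rather than R, derives the integer coefficients by Gram-Schmidt orthogonalization of the powers of the %ET levels using exact fractions; this is one way to obtain them, and the function names are illustrative. The coefficients are listed in order of increasing %ET, so if Control (100% ET) is the first factor level, they must be reordered to match.

```python
from fractions import Fraction
from math import gcd, lcm

# %ET rates in increasing order; equally spaced 10 points apart
ET_LEVELS = (70, 80, 90, 100)


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def orthogonal_poly(levels, degree):
    """Gram-Schmidt on 1, x, x^2, ... evaluated at the levels (exact arithmetic)."""
    x = [Fraction(v) for v in levels]
    basis = [[Fraction(1)] * len(x)]  # constant term (the overall mean)
    for d in range(1, degree + 1):
        v = [xi ** d for xi in x]
        for b in basis:  # remove the projection on every lower-order component
            coef = dot(v, b) / dot(b, b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        basis.append(v)
    return basis[1:]  # drop the constant; keep linear, quadratic, ...


def to_integers(v):
    """Rescale a rational vector to the smallest integers, last entry positive."""
    den = lcm(*(f.denominator for f in v))
    ints = [int(f * den) for f in v]
    g = gcd(*ints)
    ints = [i // g for i in ints]
    if ints[-1] < 0:
        ints = [-i for i in ints]
    return tuple(ints)


NAMES = ("linear", "quadratic", "cubic")
CONTRASTS = {n: to_integers(v) for n, v in zip(NAMES, orthogonal_poly(ET_LEVELS, 3))}

for name, coefs in CONTRASTS.items():
    print(f"{name:9s}", dict(zip(ET_LEVELS, coefs)))
```

The result matches the standard tabulated coefficients for four equally spaced levels; in R, `contr.poly(4)` returns the same three contrasts scaled to unit length.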
3.8 [2 points] There is no true replication in this experiment (i.e. more than one experimental unit treated alike), so what is being used to estimate the experimental error? [To answer this question, consider the linear model you used in your analysis; what terms are missing?]

Problem 4 [14 points]

Background (for the interested): In response to the criticism that growing crops commercially for biofuels necessarily threatens food security by competing for prime agricultural land, biofuels advocates point to a plant like Jatropha curcas. Long grown in sub-Saharan Africa as a hedgerow, J. curcas also produces seeds high in oil content; once extracted, this oil can be used as feedstock for biodiesel production. Proponents of J. curcas claim its production will not threaten food security since the plant can grow in marginal areas with low soil fertility and long dry periods. These are the claims; but as of now, little formal research has been carried out on J. curcas, an undomesticated species with notoriously variable yields.

You have been asked by the International Biofuels Journal to review a recently submitted study on the oil yield of Jatropha curcas grown on lands of various agricultural qualities. The manuscript contains the following description of the experiment:

"Using soil maps, agricultural production maps, and weather maps, we identified two separate sites in each of the five following land categories in Malawi:
    1. Prime irrigated agricultural (PIA)
    2. Prime rainfed agricultural (PRA)
    3. Moderately suitable agricultural (MSA)
    4. Poor agricultural (PA)
    5. Nonarable (NA)
We established 200 J. curcas trees on each site and maintained them for a period of three years. In the third year, we randomly harvested 50 trees from each site. The seeds from each tree were then processed and the individual oil yields for each tree measured."
In their conclusions, the authors state: "The ranked mean oil yields (kg/tree), by land type, are shown below:

    Land Type   kg Oil/Tree
    PIA         2.420
    PRA         1.305
    MSA         0.520
    NA          0.375
    PA          0.335

At a confidence level of 95%, however, we found no significant differences in oil yield per tree among the sites. This finding supports the claim that J. curcas can be grown commercially on marginal lands and, as such, is a suitable species for investment by the biofuels industry."

Answer the following questions, knowing that the estimated components of variance in the study are as follows:

    Variance among land types                        0.55913
    Variance among sites within a given land type    0.46587
    Variance among trees within a given site         0.17174

4.1 [8 points] What was the power of this study? Was it enough to justify the authors' conclusion?

4.2 [6 points] What is the level of replication in this study? Assuming that 50 trees is the optimum number to harvest from each site, how many sites would you test for each land type if you were to repeat the experiment?

Appendix

When you are asked to "describe in detail the design of this experiment," please do so by completing the following template:

Design:
Response Variable:
Experimental Unit:

    Class Variable   Block or Treatment   No. of Levels   Description
    1
    2
    ↓
    n

    Subsamples?   YES / NO   Description

For example, the correct table for Problem 2 from HW 4 (Topics 6-7) is:

Design: 4x4 Latin Square, replicated 3 times, with independent rows (days) and shared columns (farms)
Response Variable: PM-10 concentrations in the air (µg/m3)
Experimental Unit: Plot within a farm

    Class Variable   Block or Treatment   No. of Levels   Description
    1                Block                3               Season
    2                Block                4               Farm
    3                Block                12              Day (4 levels per season; 12 total in the experiment)
    4                Treatment            4               Plow design

    Subsamples?   NO   (If you put "YES" in the cell to the left, you would describe the subsamples here; for this particular experiment, you could've said "YES" here, because technically each plot emission number given to you is the mean value over multiple traps.)