Advanced Plant Breeding PBG 650
Take-Home Final Exam, Fall 2015
Due 9:30 am on Friday, December 11, 2015

Name: KEY

Part 1 – AMMI analysis of Genotype by Environment interaction using R

The Barley Project at OSU strives to develop barley cultivars that are both high yielding and that provide outstanding flavor in beer. A trial of 34 promising new experimental varieties and 3 standard checks (Golden Promise, BCD47, and Copeland) was conducted at three locations in Oregon in 2015: Corvallis, Lebanon, and Madras. Each experiment was arranged as a Randomized Complete Block Design with two replications. While quality testing remains to be done, they need your help to determine how well these varieties are adapted to diverse environments in the target production regions of Oregon. The data set is available in the file OregonPromise2015.csv.

As an initial step, they calculated the mean and CV% for each entry (i.e., variety) across all of the blocks and testing sites.

[Figure: CV% (vertical axis) plotted against mean yield (horizontal axis) for each entry.]

The horizontal axis is bisected by the mean yield across all varieties, and the vertical axis is bisected by the mean CV% for all varieties.

1) (5 points) Briefly discuss the concept of stability and the various ways that it can be defined and achieved. What does the graph above tell us (if anything) about the stability of these varieties? Use examples from the graph to illustrate your points. Which varieties are the best (in your opinion)?

There are several meanings of the term stability. The oldest and simplest meaning is that the performance of a genotype does not change across environments. This is referred to as Type I or static stability. A more common definition is called Type II or dynamic stability. This implies that a genotype has the ability to respond to favorable environments in a consistent manner (such genotypes exhibit phenotypic plasticity). When genotype performance is regressed on an environmental index, stable genotypes have a linear regression coefficient of b = 1.
Type III stability implies that a genotype's observed performance is close to its regression line, i.e., the deviations from the regression line are small.

The graph above of the CV% vs the mean provides some insights about stability in the static sense, but doesn't tell us much about phenotypic plasticity. Nonetheless, a variety that falls in the lower right hand quadrant and has both a high mean and relatively low CV% would be desirable. Varieties 29 (120661), 36 (BCD47), and 20 (120384) appear to be good, stable varieties. Variety 14 (120341) has very high average yield and a CV that is just a little greater than the average. Additional analyses are needed to determine if this variety has good stability in the dynamic sense. Variety 10 (120314) has a particularly high CV and a mean yield close to the trial average. Additional analyses may reveal that this variety has good specific adaptation to one or more sites. The graph of the CV% vs the mean provides very little information about the adaptation of these varieties.

2) (5 points) A stability analysis (similar to the Eberhart and Russell model) was conducted to look at the regression of each variety across an environmental index, and the following results were obtained:

Source                        DF     Type I SS     Mean Square   F Value   Pr > F
Location                       2     426578200     213289100     60.24     0.0038
Block(Location)                3     10621586.5    3540528.8     13.83     <.0001
Entry                         36     35572607.5    988128        3.86      <.0001
Location*Entry                72     53530235.6    743475.5      2.9       <.0001
  Heterogeneity of slopes    (36)    19349229.9    537478.6      2.1       0.0018
  Dev from Linear Reg        (36)    34181005.7    949472.4      3.71      <.0001
Residual                     108     27651886      256036

Is there evidence for genotype by environment interactions? How can you tell? What do these results tell us about the nature of those interactions?

We can tell that there are significant genotype by environment interactions, because the P value for Location*Entry is much less than 0.05.
We can also tell that the varieties have different linear responses across the environmental index, because the term for heterogeneity of slopes is significant. However, there is still a substantial amount of GXE that is not explained by the linear regression, as shown by the significant term for deviations from regression. This analysis assesses the dynamic response of genotypes to changing environments (Type II stability) as well as Type III stability (deviations from regression).

3) (5 points) Refer to Table 1 to identify several varieties that you think show a good level of stability. Justify your choice of varieties.

Answers will vary. I have gone back to the original data to include some discussion of Type III stability, but this was not expected because you were only given the means across sites and the linear regression coefficients (Type II stability). I have also taken average yield into consideration.

[Figure: Plot of the six varieties with the highest average yields]

Table 1. Stability analysis of barley genotypes

Entry   Genotype         Mean Yield (lb/acre)   Linear regression coefficient
1       DH120058         5910                   1.25
2       DH120145         6178                   1.01
3       DH120166         5879                   0.98
4       DH120020         5477                   1.07
5       DH120031         5338                   1.13
6       DH120089         5273                   1.10
7       DH120090         5595                   1.16
8       DH120156         6064                   0.85
9       DH120285         5778                   1.34
10      120314           6082                   1.60
11      120329           6030                   0.62
12      120330           5244                   0.84
13      120331           5823                   1.14
14      120341           6733                   1.20
15      120363           6146                   0.61
16      120365           6048                   1.03
17      120366           6064                   1.28
18      120374           6293                   0.96
19      120381           6459                   1.11
20      120384           6489                   0.66
21      120510           5970                   0.89
22      120516           6059                   0.88
23      120520           6086                   0.87
24      120521           5597                   0.88
25      120529           5673                   1.08
26      120536           6142                   0.93
27      120543           5323                   0.67
28      120657           6428                   1.27
29      120661           6796                   0.85
30      120671           5887                   1.06
31      120691           5514                   1.14
32      120709           5729                   0.85
33      120715           5666                   0.85
34      120731           6275                   1.09
35      Golden Promise   6380                   0.66
36      BCD47            6633                   1.04
37      Copeland         6294                   1.05

Variety 14 has high average yield but would not be considered stable because it has a b > 1 and large deviations from regression.
Variety 19 has high average yield, a b value slightly larger than one, and small deviations from regression. It is fairly stable.

Variety 20 has high average yield and a small b value. From the graph we can see that it was one of the best varieties in Corvallis and Lebanon, but was below average in Madras.

Variety 28 has high average yield, but a large b value. It does not have good Type II stability, but we can see from the graph that the observed values fall along the regression line, so it has good Type III stability. It was poor in Corvallis but performed well at the better sites.

Variety 29 had the highest average yield in the trial and a b value that is a little less than one. From the graph we can see that it was one of the best varieties in Corvallis and Lebanon. It was above average, but not exceptional, in Madras. I would call this a stable variety.

Variety 36 is the check variety BCD47. It has high average yield, a b value of 1.04, and small deviations from regression. It is a good, stable variety.

4) (5 points) What are some of the limitations of the Eberhart and Russell stability analysis?

The main limitation, in my opinion, is that there are many different environmental factors contributing to the overall mean yield at a site, so there is no reason to expect that a genotype will show a linear response across the sites. In a sense it discriminates against unusual genotypes, rather than helping us to identify genotypes with unique characteristics. Studies have also shown that Type II and Type III stability have low heritabilities, so selecting for stability alone may not achieve the desired goal. In this particular example, we have two good sites and one poor site, so our estimates of the slope may be unduly influenced by the data collected in Corvallis (the low yielding site). The rank changes that are seen between varieties at the two good sites also suggest that the information gained from linear regression may be limited.
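
As a rough illustration of the regression approach discussed above, the slope b for an entry can be recovered from site means alone. The sketch below uses the site means for Entries 14, 29, and 36 and the trial-wide site averages from the summary table at the end of this exam. The object names (yield, site_mean, env_index, stability) are made up for this example and are not part of the exam's R program. Because it works from means rather than plot-level data, the sums of squared deviations are not on the same scale as the ANOVA above, and with only three sites each fit leaves a single degree of freedom for deviations.

```r
# Site means (lb/acre) for three entries, copied from the summary table.
# Rows are entries; columns are locations.
yield <- rbind(
  "14" = c(COR = 4521, LEB = 6876, MAD = 8800),
  "29" = c(COR = 5106, LEB = 7700, MAD = 7581),
  "36" = c(COR = 4662, LEB = 7130, MAD = 8108)
)

# Trial-wide site averages (all entries), also from the summary table.
site_mean <- c(COR = 4047, LEB = 6678, MAD = 7222)

# Environmental index: site average minus the mean of the site averages.
env_index <- site_mean - mean(site_mean)

# Regress each entry's site means on the environmental index and keep
# the entry mean, the slope (b), and the sum of squared deviations.
stability <- t(apply(yield, 1, function(y) {
  fit <- lm(y ~ env_index)
  c(mean   = mean(y),
    b      = unname(coef(fit)["env_index"]),
    ss_dev = sum(resid(fit)^2))
}))

print(round(stability, 2))
```

The slopes should agree with Table 1 to within rounding (about 1.20, 0.85, and 1.04 for Entries 14, 29, and 36). The low-yielding Corvallis site has the largest absolute index value, which is one way to see why it can dominate the slope estimates.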
5) (5 points) Describe the components of the model for an AMMI analysis. What are some of the benefits of AMMI for investigating GXE interactions?

$$Y_{ijl} = \mu + G_i + E_j + \sum_k \lambda_k \alpha_{ik} \gamma_{jk} + d_{ij} + e_{ijl}$$

$\lambda_k$ = kth eigenvalue
$\alpha_{ik}$ = principal component score for the ith genotype for the kth principal component axis
$\gamma_{jk}$ = principal component score for the jth environment for the kth principal component axis
$d_{ij}$ = residual GXE not explained by model

AMMI combines a conventional linear model for genotypes and environments with a principal component analysis of the GXE interactions. AMMI biplots can help to visualize how genotypes are adapted to the environments in the study, and to identify stable genotypes that have broad adaptation.

There are some other potential benefits of AMMI that may be useful in particular situations, although they do not apply specifically to the barley data set. When there are many testing sites, AMMI predictions can help to focus attention on the most important and predictable aspects of GXE, and reduce background noise. That was not the case with this data set, because we had only 3 environments with 2 degrees of freedom, so the first two principal components explained all of the observed variation in GXE. AMMI can also be used to rationalize the choice of testing sites and identify the minimum number of sites that are needed to represent the target production environments in a breeding program. It can also assist in determining the key agroclimatic factors, disease and insect pests, and physiological traits that determine adaptation to environments.

6) (15 points) Use the R program at the end of this section to run an AMMI analysis. Interpret the output using guidelines from lecture and from the article by Malosetti et al. (2013). Cut and paste a copy of the PC1*PC2 biplot with your discussion (no other output needs to be submitted).
What recommendations would you make to the Barley Project about the adaptation of their experimental varieties to diverse environments in the target production regions of Oregon? Use examples to illustrate the principles involved (one or two paragraphs or about 200 words is sufficient.)

A summary of observations on the most promising genotypes is included at the end of this exam, so here we will just discuss a few examples to illustrate how the biplot is interpreted.

It's important to remember that this biplot only represents the variation due to GXE. It is essential to consider the main effects of genotypes as well to identify varieties that are both stable and high yielding. The plot of PC1 vs yield that was produced from the analysis we did in R could help in this regard, provided that the genotypes are correctly labeled on the graph (it is necessary to use the number=FALSE option to avoid renumbering the entries in alphanumeric order, where 10 comes before 2, etc.). Adding an extra line of code to the R program to get the specific deviations due to GXE in each environment as predicted by AMMI (model$genXenv) may also help with interpretation.

The environments themselves are quite diverse, as shown by the dispersion of their scores in the biplot and the fact that the first principal component explained only 68.7% of the GXE variation. The angles of the three vectors indicate that the correlation of genotype yields in Lebanon and Corvallis was low (r = 0.39), but was slightly greater than the correlation between either of the Willamette Valley sites and Madras (r ≈ 0). (Note that the angles can be affected by the scales of the two axes; the scales should be the same.)

Entry 29 (120661) had the highest average yield across sites and performed well in all environments. It has broad adaptation and clusters towards the center of the graph. It falls in the same quadrant as the COR vector, because it showed the greatest superiority in Corvallis.
A line from the origin to Entry 29 would form an acute angle with the LEB vector, which implies that Entry 29 has a positive GXE effect in Lebanon. It forms an obtuse angle with the MAD vector, indicating a negative GXE effect in Madras. However, the GXE effects for each genotype must sum to zero, so we need to check the mean yields at each site as well. Entry 29 had above average yield at all sites. I would call it a stable variety.

Entry 14 (120341) was outstanding in Madras, and falls close to the MAD axis. It had negative GXE interactions in LEB and COR, but was nonetheless above average at all sites.

Entry 2 (DH120145) was the highest yielding variety in Lebanon. It has specific adaptation to that environment, but performed well below average in Madras. It cannot be called stable (in my opinion), but it might be a good variety for the Lebanon area. Underlying causes for its specific adaptation could be investigated.

    #Analysis of barley MET trial
    #Read in the table from a csv file
    promise<-read.table("C:/Users/klingj/Desktop/OregonPromise2015.csv", header=TRUE, sep=",")

    #several options for reviewing data structure before the analysis
    head(promise)   #headings and first six lines of the data

    #convert integers to factors
    promise$genof <- as.factor(promise$entry)
    promise$envf <- as.factor(promise$Location)
    promise$blockf <- as.factor(promise$Block)
    promise <- subset(promise, select = c(envf, genof, blockf, Yield))
    str(promise)    #compactly displays structure of the dataset

    #plot the data
    library(ggplot2)
    qplot(genof, Yield, data=promise, geom="boxplot")

    library(agricolae)
    attach(promise)
    model<- AMMI(envf, genof, blockf, Yield, console=TRUE, PC=TRUE)
    detach(promise)

    #print out scaled PCA scores (note alphanumeric sorting)
    model$biplot

    # biplot PC1 vs average Yield
    # number=FALSE turns off renumbering of entries in sorted order
    plot(model, first=0,second=1, number=FALSE)

    # biplot of PC2 vs PC1
    plot(model)

    # get AMMI predicted GXE effects
    model$genXenv

Summary of yield and GXE effects for the best lines in the Oregon Promise Trial

                       ------------------------ MEANS ------------------------------   ------- GXE -------
ID       Name          COR    DEV     LEB    DEV     MAD    DEV     Mean   DEV    TOT    COR     LEB     MAD    Description
2        DH120145      4038   -0.02   7949   1.77    6546   -1.06   6178   0.48   1.18   -205    1076    -871   Good specific adaptation in Lebanon; poor in Madras
10       120314        3033   -1.79   6888   0.29    8323   1.73    6081   0.24   0.47   -1113   111     1002   Good specific adaptation in Madras; poor in Corvallis
14       120341        4521   0.84    6876   0.28    8800   2.48    6733   1.85   5.44   -277    -551    828    Good specific adaptation in MAD; above avg in COR and LEB
15       120363        4868   1.45    7099   0.59    6471   -1.18   6146   0.40   1.26   657     258     -915   Good specific adaptation in COR; fair in LEB; poor in MAD
18       120374        4485   0.77    6723   0.06    7671   0.71    6293   0.76   2.31   127     -266    138    Broad adaptation, but not outstanding
19       120381        4307   0.46    7286   0.85    7784   0.88    6459   1.17   3.36   -217    132     86     Broad adaptation, yields consistently above average
20       120384        5102   1.87    7579   1.26    6786   -0.68   6489   1.25   3.69   548     395     -942   Very good in COR and LEB; below average in MAD
22       120516        4549   0.89    5543   -1.58   8086   1.36    6059   0.19   0.85   425     -1212   787    Good in COR and MAD; poor in LEB
26       120536        4389   0.60    6467   -0.29   7570   0.55    6142   0.39   1.25   182     -371    188    Broad adaptation, but not outstanding; average in LEB
28       120657        3957   -0.16   7389   0.99    7939   1.13    6428   1.10   3.06   -537    266     271    Good in MAD and LEB; below average in COR
29       120661        5106   1.87    7700   1.43    7581   0.56    6796   2.00   5.87   246     209     -454   Broad adaptation; best in COR, good in LEB, above avg in MAD
34       120731        4084   0.06    7473   1.11    7267   0.07    6275   0.72   1.97   -256    503     -247   Broad adaptation, but not outstanding
35       Golden Prom.  5052   1.78    7135   0.64    6953   -0.42   6380   0.98   2.97   606     60      -667   Good in COR; not bad in LEB and MAD
36       BCD47         4662   1.09    7130   0.63    8108   1.39    6633   1.60   4.71   -36     -199    235    Broad adaptation; consistently above average
37       Copeland      4171   0.22    7509   1.16    7201   -0.03   6294   0.77   2.11   -187    520     -333   Good specific adaptation in LEB; avg in COR and MAD
Average                4047   0.00    6678   0.00    7222   0.00    5982   0.00   0.00
stdev                  565    1.00    716    1.00    636    1.00    406    1.00   2.90
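
The GXE columns of the summary table can be checked by hand. The sketch below double-centers the site means for Entries 2, 14, and 29 using the trial-wide site averages from the table (subtract the entry mean and the site average, then add back the overall mean). The object names (yield, site_mean, gxe) are made up for this illustration, and the inputs are the rounded table values, so results may differ from the tabled GXE effects by a pound or so.

```r
# Site means (lb/acre) for three entries, copied from the summary table.
yield <- rbind(
  "2"  = c(COR = 4038, LEB = 7949, MAD = 6546),
  "14" = c(COR = 4521, LEB = 6876, MAD = 8800),
  "29" = c(COR = 5106, LEB = 7700, MAD = 7581)
)

# Trial-wide site averages from the "Average" row of the table.
site_mean <- c(COR = 4047, LEB = 6678, MAD = 7222)

# Double-centering: observed mean - entry mean - site average + grand mean.
gxe <- sweep(yield, 1, rowMeans(yield))   # remove entry main effects
gxe <- sweep(gxe, 2, site_mean)           # remove site main effects
gxe <- gxe + mean(site_mean)              # add back the grand mean

print(round(gxe))

# The GXE effects for each entry sum to zero across the three sites.
print(round(rowSums(gxe), 6))
```

Because each row sums to zero, an entry such as 29 can show a negative GXE effect in Madras and still yield above the site average there, which is why the site means need to be read alongside the biplot.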