Advanced Plant Breeding PBG 650 Name Take-Home Final Exam, Fall 2015

Advanced Plant Breeding PBG 650
Take-Home Final Exam, Fall 2015
Due 9:30 am on Friday, December 11, 2015
Name: KEY
Part 1 – AMMI analysis of Genotype by Environment interaction using R
The Barley Project at OSU strives to develop barley cultivars that are both high yielding and that
provide outstanding flavor in beer. A trial of 34 promising new experimental varieties and 3
standard checks (Golden Promise, BCD47, and Copeland) was conducted at three locations in
Oregon in 2015. Yield trials were conducted in Corvallis, Lebanon, and Madras, OR. Each
experiment was arranged as a Randomized Complete Block Design with two replications. While
quality testing remains to be done, they need your help to determine how well these varieties
are adapted to diverse environments in the target production regions of Oregon.
The data set is available in the file OregonPromise2015.csv.
As an initial step, they calculated the mean and CV% for each entry (i.e., variety) across all of
the blocks and testing sites. The horizontal axis is bisected by the mean yield across all varieties,
and the vertical axis is bisected by the mean CV% for all varieties.
5 points
1) Briefly discuss the concept of stability and the various ways that it can be defined and
achieved. What does the graph above tell us (if anything) about the stability of these
varieties? Use examples from the graph to illustrate your points. Which varieties are the
best (in your opinion)?
There are several meanings of the term stability. The oldest and simplest is that the
performance of a genotype does not change across environments. This is referred to as
Type I or static stability. A more common definition is Type II or dynamic stability,
which implies that a genotype responds to favorable environments in a consistent
manner (it exhibits phenotypic plasticity). When genotype performance is
regressed on an environmental index, stable genotypes have a linear regression coefficient
of b = 1. Type III stability implies that a genotype’s observed performance is close to its
regression line, i.e., the deviations from the regression line are small.
The graph above of the CV% vs the mean provides some insights about stability in the static
sense, but doesn’t tell us much about phenotypic plasticity. Nonetheless, a variety that falls
in the lower right hand quadrant and has both a high mean and relatively low CV% would be
desirable. Varieties 29 (120661), 36 (BCD47), and 20 (120384) appear to be good, stable
varieties. Variety 14 (120341) has very high average yield and a CV that is just a little greater
than the average. Additional analyses are needed to determine if this variety has good
stability in the dynamic sense. Variety 10 (120314) has a particularly high CV and only
average yield. Additional analyses may reveal that this variety has good specific adaptation to one or
more sites. The graph of the CV% vs the mean provides very little information about the
adaptation of these varieties.
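As a rough illustration of the static (Type I) view, the R sketch below computes a mean and CV% for four entries from their three site means, which are copied from the summary table at the end of this exam. This is a simplification and an assumption on my part: the graph discussed above was based on all blocks and sites, so these CV values will not match it. Only the idea of comparing mean against CV carries over.

```r
# Site means (lb/acre) for four entries, copied from the summary table
site_means <- rbind(
  "10" = c(3033, 6888, 8323),
  "14" = c(4521, 6876, 8800),
  "29" = c(5106, 7700, 7581),
  "36" = c(4662, 7130, 8108)
)
colnames(site_means) <- c("COR", "LEB", "MAD")

# Mean and coefficient of variation across the three sites for each entry
entry_mean <- rowMeans(site_means)
entry_cv <- 100 * apply(site_means, 1, sd) / entry_mean

# Entries sorted from most to least stable in the static sense
stability <- data.frame(mean = round(entry_mean), cv_pct = round(entry_cv, 1))
print(stability[order(stability$cv_pct), ])
```

Even with only site means, Entry 10 shows a much larger CV than Entry 29, which agrees with the reading of the graph given above.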
2) A stability analysis (similar to the Eberhart and Russell model) was conducted to look at the
regression of each variety across an environmental index, and the following results were
obtained:
Source                       DF     Type I SS      Mean Square    F Value    Pr > F
Location                       2    426578200      213289100      60.24      0.0038
Block(Location)                3    10621586.5     3540528.8      13.83      <.0001
Entry                         36    35572607.5     988128          3.86      <.0001
Location*Entry                72    53530235.6     743475.5        2.90      <.0001
  Heterogeneity of slopes   (36)    19349229.9     537478.6        2.10      0.0018
  Dev from Linear Reg       (36)    34181005.7     949472.4        3.71      <.0001
Residual                     108    27651886       256036
5 points
Is there evidence for genotype by environment interactions? How can you tell? What do
these results tell us about the nature of those interactions?
We can tell that there are significant genotype by environment interactions, because the P
value for Location*Entry is much less than 0.05. We can also tell that the varieties have
different linear responses across the environmental index, because the term for
heterogeneity of slopes is significant. However, there is still a substantial amount of GXE
that is not explained by the linear regression, as shown by the significant term for
deviations from regression. This analysis assesses the dynamic response of genotypes to
changing environments (Type II Stability) as well as Type III stability (deviations from
regression).
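The F tests behind this interpretation can be checked by hand. The R sketch below recomputes them from the mean squares in the ANOVA table, assuming (as the reported F values imply) that Location is tested against Block(Location) and every other term against the Residual. The variable names are my own.

```r
# Mean squares and degrees of freedom copied from the ANOVA table
ms <- c(Location = 213289100, Block = 3540528.8, Entry = 988128,
        LocEntry = 743475.5, Slopes = 537478.6, Deviations = 949472.4)
df <- c(Location = 2, Block = 3, Entry = 36,
        LocEntry = 72, Slopes = 36, Deviations = 36)
ms_resid <- 256036
df_resid <- 108

# Assumption: Location is tested against Block(Location), the rest against Residual
denom_ms <- c(ms[["Block"]], rep(ms_resid, 5))
denom_df <- c(df[["Block"]], rep(df_resid, 5))

f_value <- ms / denom_ms
p_value <- pf(f_value, df, denom_df, lower.tail = FALSE)
print(data.frame(F = round(f_value, 2), p = signif(p_value, 2)))

# Share of the Location*Entry sum of squares captured by heterogeneity of slopes
ss_slopes <- 19349229.9
ss_gxe <- 53530235.6
pct_linear <- 100 * ss_slopes / ss_gxe
print(round(pct_linear, 1))
```

Heterogeneity of slopes accounts for roughly 36% of the Location*Entry sum of squares, so most of the GXE in this trial is left in the deviations from regression.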
5 points
3) Refer to Table 1 to identify several varieties that you think show a good level of stability.
Justify your choice of varieties.
Answers will vary. I have gone back to the original data to include some discussion of Type
III stability, but this was not expected because you were only given the means across sites
and the linear regression coefficients (Type II stability). I have also taken average yield into
consideration.
Plot of the six varieties with the highest average yields
Table 1. Stability analysis of barley genotypes

Entry  Genotype         Mean yield (lb/acre)  Linear regression coefficient
 1     DH120058         5910                  1.25
 2     DH120145         6178                  1.01
 3     DH120166         5879                  0.98
 4     DH120020         5477                  1.07
 5     DH120031         5338                  1.13
 6     DH120089         5273                  1.10
 7     DH120090         5595                  1.16
 8     DH120156         6064                  0.85
 9     DH120285         5778                  1.34
10     120314           6082                  1.60
11     120329           6030                  0.62
12     120330           5244                  0.84
13     120331           5823                  1.14
14     120341           6733                  1.20
15     120363           6146                  0.61
16     120365           6048                  1.03
17     120366           6064                  1.28
18     120374           6293                  0.96
19     120381           6459                  1.11
20     120384           6489                  0.66
21     120510           5970                  0.89
22     120516           6059                  0.88
23     120520           6086                  0.87
24     120521           5597                  0.88
25     120529           5673                  1.08
26     120536           6142                  0.93
27     120543           5323                  0.67
28     120657           6428                  1.27
29     120661           6796                  0.85
30     120671           5887                  1.06
31     120691           5514                  1.14
32     120709           5729                  0.85
33     120715           5666                  0.85
34     120731           6275                  1.09
35     Golden Promise   6380                  0.66
36     BCD47            6633                  1.04
37     Copeland         6294                  1.05
Variety 14 has high average yield but would not be considered stable because it has a b > 1
and large deviations from regression.
Variety 19 has high average yield, a b value slightly larger than one, and small deviations
from regression. It is fairly stable.
Variety 20 has high average yield and a small b value. From the graph we can see that it was
one of the best varieties in Corvallis and Lebanon, but was below average in Madras.
Variety 28 has high average yield, but a large b value. It does not have good Type II stability,
but we can see from the graph that the observed values fall along the regression line, so it
has good Type III stability. It was poor in Corvallis but performed well at the better sites.
Variety 29 had the highest average yield in the trial and a b value that is a little less than one.
From the graph we can see that it was one of the best varieties in Corvallis and Lebanon. It
was above average, but not exceptional, in Madras. I would call this a stable variety.
Variety 36 is the check variety BCD47. It has high average yield, a b value of 1.04, and small
deviations from regression. It is a good, stable variety.
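The regression coefficients in Table 1 can be approximated from the site means alone. The R sketch below regresses the site means of four entries on an environmental index, taken here as each site average minus the mean of the three site averages. The values are copied from the summary table at the end of this exam. The exact index used in the original analysis is not stated, so this is an assumption.

```r
# Site averages over all entries (lb/acre), from the summary table
site_avg <- c(COR = 4047, LEB = 6678, MAD = 7222)

# Site means for four high-yielding entries, from the same table
entries <- rbind(
  "14" = c(4521, 6876, 8800),
  "20" = c(5102, 7579, 6786),
  "28" = c(3957, 7389, 7939),
  "29" = c(5106, 7700, 7581)
)

# Environmental index: site average minus the mean of the site averages
env_index <- site_avg - mean(site_avg)

# Regress each entry's site means on the index and keep the slope b
slope <- apply(entries, 1, function(y) coef(lm(y ~ env_index))[["env_index"]])
print(round(slope, 2))
```

The slopes agree with Table 1 to within rounding. With only three sites, each regression has a single residual degree of freedom, which is one reason the deviations from regression should be interpreted cautiously.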
5 points
4) What are some of the limitations of the Eberhart and Russell stability analysis?
The main limitation, in my opinion, is that there are many different environmental factors
contributing to the overall mean yield at a site, so there is no reason to expect that a
genotype will show a linear response across the sites. In a sense it discriminates against
unusual genotypes, rather than helping us to identify genotypes with unique characteristics.
Studies have also shown that Type II and Type III stability have low heritabilities, so selecting
for stability alone may not achieve the desired goal. In this particular example, we have two
good sites and one poor site, so our estimates of the slope may be unduly influenced by the
data that were collected in Corvallis (the low yielding site). The rank changes that are seen
between varieties at the two good sites also suggest that the information gained from
linear regression may be limited.
5 points
5) Describe the components of the model for an AMMI analysis. What are some of the
benefits of AMMI for investigating GXE interactions?
$$Y_{ijl} = \mu + G_i + E_j + \sum_{k} \lambda_k \alpha_{ik} \gamma_{jk} + d_{ij} + e_{ijl}$$
$\lambda_k$ = kth eigenvalue
$\alpha_{ik}$ = principal component score for the ith genotype for the kth principal component axis
$\gamma_{jk}$ = principal component score for the jth environment for the kth principal component axis
$d_{ij}$ = residual GXE not explained by model
AMMI combines a conventional linear model for genotypes and environments with a
principal component analysis of the GXE interactions. AMMI biplots can help to visualize
how genotypes are adapted to the environments in the study, and to identify stable
genotypes that have broad adaptation.
There are some other potential benefits of AMMI that may be useful in particular situations,
although they do not apply specifically to the barley data set. When there are many testing
sites, AMMI predictions can help to focus attention on the most important and predictable
aspects of GXE, and reduce background noise. That was not the case with this data set,
because we had only 3 environments with 2 degrees of freedom, so the first two principal
components explained all of the observed variation in GXE. AMMI can also be used to
rationalize the choice of testing sites and identify the minimum number of sites that are
needed to represent the target production environments in a breeding program. It can also
assist in determining the key agroclimatic factors, disease and insect pests, and
physiological traits that determine adaptation to environments.
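The two stages of AMMI can be sketched directly in R: remove the additive main effects by double-centering the table of genotype by environment means, then apply a singular value decomposition to what remains. The example uses site means for seven entries copied from the summary table at the end of this exam. Because it is only a subset of the 37 entries, the percentages will not match the full analysis. The sqrt scaling of the scores is one common convention and is my assumption here.

```r
# Site means (lb/acre) for a subset of entries, from the summary table
means <- rbind(
  "2"  = c(4038, 7949, 6546),
  "10" = c(3033, 6888, 8323),
  "14" = c(4521, 6876, 8800),
  "15" = c(4868, 7099, 6471),
  "20" = c(5102, 7579, 6786),
  "29" = c(5106, 7700, 7581),
  "36" = c(4662, 7130, 8108)
)
colnames(means) <- c("COR", "LEB", "MAD")

# Additive part: remove genotype (row) and environment (column) main effects
gxe <- sweep(means, 1, rowMeans(means))
gxe <- sweep(gxe, 2, colMeans(gxe))

# Multiplicative part: singular value decomposition of the interaction matrix
s <- svd(gxe)
pct_explained <- 100 * s$d^2 / sum(s$d^2)
print(round(pct_explained, 1))

# Genotype and environment scores for the first two axes (sqrt scaling)
geno_scores <- s$u[, 1:2] %*% diag(sqrt(s$d[1:2]))
env_scores <- s$v[, 1:2] %*% diag(sqrt(s$d[1:2]))
```

With three environments the double-centered matrix has rank two at most, so the third singular value is zero up to rounding and the first two axes reproduce the interaction exactly. This is the situation described above for the barley data.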
15 points
6) Use the R program at the end of this section to run an AMMI analysis. Interpret the output
using guidelines from lecture and from the article by Malosetti et al. (2013). Cut and paste a
copy of the PC1*PC2 biplot with your discussion (no other output needs to be submitted).
What recommendations would you make to the Barley Project about the adaptation of their
experimental varieties to diverse environments in the target production regions of Oregon?
Use examples to illustrate the principles involved (one or two paragraphs or about 200
words is sufficient.)
A summary of observations on the most promising genotypes is included at the end of this exam, so
here we will just discuss a few examples to illustrate how the biplot is interpreted. It’s important to
remember that this biplot only represents the variation due to GXE. It is essential to consider the main
effects of genotypes as well to identify varieties that are both stable and high yielding. The plot of PCA1
vs yield that was produced from the analysis we did in R could help in this regard, provided that the
genotypes are correctly labeled on the graph (it is necessary to use the number=FALSE option to avoid
renumbering the entries in alphanumeric order, where 10 comes before 2, etc.). Adding an extra line of
code to the R program to get the specific deviations due to GXE in each environment as predicted by
AMMI (model$genXenv) may also help with interpretation.
The environments themselves are quite diverse, as shown by the dispersion of their scores in the
biplot and the fact that the first PCA explained only 68.7% of the GXE variation. The angles of the three
vectors indicate that the correlation of genotype yields in Lebanon and Corvallis was low (r = 0.39), but
was slightly greater than between either of the Willamette Valley sites and Madras (r ≈ 0). (Note that
the angles can be affected by the scales of the two axes; the scales should be the same.)
Entry 29 (120661) had the highest average yield across sites and performed well in all
environments. It has broad adaptation and clusters towards the center of the graph. It falls in the same
quadrant as the COR vector, because it showed the greatest superiority in Corvallis. A line from the
origin to Entry 29 would form an acute angle with the LEB vector, which implies that Entry 29 has a
positive GXE effect in Lebanon. It forms an obtuse angle with the MAD vector, indicating a negative GXE
effect in Madras. However, the GXE effects for each genotype must sum to zero, so we need to check
the mean yields at each site as well. Entry 29 had above average yield at all sites. I would call it a stable
variety. Entry 14 (120341) was outstanding in Madras, and falls close to the MAD axis. It had negative
GXE interactions in LEB and COR, but was nonetheless above average at all sites. Entry 2 (DH120145)
was the highest yielding variety in Lebanon. It has specific adaptation to that environment, but
performed well below average in Madras. It cannot be called stable (in my opinion), but it might be a
good variety for the Lebanon area. Underlying causes for its specific adaptation could be investigated.
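The point that GXE effects must be read together with the main effects can be checked numerically. The R sketch below uses values copied from the summary table at the end of this exam for Entries 2, 14, and 29. It confirms that each entry's GXE effects sum to about zero and that grand mean + genotype effect + environment effect + GXE effect reproduces the observed site means. The variable names are my own, and small discrepancies are expected because the table values are rounded.

```r
# Values copied from the summary table (lb/acre)
site_avg <- c(COR = 4047, LEB = 6678, MAD = 7222)
grand_mean <- 5982
entry_mean <- c("2" = 6178, "14" = 6733, "29" = 6796)

# AMMI-predicted GXE effects and observed site means for the same entries
gxe <- rbind("2"  = c(-205, 1076, -871),
             "14" = c(-277, -551,  828),
             "29" = c( 246,  209, -454))
observed <- rbind("2"  = c(4038, 7949, 6546),
                  "14" = c(4521, 6876, 8800),
                  "29" = c(5106, 7700, 7581))

# Main effects as deviations from the grand mean
g_effect <- entry_mean - grand_mean
e_effect <- site_avg - grand_mean

# Rebuild each cell mean from its additive and interaction parts
predicted <- grand_mean + outer(g_effect, e_effect, "+") + gxe

print(rowSums(gxe))          # close to zero for every entry
print(predicted - observed)  # rounding error only
```

Entry 29 has a negative GXE effect in Madras, yet its predicted yield there is still well above the site average because its genotype main effect is large.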
#Analysis of barley MET trial
#Read in the table from a csv file (edit the path to match your computer)
promise <- read.table("C:/Users/klingj/Desktop/OregonPromise2015.csv",
                      header=TRUE, sep=",")
#review the data structure before the analysis
head(promise) #headings and first six lines of the data
#convert integers to factors
promise$genof <- as.factor(promise$entry)
promise$envf <- as.factor(promise$Location)
promise$blockf <- as.factor(promise$Block)
promise <- subset(promise, select = c(envf, genof, blockf, Yield))
str(promise) #compactly displays structure of the dataset
#plot the data (ggplot() replaces qplot(), which is deprecated in recent ggplot2)
library(ggplot2)
ggplot(promise, aes(x=genof, y=Yield)) + geom_boxplot()
#run the AMMI analysis; with() avoids the need for attach() and detach()
library(agricolae)
model <- with(promise, AMMI(envf, genof, blockf, Yield, console=TRUE, PC=TRUE))
#print out scaled PCA scores (note alphanumeric sorting)
model$biplot
# biplot PC1 vs average Yield
# number=FALSE turns off renumbering of entries in sorted order
plot(model, first=0, second=1, number=FALSE)
# biplot of PC2 vs PC1
plot(model)
# get AMMI predicted GXE effects
model$genXenv
Summary of yield and GXE effects for the best lines in the Oregon Promise Trial
Yields are in lb/acre. DEV = deviation from the average in standard deviation units; TOT = sum of the four DEV values.

                    |------------------- MEANS --------------------|        |------ GXE ------|
ID  Name            COR   DEV    LEB   DEV    MAD   DEV    Mean  DEV    TOT    COR    LEB    MAD   Description
 2  DH120145        4038  -0.02  7949   1.77  6546  -1.06  6178  0.48   1.18   -205   1076   -871  Good specific adaptation in Lebanon; poor in Madras
10  120314          3033  -1.79  6888   0.29  8323   1.73  6081  0.24   0.47  -1113    111   1002  Good specific adaptation in Madras; poor in Corvallis
14  120341          4521   0.84  6876   0.28  8800   2.48  6733  1.85   5.44   -277   -551    828  Good specific adaptation in MAD; above avg in COR and LEB
15  120363          4868   1.45  7099   0.59  6471  -1.18  6146  0.40   1.26    657    258   -915  Good specific adaptation in COR; fair in LEB; poor in MAD
18  120374          4485   0.77  6723   0.06  7671   0.71  6293  0.76   2.31    127   -266    138  Broad adaptation, but not outstanding
19  120381          4307   0.46  7286   0.85  7784   0.88  6459  1.17   3.36   -217    132     86  Broad adaptation, yields consistently above average
20  120384          5102   1.87  7579   1.26  6786  -0.68  6489  1.25   3.69    548    395   -942  Very good in COR and LEB; below average in MAD
22  120516          4549   0.89  5543  -1.58  8086   1.36  6059  0.19   0.85    425  -1212    787  Good in COR and MAD; poor in LEB
26  120536          4389   0.60  6467  -0.29  7570   0.55  6142  0.39   1.25    182   -371    188  Broad adaptation, but not outstanding; average in LEB
28  120657          3957  -0.16  7389   0.99  7939   1.13  6428  1.10   3.06   -537    266    271  Good in MAD and LEB; below average in COR
29  120661          5106   1.87  7700   1.43  7581   0.56  6796  2.00   5.87    246    209   -454  Broad adaptation; best in COR, good in LEB, above avg in MAD
34  120731          4084   0.06  7473   1.11  7267   0.07  6275  0.72   1.97   -256    503   -247  Broad adaptation, but not outstanding
35  Golden Prom.    5052   1.78  7135   0.64  6953  -0.42  6380  0.98   2.97    606     60   -667  Good in COR; not bad in LEB and MAD
36  BCD47           4662   1.09  7130   0.63  8108   1.39  6633  1.60   4.71    -36   -199    235  Broad adaptation; consistently above average
37  Copeland        4171   0.22  7509   1.16  7201  -0.03  6294  0.77   2.11   -187    520   -333  Good specific adaptation in LEB; avg in COR and MAD
    Average         4047   0.00  6678   0.00  7222   0.00  5982  0.00   0.00
    stdev            565   1.00   716   1.00   636   1.00   406  1.00   2.90