Final 2009 key - Crop and Soil Science

Advanced Plant Breeding CSS 650
Take-Home Final Exam, Fall 2009
Due Wednesday, Dec. 9, 2009
Name: KEY
Show your work! This is essentially a final homework assignment. You can refer to your notes,
journal articles, and textbooks. Do as much as you can on your own, but you may compare
answers with your classmates if you wish.
1) The eggplant breeder at a private seed company has just retired and you are hired to take over
the program. You are given a set of 20 new dihaploid lines and wish to know their potential
for use as parents in hybrids. As we learned from the poster session in class, hybrid seed
production is currently done by hand pollination in eggplant. What mating designs could be
used to estimate combining ability (hint: we discussed two in class and another is commonly
used in hybrid corn breeding)? List any questions you would ask the retiree that would help
you to choose the best design. For the purposes of this exam, assume specific responses to
those questions and explain your consequent choice of a mating design. Draw a diagram
showing how you would make the crosses.
Possible mating designs include a diallel, a factorial mating design (NC Design II), or a testcross
analysis. A Design I could potentially be used to estimate combining ability, but you would
need to identify additional lines or a population to use as females.
Possible questions for the breeder (answers will vary)
1) Is there much heterosis in eggplant? How important is general vs specific combining ability in
eggplant? Assuming there is heterosis, we will need a design that can estimate both general
and specific combining ability
2) Is there currently a distinction between male (pollen producers) and female parents in hybrid
seed production? Are there important maternal effects? I’ll assume that the answers to these
questions are no. Any line can potentially be a male or female and we won’t need to make
reciprocal crosses to estimate maternal effects.
3) Are there known heterotic groups in eggplant and what do we know about the heterotic pattern
of this particular set of dihaploids? I will assume that there is some knowledge of heterotic
patterns and that crosses were made within heterotic groups.
4) Were the dihaploid lines derived from a common cross or did they have different parents? To
what extent do they represent the available genetic diversity among eggplant cultivars? If I
assume that they came from a single cross and that there are two other important heterotic
groups in eggplant, then I could choose inbred tester lines from the other two groups and
make all of the 20 x 2 possible testcrosses. If I assume that half of the lines came from one
heterotic group and the other half from another, a factorial mating scheme between the two
sets of lines would be a good option.
If there is no knowledge of heterotic groups and the lines represent a wide range of crosses
among parents of diverse origins, I may want to consider a diallel mating scheme. If making
all possible 190 crosses is prohibitive I could use some form of partial diallel.
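The number of crosses each design requires can be checked with a short calculation. This is a minimal sketch in Python; the variable names are illustrative, and the 10 x 10 split for the factorial and the two testers are the assumptions made in the answer above.

```python
from math import comb

n_lines = 20  # dihaploid lines given in the question

# Half diallel without selfs or reciprocals: each pair of lines is crossed once.
diallel = comb(n_lines, 2)

# Testcross: every line is crossed to each tester (two testers assumed above).
n_testers = 2
testcross = n_lines * n_testers

# Factorial (NC Design II): two heterotic groups of 10 lines, all between-group crosses.
group_a, group_b = 10, 10
factorial = group_a * group_b

print("diallel:", diallel, "testcross:", testcross, "factorial:", factorial)
```

The diallel is by far the most demanding design for hand pollination, which is why a partial diallel or a testcross may be preferred.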
2) A breeder wants to improve pearl millet as a grain and forage crop in North Dakota. In 2006
he evaluated 300 half-sib families from a breeding population in a yield trial at a single
location with 3 replications (blocks). To assist in developing a selection index, he decided to
estimate heritabilities and the genetic correlation (rA) between grain yield and plant height.
He used the following SAS code to generate univariate analyses for both traits as well as an
analysis of covariance for the two traits.
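proc glm;
class Block Family;
/* fit the same model to both traits */
model Yield Height = Block Family;
/* printh and printe print the Family (H) and Error (E) SSCP matrices */
manova h=Family / printh printe;
/* treat Block and Family as random effects and request the corresponding F tests */
random Block Family / test;
run;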
The GLM Procedure
Multivariate Analysis of Variance

H = Type III SSCP Matrix for Family (output from printh)
            Yield      Height
Yield       161.46     1255.8
Height      1255.8     83241.6

E = Error SSCP Matrix (output from printe)
            Yield      Height
Yield       179.4      1315.6
Height      1315.6     65780.0
He summarized the ANOVA for yield as follows:
Source    df     SS        MS      F      Prob>F
Block     2
Family    299    161.46    0.54    1.8    0.0000
Error     598    179.40    0.30
From these results he calculated the phenotypic variance among half-sib families to be 0.18 and
obtained an estimate of heritability for yield on a family mean basis = 0.44.
2) cont’d. Use the SAS output above to answer the following questions (show your work)
a) Calculate phenotypic variance among half-sib families for plant height and estimate
heritability on a family mean basis for this trait.
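MSF = SSF/299 = 83241.6/299 = 278.4
MSE = SSE/598 = 65780/598 = 110
$$\sigma^2_P = \sigma^2_{\bar{F}} = \sigma^2_F + \frac{\sigma^2_E}{3} = \frac{\mathrm{MSF}}{3} = \frac{278.4}{3} = 92.8$$
$$\sigma^2_F = (\mathrm{MSF} - \mathrm{MSE})/r = (278.4 - 110)/3 = 56.13$$
$$h^2 = \frac{\sigma^2_F}{\sigma^2_P} = \frac{56.13}{92.8} = 0.60$$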
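The same steps can be scripted so that both traits go through one calculation. This is a sketch in Python rather than SAS; the function name `family_mean_h2` and its arguments are illustrative and do not come from the exam.

```python
def family_mean_h2(ss_family, ss_error, n_families=300, n_reps=3):
    """Return (family variance, phenotypic variance of family means, h2)."""
    df_family = n_families - 1                    # 299
    df_error = (n_families - 1) * (n_reps - 1)    # 598
    ms_family = ss_family / df_family
    ms_error = ss_error / df_error
    var_family = (ms_family - ms_error) / n_reps  # variance among half-sib families
    var_pheno = ms_family / n_reps                # phenotypic variance of family means
    return var_family, var_pheno, var_family / var_pheno

# Diagonal elements of the H and E SSCP matrices from the SAS output.
var_f_height, var_p_height, h2_height = family_mean_h2(83241.6, 65780.0)
var_f_yield, var_p_yield, h2_yield = family_mean_h2(161.46, 179.40)

print(round(var_p_height, 1), round(var_f_height, 2), round(h2_height, 2))
print(round(var_p_yield, 2), round(var_f_yield, 2), round(h2_yield, 2))
```

Running it for yield gives the 0.18 and 0.44 reported by the breeder, which is a useful check that the height calculation follows the same method.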
b) What is the additive genetic correlation (rA) for yield and plant height? (you will need to
calculate the genetic covariance for these traits in the same manner that you calculated
the genetic variance among half sib families for individual traits)
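MCPF = SCPF/299 = 1255.8/299 = 4.2
MCPE = SCPE/598 = 1315.6/598 = 2.2
$$\mathrm{Cov}_{HS_{XY}} = (\mathrm{MCPF} - \mathrm{MCPE})/r = (4.2 - 2.2)/3 = 0.6667$$
$$r_A = \frac{\mathrm{Cov}_A}{\sigma_{A_X}\sigma_{A_Y}} = \frac{\mathrm{Cov}_{HS_{XY}}}{\sigma_{HS_X}\sigma_{HS_Y}} = \frac{0.6667}{\sqrt{0.08 \times 56.1333}} = 0.3146$$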
(note that CovHSXY = ¼CovA, but coefficients in the numerator and denominator cancel
out)
You could also calculate the phenotypic correlation among family means (but this was
not requested)
$$r_P = \frac{\mathrm{MCPF}}{\sqrt{\mathrm{MSF}_X \times \mathrm{MSF}_Y}} = \frac{4.2}{\sqrt{0.54 \times 278.4}} = 0.3425$$
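Both correlations can be recomputed directly from the SSCP matrices. The sketch below is in Python with illustrative variable names; it assumes the same degrees of freedom (299 and 598) and r = 3 used in part (a).

```python
from math import sqrt

r = 3                       # replications
df_family, df_error = 299, 598

# Mean cross products from the off-diagonal SSCP elements.
mcp_family = 1255.8 / df_family
mcp_error = 1315.6 / df_error
cov_hs = (mcp_family - mcp_error) / r          # half-sib covariance, yield x height

# Half-sib family variances from the diagonal SSCP elements.
ms_f_yield, ms_e_yield = 161.46 / df_family, 179.40 / df_error
ms_f_height, ms_e_height = 83241.6 / df_family, 65780.0 / df_error
var_hs_yield = (ms_f_yield - ms_e_yield) / r
var_hs_height = (ms_f_height - ms_e_height) / r

# Additive genetic correlation (the 1/4 coefficients cancel).
r_a = cov_hs / sqrt(var_hs_yield * var_hs_height)
# Phenotypic correlation among family means.
r_p = mcp_family / sqrt(ms_f_yield * ms_f_height)

print(round(cov_hs, 4), round(r_a, 4), round(r_p, 4))
```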
c) If he selected the best 15 families for grain yield, what would be the expected response to
selection?
$$R_X = i\,h_X\,\sigma_{A_X} = i\,h^2_X\,\sigma_{P_X} = 2.06 \times 0.4444 \times \sqrt{0.18} = 0.388 \text{ t/ha}$$
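The predicted response follows from three quantities already on hand. This is a sketch in Python with illustrative names; the selection intensity of 2.06 is the value used in the key for selecting 15 of 300 families (5%).

```python
from math import sqrt

i = 2.06                 # standardized selection intensity for the top 5%
h2_yield = 0.08 / 0.18   # family-mean heritability for yield (0.4444)
sigma_p = sqrt(0.18)     # phenotypic standard deviation of family means, t/ha

response = i * h2_yield * sigma_p
print(round(response, 3), "t/ha")
```

Note that the phenotypic variance (0.18) must be converted to a standard deviation before it is multiplied by the intensity and heritability.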
He decided to validate these estimates by making separate selections for yield and plant
height. He selected the best 15 families for each trait:
                    Mean of all families    Mean of 15 selected families
Grain Yield t/ha    3.5                     4.3
Plant Height cm     170                     190
In 2007 he intermated remnant seed of selected families to form the C1 cycles of selection.
In 2008 he evaluated the selection response:
                              Cycle 0    Cycle 1
Selection for yield
  Grain yield (t/ha)          3.8        4.1
  Plant height (cm)           176        180
Selection for plant height
  Plant height (cm)           176        188
  Grain yield (t/ha)          3.8        3.9
d) Calculate the realized heritability for yield.
$$h^2 = \frac{R}{S} = \frac{4.1 - 3.8}{4.3 - 3.5} = \frac{0.3}{0.8} = 0.375$$
e) Use the relationship below to obtain an estimate of rA from the selection experiment.
$$r_A^2 = \left(\frac{CR_X}{R_X}\right)\left(\frac{CR_Y}{R_Y}\right)$$
$$r_A = \sqrt{\left(\frac{3.9 - 3.8}{4.1 - 3.8}\right)\left(\frac{180 - 176}{188 - 176}\right)} = \sqrt{\left(\frac{0.1}{0.3}\right)\left(\frac{4}{12}\right)} = 0.3333$$
f) Do estimates of h2 for yield and rA obtained from the half-sib trial in 2006 agree fairly
well with results from the selection experiment?
The genetic correlations are very close (0.3146 vs 0.3333). Heritability estimates are within a
reasonable range (0.44 vs 0.375)
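The realized estimates from parts (d) and (e) can be recomputed from the selection experiment means. This is a sketch in Python; the variable names are illustrative, with X = grain yield and Y = plant height.

```python
from math import sqrt

# Selection differential for yield in 2006 (selected mean minus population mean).
s_yield = 4.3 - 3.5

# Direct responses (Cycle 1 minus Cycle 0 for the trait under selection).
r_yield = 4.1 - 3.8
r_height = 188 - 176

# Correlated responses (change in the unselected trait).
cr_yield = 3.9 - 3.8     # yield change when selecting for height
cr_height = 180 - 176    # height change when selecting for yield

realized_h2 = r_yield / s_yield
r_a_realized = sqrt((cr_yield / r_yield) * (cr_height / r_height))

print(round(realized_h2, 3), round(r_a_realized, 4))
```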
3) In class, we discussed several approaches for using molecular markers to improve crops:
• Marker-based selection (MBS)
• Marker-assisted selection (MAS)
  o F2 enrichment
  o Marker-assisted backcrossing (MABC)
  o Marker-assisted recurrent selection (MARS)
• Genomic Selection (GS).
For each of the scenarios below, indicate which approach(es) would be most appropriate, and
explain your choice. (answers will vary)
a) A diploid, self-pollinating crop, with available marker density at ~5 cM throughout the
genome. Three QTL have been identified that explain a large proportion of the variation
for resistance to an important disease. Existing cultivars possess anywhere from 0-3 of
the desired alleles at these loci. New lines must have higher yield than existing cultivars
and acceptable quality to justify release.
F2 enrichment would work well for a self-pollinating crop, while permitting you to select
desirable segregants for other important characteristics as well. You might use MABC if
you had a very good line that was missing a single favorable QTL.
b) A commercial breeding company for a major, high value crop. Facilities and resources
are available for high density, high throughput genotyping. Important traits such as yield
are controlled by many QTL with small effects.
This would be a good candidate for genomic selection, which would allow you to exploit
all of the additive genetic variation for yield and other traits, rather than just a few QTL.
c) A minor, relatively new crop that is cross-pollinated by insects. Hand pollinations are
time-consuming and no male sterility system has been developed. Commercial varieties
may be open-pollinated populations or synthetics. Several molecular markers have been
developed using a candidate gene approach that impart desirable quality characteristics.
MARS would be a good choice here, since you will likely be using recurrent selection for
yield and other traits. Backcrossing would be difficult for populations given the large
number of crosses required to maintain the genetic diversity of the population, but it
would be feasible to use MABC to fix the desirable alleles in inbred parents of a
synthetic.
8 pts
4) A number of papers have been published in recent years comparing the use of AMMI and
GGE as techniques for analyzing genotype by environment interactions (GEI) in
multilocational trials. Briefly explain the features of these linear-bilinear models and the
difference between them.
AMMI and GGE are methods for analyzing GEI to identify patterns of interaction and reduce
background noise. They combine conventional ANOVA with principal component analysis
and may provide more reliable estimates of genotype performance than the mean across sites.
Biplots help to visualize relationships among genotypes and environments and help to select
varieties with good adaptation to target breeding environments.
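AMMI model: $Y_{ijl} = \mu + G_i + E_j + \sum_k \lambda_k \alpha_{ik} \gamma_{jk} + d_{ij} + e_{ijl}$
GGE model: $Y_{ijl} = \mu + E_j + \sum_k \lambda_k \alpha_{ik} \gamma_{jk} + d_{ij} + e_{ijl}$
Here $\lambda_k$ is the singular value for the kth principal component, $\alpha_{ik}$ and $\gamma_{jk}$ are the
genotype and environment scores, $d_{ij}$ is the residual not captured by the retained
components, and $e_{ijl}$ is the error.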
In the AMMI model the principal component analysis is performed on the GXE interactions
after removing the main effects of genotypes and environments. In the GGE model the PCA
is performed on the G + GE term combined after removing the main effects of the
environment.
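The difference between the two models comes down to how the genotype x environment table of means is centered before the PCA. The sketch below, in Python, uses a small made-up table (the values in `means` are hypothetical, not data from the course) and stops at the centered matrices that would be passed to a singular value decomposition.

```python
# Hypothetical two-way table of means: rows = genotypes, columns = environments.
means = [
    [4.0, 5.2, 6.1],
    [3.0, 5.5, 5.0],
    [5.0, 4.5, 7.0],
]
n_g, n_e = len(means), len(means[0])

grand = sum(sum(row) for row in means) / (n_g * n_e)
g_mean = [sum(row) / n_e for row in means]
e_mean = [sum(means[i][j] for i in range(n_g)) / n_g for j in range(n_e)]

# GGE: remove only the environment means, so G + GE is left for the PCA.
gge = [[means[i][j] - e_mean[j] for j in range(n_e)] for i in range(n_g)]

# AMMI: remove both main effects, so only the GE interaction is left for the PCA.
ammi = [[means[i][j] - g_mean[i] - e_mean[j] + grand for j in range(n_e)]
        for i in range(n_g)]

# Row means of the GGE matrix are the genotype main effects;
# row means of the AMMI matrix are zero.
gge_row_means = [sum(row) / n_e for row in gge]
ammi_row_means = [sum(row) / n_e for row in ammi]
print(gge_row_means, ammi_row_means)
```

Because the genotype main effects remain in the GGE matrix, its biplot shows mean performance and interaction together, whereas an AMMI biplot of the interaction scores shows interaction only.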
Gauch, H.G. 2006. Statistical analysis of yield trials by AMMI and GGE. Crop Sci. 46:
1488-1500.
Gauch, H.G., H.-P. Piepho, and P. Annicchiarico. 2008. Statistical analysis of yield trials by
AMMI and GGE: further considerations. Crop Sci. 48: 866-889.
Yan, W., M.S. Kang, B. Ma, S. Woods, and P.L. Cornelius. 2007. GGE Biplot vs. AMMI
Analysis of Genotype-by-Environment Data. Crop Sci. 47: 643-653.
Yang, R.-C., J. Crossa, P.L. Cornelius, and J. Burgueño. 2009. Biplot analysis of genotype x
environment interaction: proceed with caution. Crop Sci. 49: 1564-1576.
5) Recall the BLUP example from Bernardo’s text that we discussed in class and in lab. The
drawback of the IML program that we used was that one had to manually iterate the program
many times to obtain the correct estimates of the breeding values and genetic variance. I have
modified the PROC MIXED program I gave to you in lab so that it gives correct estimates in
a single run.
data purelines;
input set n_loc variety$ genotype y;
datalines;
1 18 Morex   1 4.45
1 18 Robust  2 4.61
1 18 Stander 4 5.27
2 9  Robust  2 5.00
2 9  Excel   3 5.82
2 9  Stander 4 5.79
;
data GR;
input parm row col1-col4;
datalines;
1 1 2      1       0.875    0.6875
1 2 1      2       1.6875   1.34375
1 3 0.875  1.6875  2        1.421875
1 4 0.6875 1.34375 1.421875 2
;
run;
options nodate nocenter;
Proc Mixed data=purelines noclprint covtest;
class genotype set;
weight n_loc;
Model y=set/outpredm=LGGR outpred=PredGGR;
Random genotype/ldata=GR type=lin(1) s;
lsmeans set;
ods listing exclude solutionR; ods output solutionR=BLUPGGR;
ods output covparms=VGGR;
Proc print data=PredGGR;
Run;
Data BLUPs;
Set BLUPGGR;
Keep genotype BLUP Pred_Error P_value;
BLUP=Estimate;
Pred_Error=StdErrPred;
P_Value=Probt;
Proc print;
Title1'Genotype effect BLUPs, Prediction Error and P-Value for H0:BLUP=0';
Run;
Quit;
The Mixed Procedure
Covariance Parameter Estimates
Cov Parm    Estimate    Standard Error    Z Value    Pr Z
LIN(1)      0.3499      0.2968            1.18       0.2384
Residual    0.05281     0.07784           0.68       0.2488
The linear term in the model estimates additive genetic variance. The residual is VR.
Least Squares Means
Effect    set    Estimate    Standard Error    DF    t Value    Pr > |t|
set       1      4.8874      0.6754            1     7.24       0.0874
set       2      5.3525      0.6767            1     7.91       0.0801
These are the BLUEs for the fixed effects in the model.

Obs  set  n_loc  variety  genotype  y     Pred     StdErrPred  DF  Alpha  Lower    Upper    Resid
1    1    18     Morex    1         4.45  4.45182  0.054045    1   0.05   3.76512  5.13853  -0.001821
2    1    18     Robust   2         4.61  4.59133  0.049300    1   0.05   3.96492  5.21775   0.018666
3    1    18     Stander  4         5.27  5.28684  0.049335    1   0.05   4.65998  5.91371  -0.016845
4    2    9      Robust   2         5.00  5.05642  0.062002    1   0.05   4.26862  5.84422  -0.056420
5    2    9      Excel    3         5.82  5.80165  0.075379    1   0.05   4.84387  6.75943   0.018351
6    2    9      Stander  4         5.79  5.75193  0.062191    1   0.05   4.96172  6.54214   0.038069
Genotype effect BLUPs, Prediction Error and P-Value for H0:BLUP=0
Obs    genotype    BLUP        Pred_Error    P_Value
1      1           -0.43562    0.67572       0.63546
2      2           -0.29611    0.67634       0.73729
3      3            0.44912    0.67947       0.62817
4      4            0.39941    0.67558       0.66009
These are the BLUPs and the standard errors that you would obtain if you iterated the matrix
calculations many times.
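For readers who want to see the matrix calculation behind these values, the sketch below solves Henderson's mixed-model equations once, in Python, using the rounded variance estimates from the PROC MIXED output (VA = 0.3499, VR = 0.05281). It assumes G = VA x (the GR matrix) and a residual variance of VR/n_loc for each record, which is how the WEIGHT statement and the LIN(1) structure are understood here. The helper `solve` and all variable names are illustrative. Because the inputs are rounded, the solutions should agree with the SAS output only to about three decimals.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(n):
            if k != col:
                f = m[k][col] / m[col][col]
                m[k] = [x - f * p for x, p in zip(m[k], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# (set, n_loc, genotype, y) from the purelines data step.
records = [(1, 18, 1, 4.45), (1, 18, 2, 4.61), (1, 18, 4, 5.27),
           (2, 9, 2, 5.00), (2, 9, 3, 5.82), (2, 9, 4, 5.79)]
# Relationship matrix from the GR data step (col1-col4).
A = [[2, 1, 0.875, 0.6875],
     [1, 2, 1.6875, 1.34375],
     [0.875, 1.6875, 2, 1.421875],
     [0.6875, 1.34375, 1.421875, 2]]
VA, VR = 0.3499, 0.05281   # rounded estimates from the SAS output

n_set, n_gen = 2, 4
size = n_set + n_gen
C = [[0.0] * size for _ in range(size)]   # coefficient matrix, scaled by VR
rhs = [0.0] * size

# Accumulate the weighted X'X, X'Z, Z'X, Z'Z blocks and the right-hand side.
for s, w, g, y in records:
    idx = (s - 1, n_set + g - 1)
    for p in idx:
        rhs[p] += w * y
        for q in idx:
            C[p][q] += w

# Add A^-1 * (VR / VA) to the genotype block.
identity = [[float(i == j) for j in range(n_gen)] for i in range(n_gen)]
a_inv_cols = [solve(A, identity[j]) for j in range(n_gen)]  # column j of A^-1
for i in range(n_gen):
    for j in range(n_gen):
        C[n_set + i][n_set + j] += a_inv_cols[j][i] * VR / VA

solution = solve(C, rhs)
blue = solution[:n_set]    # set effects (fixed)
blup = solution[n_set:]    # genotype breeding values (random)
print([round(v, 4) for v in blue], [round(v, 4) for v in blup])
```

The iteration mentioned above is needed only when the variance components are unknown; once they are fixed at their estimates, a single solve is enough.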
a) Run the same analysis using a subset of the data from the Minnesota barley breeding
program that we used for the TASSEL demonstration (MN06ex2.xls). Fifty breeding
lines were evaluated for yield at two locations (set=1). In addition, the first 25 lines were
evaluated at a third location (set=2). The kinship matrix (also in MN06ex2.xls) was
obtained from TASSEL using SNP data. A single column of 1’s was added in the first
column to obtain the correct format for the LIN(1) covariance matrix option in SAS. You
do not need to submit your program or output with this exam, provided that you can
answer questions b and c.
b) Give the estimate you obtained for additive genetic variance and its standard error.
VA = 123874
se = 51821
c) Identify the two lines with the highest breeding values. If you crossed these lines, what is
the expected breeding value of an inbred line (RIL) derived from the cross?
06MN-36 (FEG 141-20)    927.262
06MN-35 (FEG141-18)     856.485
Predicted mean breeding value = (BLUPA + BLUPB)/2
= (927.262+856.485)/2 = 891.8735
d) If you were the barley breeder in Minnesota would you use BLUPs to choose parents for
making crosses or would you use the mean yield across sites? Explain your answer.
I would use the BLUPs because they have adjusted for the imbalance in the data, and
have utilized information from relatives to make the best predictions about breeding
values.