
Advanced Plant Breeding CSS 650
Take-Home Final Exam, Fall 2009
Due Wednesday, Dec. 9, 2009
Name: KEY

Show your work! This is essentially a final homework assignment. You can refer to your notes, journal articles, and textbooks. Do as much as you can on your own, but you may compare answers with your classmates if you wish.

1) The eggplant breeder at a private seed company has just retired and you are hired to take over the program. You are given a set of 20 new dihaploid lines and wish to know their potential for use as parents in hybrids. As we learned from the poster session in class, hybrid seed production is currently done by hand pollination in eggplant. What mating designs could be used to estimate combining ability (hint: we discussed two in class and another is commonly used in hybrid corn breeding)? List any questions you would ask the retiree that would help you to choose the best design. For the purposes of this exam, assume specific responses to those questions and explain your consequent choice of a mating design. Draw a diagram showing how you would make the crosses.

Possible mating designs include a diallel, a factorial mating design (NC Design II), or a testcross analysis. A Design I could potentially be used to estimate combining ability, but you would need to identify additional lines or a population to use as females.

Possible questions for the breeder (answers will vary)

   1) Is there much heterosis in eggplant? How important is general vs. specific combining ability in eggplant? Assuming there is heterosis, we will need a design that can estimate both general and specific combining ability.

   2) Is there currently a distinction between male (pollen producers) and female parents in hybrid seed production? Are there important maternal effects? I'll assume that the answers to these questions are no. Any line can potentially be a male or female and we won't need to make reciprocal crosses to estimate maternal effects.
   3) Are there known heterotic groups in eggplant and what do we know about the heterotic pattern of this particular set of dihaploids? I will assume that there is some knowledge of heterotic patterns and that crosses were made within heterotic groups.

   4) Were the dihaploid lines derived from a common cross or did they have different parents? To what extent do they represent the available genetic diversity among eggplant cultivars?

If I assume that they came from a single cross and that there are two other important heterotic groups in eggplant, then I could choose inbred tester lines from the other two groups and make all of the 20 x 2 = 40 possible testcrosses.

If I assume that half of the lines came from one heterotic group and the other half from another, a factorial mating scheme between the two sets of lines would be a good option.

If there is no knowledge of heterotic groups and the lines represent a wide range of crosses among parents of diverse origins, I may want to consider a diallel mating scheme. If making all 190 possible crosses (20 x 19 / 2, without reciprocals) is prohibitive, I could use some form of partial diallel.

2) A breeder wants to improve pearl millet as a grain and forage crop in North Dakota. In 2006 he evaluated 300 half-sib families from a breeding population in a yield trial at a single location with 3 replications (blocks). To assist in developing a selection index, he decided to estimate heritabilities and the genetic correlation (rA) between grain yield and plant height. He used the following SAS code to generate univariate analyses for both traits as well as an analysis of covariance for the two traits.
    proc glm;
      class Block Family;
      model Yield Height = Block Family;
      manova h=Family / printh printe;
      random Block Family / test;
    run;

The GLM Procedure
Multivariate Analysis of Variance

H = Type III SSCP Matrix for Family (output from printh)

                 Yield      Height
    Yield       161.46      1255.8
    Height      1255.8     83241.6

E = Error SSCP Matrix (output from printe)

                 Yield      Height
    Yield        179.4      1315.6
    Height      1315.6     65780.0

He summarized the ANOVA for yield as follows:

    Source     df        SS      MS      F    Prob>F
    Block       2
    Family    299    161.46    0.54    1.8    0.0000
    Error     598    179.40    0.30

From these results he calculated the phenotypic variance among half-sib families to be 0.18 and obtained an estimate of heritability for yield on a family mean basis = 0.44.

2) cont'd. Use the SAS output above to answer the following questions (show your work).

a) Calculate phenotypic variance among half-sib families for plant height and estimate heritability on a family mean basis for this trait.

MSF = SSF/299 = 83241.6/299 = 278.4
MSE = SSE/598 = 65780/598 = 110

$\sigma^2_{\bar{P}} = \sigma^2_F + \dfrac{\sigma^2_E}{3} = \dfrac{\mathrm{MSF}}{3} = \dfrac{278.4}{3} = 92.8$

$\sigma^2_F = (\mathrm{MSF} - \mathrm{MSE})/r = (278.4 - 110)/3 = 56.13$

$h^2 = \dfrac{\sigma^2_F}{\sigma^2_{\bar{P}}} = \dfrac{56.13}{92.8} = 0.60$

b) What is the additive genetic correlation (rA) for yield and plant height? (You will need to calculate the genetic covariance for these traits in the same manner that you calculated the genetic variance among half-sib families for individual traits.)

MCPF = SCPF/299 = 1255.8/299 = 4.2
MCPE = SCPE/598 = 1315.6/598 = 2.2

$\mathrm{Cov}_{HS_{XY}} = (\mathrm{MCPF} - \mathrm{MCPE})/r = (4.2 - 2.2)/3 = 0.6667$

$r_A = \dfrac{\mathrm{Cov}_A}{\sigma_{A_X}\,\sigma_{A_Y}} = \dfrac{\mathrm{Cov}_{HS_{XY}}}{\sigma_{HS_X}\,\sigma_{HS_Y}} = \dfrac{0.6667}{\sqrt{0.08 \times 56.1333}} = 0.3146$

(Here 0.08 = (0.54 - 0.30)/3 is the variance among half-sib families for yield. Note that CovHSXY = ¼CovA, but the coefficients in the numerator and denominator cancel out.)

You could also calculate the phenotypic correlation among family means (but this was not requested):

$r_P = \dfrac{\mathrm{MCPF}}{\sqrt{\mathrm{MSF}_X \times \mathrm{MSF}_Y}} = \dfrac{4.2}{\sqrt{0.54 \times 278.4}} = 0.3425$

c) If he selected the best 15 families for grain yield, what would be the expected response to selection?
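The arithmetic in parts (a) and (b), and the inputs needed for part (c), can be checked with a short script. This is a minimal sketch in Python rather than SAS. The variable and function names are illustrative, and the selection intensity i = 2.06 is the standard tabulated value assumed for selecting 15 of 300 families (5%).

```python
import math

r = 3                            # replications (blocks)
df_family, df_error = 299, 598   # degrees of freedom from the ANOVA

# Sums of squares and cross products from the printh (H) and printe (E) matrices
ss_family = {"yield": 161.46, "height": 83241.6}
ss_error = {"yield": 179.40, "height": 65780.0}
scp_family, scp_error = 1255.8, 1315.6

def family_components(trait):
    """Return (variance among families, phenotypic variance of family means, h2)."""
    msf = ss_family[trait] / df_family
    mse = ss_error[trait] / df_error
    var_f = (msf - mse) / r
    var_p = msf / r
    return var_f, var_p, var_f / var_p

var_f_yield, var_p_yield, h2_yield = family_components("yield")
var_f_height, var_p_height, h2_height = family_components("height")

# Genetic covariance among half-sib families, then the additive genetic correlation
mcp_family = scp_family / df_family
mcp_error = scp_error / df_error
cov_hs = (mcp_family - mcp_error) / r
r_a = cov_hs / math.sqrt(var_f_yield * var_f_height)

# Phenotypic correlation among family means
r_p = mcp_family / math.sqrt(
    (ss_family["yield"] / df_family) * (ss_family["height"] / df_family)
)

# Expected response for yield: R = i * h2 * sigma_P (assumed i = 2.06 for 5% selected)
i = 2.06
response_yield = i * h2_yield * math.sqrt(var_p_yield)

print(f"h2 yield = {h2_yield:.4f}, h2 height = {h2_height:.4f}")
print(f"rA = {r_a:.4f}, rP = {r_p:.4f}, R(yield) = {response_yield:.3f} t/ha")
```

Working from the SSCP entries rather than the rounded mean squares keeps rounding error out of the intermediate steps; the results should agree with the hand calculations to the number of decimals shown in the key.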
Selecting 15 of 300 families is a selected proportion of 5%, for which the selection intensity is i = 2.06.

$R_X = i\,h_X\,\sigma_{A_X} = i\,h^2_X\,\sigma_{P_X} = 2.06 \times 0.4444 \times \sqrt{0.18} = 0.388$ t/ha

He decided to validate these estimates by making separate selections for yield and plant height. He selected the best 15 families for each trait:

                           Mean of all families    Mean of 15 selected families
    Grain yield (t/ha)             3.5                         4.3
    Plant height (cm)              170                         190

In 2007 he intermated remnant seed of selected families to form the C1 cycles of selection. In 2008 he evaluated the selection response:

                                    Cycle 0    Cycle 1
    Selection for yield
      Grain yield (t/ha)              3.8        4.1
      Plant height (cm)               176        180
    Selection for plant height
      Plant height (cm)               176        188
      Grain yield (t/ha)              3.8        3.9

d) Calculate the realized heritability for yield.

$h^2 = \dfrac{R}{S} = \dfrac{4.1 - 3.8}{4.3 - 3.5} = \dfrac{0.3}{0.8} = 0.375$

e) Use the relationship below to obtain an estimate of rA from the selection experiment.

$r_A^2 = \dfrac{CR_X}{R_X} \cdot \dfrac{CR_Y}{R_Y} = \dfrac{3.9 - 3.8}{4.1 - 3.8} \cdot \dfrac{180 - 176}{188 - 176} = \dfrac{0.1}{0.3} \cdot \dfrac{4}{12}$

$r_A = 0.3333$

f) Do estimates of h2 for yield and rA obtained from the half-sib trial in 2006 agree fairly well with results from the selection experiment?

The genetic correlations are very close (0.3146 vs. 0.3333). Heritability estimates are within a reasonable range (0.44 vs. 0.375).

3) In class, we discussed several approaches for using molecular markers to improve crops:

   • Marker-based selection (MBS)
   • Marker-assisted selection (MAS)
       o F2 enrichment
       o Marker-assisted backcrossing (MABC)
       o Marker-assisted recurrent selection (MARS)
   • Genomic selection (GS)

For each of the scenarios below, indicate which approach(es) would be most appropriate, and explain your choice. (Answers will vary.)

a) A diploid, self-pollinating crop, with available marker density at ~5 cM throughout the genome. Three QTL have been identified that explain a large proportion of the variation for resistance to an important disease. Existing cultivars possess anywhere from 0-3 of the desired alleles at these loci. New lines must have higher yield than existing cultivars and acceptable quality to justify release.
F2 enrichment would work well for a self-pollinating crop, while permitting you to select desirable segregants for other important characteristics as well. You might use MABC if you had a very good line that was missing a single favorable QTL.

b) A commercial breeding company for a major, high value crop. Facilities and resources are available for high density, high throughput genotyping. Important traits such as yield are controlled by many QTL with small effects.

This would be a good candidate for genomic selection, which would allow you to exploit all of the additive genetic variation for yield and other traits, rather than just a few QTL.

c) A minor, relatively new crop that is cross-pollinated by insects. Hand pollinations are time-consuming and no male sterility system has been developed. Commercial varieties may be open-pollinated populations or synthetics. Several molecular markers have been developed, using a candidate gene approach, for genes that impart desirable quality characteristics.

MARS would be a good choice here, since you will likely be using recurrent selection for yield and other traits. Backcrossing would be difficult for populations given the large number of crosses required to maintain the genetic diversity of the population, but it would be feasible to use MABC to fix the desirable alleles in inbred parents of a synthetic.

4) (8 pts) A number of papers have been published in recent years comparing the use of AMMI and GGE as techniques for analyzing genotype by environment interactions (GEI) in multilocational trials. Briefly explain the features of these linear-bilinear models and the difference between them.

AMMI and GGE are methods for analyzing GEI to identify patterns of interaction and reduce background noise. They combine conventional ANOVA with principal component analysis and may provide more reliable estimates of genotype performance than the mean across sites.
Biplots help to visualize relationships among genotypes and environments and help to select varieties with good adaptation to target breeding environments.

AMMI model: $Y_{ijl} = \mu + G_i + E_j + \sum_k (\lambda_k \alpha_{ik} \gamma_{jk}) + d_{ij} + e_{ijl}$

GGE model: $Y_{ijl} = \mu + E_j + \sum_k (\lambda_k \alpha_{ik} \gamma_{jk}) + d_{ij} + e_{ijl}$

In the AMMI model the principal component analysis is performed on the GxE interactions after removing the main effects of genotypes and environments. In the GGE model the PCA is performed on the combined G + GE term after removing the main effects of the environment.

Gauch, H.G. 2006. Statistical analysis of yield trials by AMMI and GGE. Crop Sci. 46: 1488-1500.
Gauch, H.G., H.-P. Piepho, and P. Annicchiarico. 2008. Statistical analysis of yield trials by AMMI and GGE: further considerations. Crop Sci. 48: 866-889.
Yan, W., M.S. Kang, B. Ma, S. Woods, and P.L. Cornelius. 2007. GGE biplot vs. AMMI analysis of genotype-by-environment data. Crop Sci. 47: 643-653.
Yang, R.-C., J. Crossa, P.L. Cornelius, and J. Burgueño. 2009. Biplot analysis of genotype x environment interaction: proceed with caution. Crop Sci. 49: 1564-1576.

5) Recall the BLUP example from Bernardo's text that we discussed in class and in lab. The drawback of the IML program that we used was that one had to manually iterate the program many times to obtain the correct estimates of the breeding values and genetic variance. I have modified the PROC MIXED program I gave to you in lab so that it gives correct estimates in a single run.
    /* Line means by set of trials; n_loc = number of locations per mean */
    data purelines;
      input set n_loc variety$ genotype y;
      datalines;
    1 18 Morex   1 4.45
    1 18 Robust  2 4.61
    1 18 Stander 4 5.27
    2  9 Robust  2 5.00
    2  9 Excel   3 5.82
    2  9 Stander 4 5.79
    ;

    /* Relationship matrix among the four genotypes, in LDATA= format */
    data GR;
      input parm row col1-col4;
      datalines;
    1 1 2      1       0.875    0.6875
    1 2 1      2       1.6875   1.34375
    1 3 0.875  1.6875  2        1.421875
    1 4 0.6875 1.34375 1.421875 2
    ;
    run;

    options nodate nocenter;

    Proc Mixed data=purelines noclprint covtest;
      class genotype set;
      weight n_loc;
      Model y=set/outpredm=LGGR outpred=PredGGR;
      Random genotype/ldata=GR type=lin(1) s;
      lsmeans set;
      ods listing exclude solutionR;
      ods output solutionR=BLUPGGR;
      ods output covparms=VGGR;
    Proc print data=PredGGR;
    Run;

    /* Keep the genotype BLUPs with their prediction errors and p-values */
    Data BLUPs;
      Set BLUPGGR;
      Keep genotype BLUP Pred_Error P_value;
      BLUP=Estimate;
      Pred_Error=StdErrPred;
      P_Value=Probt;
    Proc print;
      Title1 'Genotype effect BLUPs, Prediction Error and P-Value for H0:BLUP=0';
    Run;
    Quit;

The Mixed Procedure

Covariance Parameter Estimates

    Cov Parm    Estimate    Standard Error    Z Value    Pr Z
    LIN(1)      0.3499      0.2968            1.18       0.2384
    Residual    0.05281     0.07784           0.68       0.2488

The linear term in the model estimates additive genetic variance. The residual is VR.
Least Squares Means

    Effect    set    Estimate    Standard Error    DF    t Value    Pr > |t|
    set       1      4.8874      0.6754            1     7.24       0.0874
    set       2      5.3525      0.6767            1     7.91       0.0801

These are the BLUEs for the fixed effects in the model.

    Obs  set  n_loc  variety  genotype  y     Pred     StdErrPred  DF  Alpha  Lower    Upper    Resid
    1    1    18     Morex    1         4.45  4.45182  0.054045    1   0.05   3.76512  5.13853  -0.001821
    2    1    18     Robust   2         4.61  4.59133  0.049300    1   0.05   3.96492  5.21775   0.018666
    3    1    18     Stander  4         5.27  5.28684  0.049335    1   0.05   4.65998  5.91371  -0.016845
    4    2     9     Robust   2         5.00  5.05642  0.062002    1   0.05   4.26862  5.84422  -0.056420
    5    2     9     Excel    3         5.82  5.80165  0.075379    1   0.05   4.84387  6.75943   0.018351
    6    2     9     Stander  4         5.79  5.75193  0.062191    1   0.05   4.96172  6.54214   0.038069

Genotype effect BLUPs, Prediction Error and P-Value for H0:BLUP=0

    Obs    genotype    BLUP        Pred_Error    P_Value
    1      1           -0.43562    0.67572       0.63546
    2      2           -0.29611    0.67634       0.73729
    3      3            0.44912    0.67947       0.62817
    4      4            0.39941    0.67558       0.66009

These are the BLUPs and the standard errors that you would obtain if you iterated the matrix calculations many times.

a) Run the same analysis using a subset of the data from the Minnesota barley breeding program that we used for the TASSEL demonstration (MN06ex2.xls). Fifty breeding lines were evaluated for yield at two locations (set=1). In addition, the first 25 lines were evaluated at a third location (set=2). The kinship matrix (also in MN06ex2.xls) was obtained from TASSEL using SNP data. A single column of 1's was added in the first column to obtain the correct format for the LIN(1) covariance matrix option in SAS. You do not need to submit your program or output with this exam, provided that you can answer questions b and c.

b) Give the estimate you obtained for additive genetic variance and its standard error.

VA = 123874, se = 51821

c) Identify the two lines with the highest breeding values. If you crossed these lines, what is the expected breeding value of an inbred line (RIL) derived from the cross?
    Line                      BLUP
    06MN-36 (FEG 141-20)      927.262
    06MN-35 (FEG141-18)       856.485

Predicted mean breeding value = (BLUPA + BLUPB)/2 = (927.262 + 856.485)/2 = 891.8735

d) If you were the barley breeder in Minnesota would you use BLUPs to choose parents for making crosses or would you use the mean yield across sites? Explain your answer.

I would use the BLUPs because they have adjusted for the imbalance in the data, and have utilized information from relatives to make the best predictions about breeding values.
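As a cross-check on the pure-line example in question 5, the BLUEs and BLUPs reported by PROC MIXED can be approximated by solving Henderson's mixed model equations once, using the variance components from the output. This is a minimal Python sketch, not the course program: it assumes G = 0.3499 x (the relationship matrix in data set GR) and residual variance 0.05281 / n_loc for each mean, and the helper `solve` is a hypothetical name written here to avoid external libraries.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.
    a is n x n and b is n x k, both given as lists of rows."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

# (set, n_loc, variety, genotype, y) exactly as in the purelines data step
records = [
    (1, 18, "Morex", 1, 4.45), (1, 18, "Robust", 2, 4.61), (1, 18, "Stander", 4, 5.27),
    (2, 9, "Robust", 2, 5.00), (2, 9, "Excel", 3, 5.82), (2, 9, "Stander", 4, 5.79),
]
# Relationship matrix from data set GR (col1-col4)
A = [
    [2, 1, 0.875, 0.6875],
    [1, 2, 1.6875, 1.34375],
    [0.875, 1.6875, 2, 1.421875],
    [0.6875, 1.34375, 1.421875, 2],
]
var_lin, var_res = 0.3499, 0.05281   # LIN(1) and Residual estimates from the output
n_sets, n_geno = 2, 4
size = n_sets + n_geno

# Accumulate X'R^-1X, X'R^-1Z, Z'R^-1Z and the right-hand side record by record
lhs = [[0.0] * size for _ in range(size)]
rhs = [[0.0] for _ in range(size)]
for s, n_loc, _variety, g, y in records:
    w = n_loc / var_res               # weight = 1 / Var(e) for a mean over n_loc locations
    i, j = s - 1, n_sets + g - 1
    lhs[i][i] += w
    lhs[j][j] += w
    lhs[i][j] += w
    lhs[j][i] += w
    rhs[i][0] += w * y
    rhs[j][0] += w * y

# Add G^-1 to the genotype block
G = [[var_lin * a for a in row] for row in A]
identity = [[float(r == c) for c in range(n_geno)] for r in range(n_geno)]
G_inv = solve(G, identity)
for r in range(n_geno):
    for c in range(n_geno):
        lhs[n_sets + r][n_sets + c] += G_inv[r][c]

solution = [row[0] for row in solve(lhs, rhs)]
blue, blup = solution[:n_sets], solution[n_sets:]
print("Set means (BLUEs):", [round(v, 4) for v in blue])
print("Genotype BLUPs:   ", [round(v, 4) for v in blup])
```

Because the variance components are entered as the rounded values printed by SAS, the solutions should match the least squares means and the BLUP table only to about three decimal places. The same equations apply to the barley data in part (a), with the TASSEL kinship matrix in place of A.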