Maize Single Site Analysis: 4 Location Batch

Contributors
Shawn Yarnes a, Graham McLaren a, & Fred van Eeuwijk b
a The Integrated Breeding Platform, b Wageningen University

Summary
This tutorial describes a batch run of four single site analyses. Each analysis examines the performance of germplasm replicated three times in a randomized complete block at each location. The design of the trial is described in the previous tutorial.

Restore from Previous Tutorial
Screenshots and activities in this tutorial build upon work performed in previous tutorials. If you are not following the maize tutorials in sequence, restore the Maize Tutorial database (.sql) to the end of the previous tutorial, Design and Manage Field Trial, so that the database contents match the current tutorial.
Restore Trial with Observations (sql)

Introduction
Breeding View’s single site analysis produces adjusted means (best linear unbiased estimators, BLUEs) and best linear unbiased predictors (BLUPs) per genotype, as well as summary statistics that describe the raw data. The next tutorial, Maize Multisite Analysis, uses the summary statistics and adjusted means (BLUEs) to perform a genotype by environment (GxE) analysis. Adjusted means can also be used in a quantitative trait loci (QTL) analysis pipeline.

Select Dataset to Analyze
Open Single Site Analysis from the Statistical Analysis menu of the Workbench. Browse to find the 3 Site Trial dataset. Select the CIMMYT 2012 Trial. Review the factors, make sure the seven phenotypic traits measured in this trial dataset are selected, and click Next.

Specify Analysis Conditions
Use the default analysis name. Select Incomplete Block Design for the design type. The factor that defines environment is LOCATION_NAME. Select all four locations to perform an individual single site analysis on each location. REP_NO is the factor in this dataset that defines replications. DESIGNATION defines the germplasm factor to be used in the analysis.
Click Run Breeding View to launch the Breeding View application.

Run Analysis
When the Breeding View application launches, the analysis conditions and data are loaded. Notice that the four locations and all seven traits are selected with green checks. The analysis pipeline consists of a set of connected nodes, which can be used to configure and run pipelines. Leave the default settings for the pipeline. To run the analysis, right-click the first box and select Run Selected Environment Pipelines. When the analysis is complete, a popup notifies the user.

Quality Assurance
Breeding View provides an overview of potentially influential measurements to help users identify and possibly exclude observations. Many influential observations reflect true genotypic variation, and these data should not be excluded from the trial. The only observations that deserve exclusion are obvious errors or measurements influenced by heterogeneous environmental variation within a block, such as damage to a single plot.

Select and open the Sabana Del Medio report. The table displays potentially influential observations identified by two methods: the raw data method, which identifies observations exceeding 1.5 times the interquartile range, and the residual method, which identifies observations by their standardized residuals from the mixed model analysis. Select the potential outlier, plant height (PH) with the value of 95. Review the box plot for this trait as well as the related measurements for this genotype.

Exclude Observation
The measurement of 95 is well below the height measurements for the other replicates of this genotype, indicating a possible error in the phenotypic observation. Exclude this anomalous value from the analysis by selecting Set Selected to Missing.

Re-Run Analysis
Return to the analysis pipeline and run it again. Notice that the box plot for PH at Sabana Del Medio has changed in the Quality Assurance report. The adjusted means and heritability values for this trait will also have changed in the trial report.
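The raw data method described under Quality Assurance can be illustrated with a short sketch. This is not Breeding View's implementation: it assumes the common box plot rule, in which a value is flagged when it lies more than 1.5 times the interquartile range below the first quartile or above the third quartile. The function name and all plant height values other than 95 are hypothetical.

```python
from statistics import quantiles


def flag_iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    Assumption: the standard box plot rule. The tutorial does not state
    how Breeding View computes quartiles or fences.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical plant height (PH) measurements; only the value 95
# is taken from the tutorial.
plant_heights = [212, 205, 198, 220, 95, 210, 207, 215, 201]
outliers = flag_iqr_outliers(plant_heights)
print(outliers)
```

A flagged value is only a candidate for exclusion. As noted above, it should be set to missing only when it is an obvious error or reflects damage to a single plot.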
Select Upload to the BMS to save the adjusted means and summary statistics to the BMS database for use in a subsequent genotype by environment (GxE) analysis.

Analysis Report
This section of the tutorial provides a brief guide to interpreting the Report tab, which presents the analysis results, including graphs. The Report tab summarizes the individual single site analyses performed in batch. The majority of the results are found within the individual trial reports.

Report Tab Contents
Heritability Table
Combined file of predicted means: Excel file of BLUEs and BLUPs
Links to individual trial reports

Heritability Table
The heritability table summarizes the heritabilities calculated for each trait by location. Broad-sense mean line heritability is derived from an estimate of the correlation between the genotype BLUPs and their unknown true values (Cullis, Smith & Coombes, 2006). If a model cannot be fitted to the trait data, such as when there is no variability in the trait measurement, that trait will be missing from the heritability table.

Heritability Values: Notice that some of the seven traits are missing from some or all of the locations. For example, average shelling percentage (aSP) is missing from all locations. No variation was measured in this trait at any location: all entries were recorded as 0.788, which makes the trait non-informative, so no adjusted means could be calculated for it.

Combined File of Predicted Means
Select the link to the combined file of predicted means and open it in Excel. Non-informative traits that could not be fitted with a model appear as missing data.

BLUPs across all locations: Notice that there is no column for average shelling percentage (aSP), because models could not be fitted to the aSP data at any location. Two traits, grain moisture (MOI) and anthesis date (AD), have empty cells at San Gilberto for the same reason.
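Traits with no variation, like aSP above, can be spotted in the raw data before an analysis is run. The following is a minimal sketch rather than anything Breeding View does: the function name, the data layout (a mapping from trait name to a list of plot values, with None for missing observations), and every value except the aSP entry of 0.788 are hypothetical.

```python
def non_informative_traits(trait_values):
    """Return names of traits whose observed values show no variation.

    A trait is flagged when, after dropping missing values (None),
    it has no observations or only one distinct value.
    """
    flagged = []
    for trait, values in trait_values.items():
        observed = [v for v in values if v is not None]
        if len(set(observed)) <= 1:
            flagged.append(trait)
    return flagged


# Hypothetical plot values for one location; only aSP = 0.788
# is taken from the tutorial.
location_data = {
    "aSP": [0.788, 0.788, 0.788, 0.788],
    "PH": [212.0, 205.0, None, 198.0],
    "GY": [6.1, 5.4, 7.2, 6.8],
}
flagged = non_informative_traits(location_data)
print(flagged)
```

A trait flagged this way would be expected to be absent from the heritability table and to appear as missing data in the combined file of predicted means.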
Individual Trial Reports
Although the four single site analyses are summarized together in the heritability table, each location is represented by an individual analysis. Select the link to the Tlaltizapan individual trial report to review the analysis performed at this location.

Individual trial reports include:
File of predicted means: Link to an Excel (.xls) file containing the environment-specific subset of the BLUEs and BLUPs presented in the combined file of predicted means.
20 Best Genotypes Table: This table lists the 20 best genotypes as defined by BLUPs, sorted by the first trait. To sort genotypes by other traits, scroll down to the analyses of individual traits or perform a custom sort of the BLUPs within the file of predicted means (.xls).
Summary of Traits: A table presenting the minimum, mean, maximum, and heritability for each trait within this location, based on BLUPs
Estimated Genetic Correlations between traits
Principal Components Biplot
Individual Trait Analyses: Summary statistics of raw data, heritability, sorted genotype table, summary statistics based on BLUPs, phenotypic correlations, principal components biplot

File of Predicted Means
Tlaltizapan Report Summary: The beginning of each report provides the project name, the environment name, the field design, and the date. Users are presented with a link to the adjusted means data for this environment, which is a subset of the data in the combined file of predicted means for all locations. Users are also notified about analysis failures. In this case, average shelling percentage (aSP) could not be fitted with a mixed model, because this trait had no reported variation within this location.

Best Genotypes Sorted by BLUPs
Tlaltizapan 20 Best Genotypes Sorted by AD: This table lists the 20 best genotypes sorted by BLUPs for the first trait, anthesis date, as defined by the default Analysis Pipeline settings under the Generate Report node.
The default settings sort values in descending order, with large values interpreted as “best”. For some traits, like anthesis date, a breeder would probably select for germplasm with smaller values (earlier dates). In such cases you can customize the Breeding View settings to sort by other traits, change the direction to ascending values, or sort and display the BLUEs. You can also use the custom sort feature of Excel to sort BLUPs within the provided spreadsheet (.xls).

Summary of Traits by BLUP
Tlaltizapan Summary of Traits: The minimum, mean, maximum, and heritability for each trait based on BLUPs

Estimated Genetic Correlations

BLUPs Principal Components Biplot
Tlaltizapan Principal Components Biplot of BLUPs: Almost 80% of the variation observed at this location is explained by the first two principal components. The first principal component describes 62.42% of the variation and is correlated most strongly with field weight (FW), grain yield (GY), plant height (PH), and ear height (EH). The second principal component describes 17.05% of the variation and is correlated most strongly with moisture (MOI).

Pairwise Phenotypic Correlations
Tlaltizapan Estimated Genetic Correlations: Pairwise comparison of phenotypic traits. There is a strong positive correlation (0.9978) between the related yield measures, grain yield (GY) and field weight (FW). These two yield traits are moderately negatively correlated with anthesis date (-0.6890 and -0.6830, respectively). In other words, late anthesis is correlated with low yield.

Summary Statistics for Individual Trait Raw Data
Tlaltizapan Summary Statistics: Anthesis date (AD) based on raw data

Estimated Heritability of Individual Trait
Estimated Heritability: Anthesis date (AD) calculated at Tlaltizapan

Best Genotypes by Trait Sorted by BLUEs
20 “Best” Anthesis Date (AD) Genotypes Calculated at Tlaltizapan: Genotypes are sorted by BLUEs in descending order by value.
For traits like anthesis date, where small phenotypic measures would be considered “best”, we recommend using the custom sort features of Breeding View under the Generate Report node or using Excel to sort BLUEs within the provided spreadsheet (.xls).

Standard Errors of Difference

Diagnostic Plots of Individual Traits
Diagnostic plots illustrate the fit of the trait data to the analysis model based on the residuals. Residuals are estimates of experimental error obtained by subtracting the means predicted by the model from the observed phenotypic measurements. When residuals appear to behave randomly, it suggests that the model fits the data well. If, on the other hand, non-random structure is evident in the residuals, it suggests that the model fits the data poorly (NIST/SEMATECH e-Handbook of Statistical Methods).

Well-Fitted Characteristics of Each Diagnostic Plot
Histogram of Residuals: Gaussian, or normal, distribution
Fitted-Value Plot: Random distribution, “shot-gun” pattern
Normal Plot: Distribution in a straight line across the diagonal
Half-Normal Plot: Distribution in a straight line across the diagonal

Diagnostic plots for Anthesis Date at Tlaltizapan: Example of a trait exhibiting a good model fit
Diagnostic plots for Anthesis Date at Jutiapa: Example of a trait exhibiting a poor model fit

References
Cullis, B. R., Smith, A. B., & Coombes, N. E. (2006). On the design of early generation variety trials with correlated data. Journal of Agricultural, Biological, and Environmental Statistics, 11(4), 381-393.
Li, J., & Ji, L. (2005). Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity, 95(3), 221-227.
Murray, D., Payne, R., & Zhang, Z. (2014). Breeding View, a Visual Tool for Running Analytical Pipelines: User Guide. VSN International Ltd.
NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/, November 2014.

Funding & Acknowledgements
The Integrated Breeding Platform (IBP) is jointly funded by the Bill and Melinda Gates Foundation, the European Commission, the United Kingdom's Department for International Development, CGIAR, the Swiss Agency for Development and Cooperation, and the CGIAR Fund Council. Coordinated by the Generation Challenge Program, the Integrated Breeding Platform represents a diverse group of partners, including CGIAR Centers, national agricultural research institutes, and universities.

The statistical algorithms in Breeding View were developed by VSN International Ltd in collaboration with the Biometris group at Wageningen University. Maize demonstration data was provided by Mike Olsen from the breeding program at CIMMYT, the International Center for Maize and Wheat Improvement. These data have been adapted for training purposes. Any misrepresentation of the raw breeding data is solely the responsibility of the IBP.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.