SASGxE Instructions (MS Word)

SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output Interpretation Mahendra Dia, Todd C. Wehner and Consuelo Arellano SASGxE Objective: SASGxE is a user friendly, annotated, flexible and efficient SAS program that will allow user to analyze stability statistics of multiple dependent variables, simultaneously, of multi-location trial data. This program is intended for SAS-PC (version 9.3 or higher) under the Microsoft Windows operating system. SASGxE computes univariate stability statistics, input files that are ready to use in existing R packages for multivariate stability statistics, ANOVA, descriptive statistics and the correlation of stability analysis methods. SASGxE program, input sample data, output from sample data (including AMMI and GGE biplots from R statistical software), SAS log, and input data file template are freely available at http://cuke.hort.ncsu.edu/cucurbit/wehner/software.html. SASGxE Functionality: 1. 2. 3. 4. Options statement. Import input data. Compute sum of longitudinal variables. Define macros.  UNIVARIATE1  UNIVARIATE2  UNIVARIATE3  LEVELOFSIG  OUTPUTEXCEL  GENOTYPE  ENVIRONMENT  OUTPUTCSV 5. Creating a macro variable during DATA step execution.  Total number and kind of dependent variable 6. Compute stability statistics of all dependent variable, simultaneously, using macro STABILITY.  Quality checking (QC)  Compute means, sum and coefficient of variation (CV)  Analysis of variance (ANOVA)  Stability statistics - univariate  Spearman rank correlation  Output o Univariate and Multivariate stability statistics o Spearman Rank correlation o Mean, Sum and CV 7. Multivariate stability statistics.  AMMI and GGEBiplot SASGxE Program Description: 1. Options statement We decided to run SASGxE with certain options turned on or off so that program run efficiently and is easy to debug. OPTIONS, such as MPRINT, SYMBOLGEN and MLOGIC, are very helpful at times 1 SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output Interpretation Mahendra Dia, Todd C. Wehner and Consuelo Arellano for debugging. However, MLOGIC and SYMBOLGEN are turned off in production macro to improve program efficiency (SAS, 2009). Similarly, OPTIONS LABEL was turned off so that auto-label can be prevented (e.g. in PROC MEANS) so user will not be confused. Since, SASGxE is not intended to print output in SAS Output window therefore OPTIONS DATE and NUMBER are turned off. OPTIONS MPRINT is turned on so that user can view the text generated by macro execution in SAS Log window. 2. Import input data SASGxE starts with user entered fields. User is required to feed input data file location, name, sheet name, and sum of squares at %LET IPATH, %LET INAME, %LET ISHEETNAME1, %LET SUMOFSQR statement, respectively. The value for Type I Sum of Squares and Type III Sum of Squares are 1 and 2, respectively. SASGxE requires input data file in Excel (type equals ‘.xlsx’ only). Highlighted fields are user entered in below code. The user input records are created into macro variables using %LET statement. %LET %LET %LET %LET IPATH =…………………; /*INPUT FILE PATH*/ INAME =…………………; /*INPUT FILE NAME*/ ISHEETNAME1 =…………………; /*INPUT FILE SHEET NAME*/ SUMOFSQR =…………………; /*1= TYPE 1 SS ; 2 = TYPE 3 SS*/ Input data file name, type, and location can be retrieved by right-clicking on input data file and selecting ‘Properties’. Once in ‘Properties’ window, the user can see input data file details under ‘General’ tab. These details include file name , file type , and file location (Figure 1). Similarly, input data sheet name can be found on left side of bottom bar on Excel file. Figure 1. Screenshot of ‘Properties’ window of input Excel data file (Panel A) and example Excel file (Panel B) represent file name , file type , file location and sheet name . 2 SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output Interpretation Mahendra Dia, Todd C. Wehner and Consuelo Arellano Input file location can also be viewed by clicking on file path or address bar of folder where input data is stored (Figure 2). Figure 2. Screenshot of ‘folder’ where input Excel data file exist. File location indicated by an arrow mark. is enclosed in box and In SAS, the input data file is required to have missing records represented by dot (‘.’) and not to have blank first row. We have also provided an input data file template, which is freely available at http://cuke.hort.ncsu.edu/cucurbit/wehner/software.html. Input data file template is comprised of column names including YR (year), LC (location), RP (replication), CL (cultigen or genotype), and Dependent Variables 1 to n (Figure 3). Hereafter, ‘genotype’ is used to indicate cultigen, cultivar, variety or genotype. SASGxE is not sensitive with column positions. However, SASGxE requires the user not to change the column names that are indicated in ‘bold’ and ‘capital case’. Dependent variables are indicated in ‘proper case’ and ‘non bold’, and user is allowed to store multiple dependent variables. SASGxE is capable of analyzing multiple dependent variables simultaneously. Since SAS programing language requires dataset and variable name less than 32 characters long therefore it recommended storing short names for dependent variable and input dataset (< 20 character is desired). Figure 3. Screenshot of input data template. Required variable names for year (YR), location (LC), replication (RP), and cultigen or genotype (GN) are represented in ‘bold’, ‘capital case’ and enclosed in separate box. 3 SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output Interpretation Mahendra Dia, Todd C. Wehner and Consuelo Arellano SASGxE imports input data using PROC IMPORT. Input sample data (SASGxE_PROG_INPUT_DATA.XLSX) consists of 3 years, 5 location, 2 replication, 10 genotypes and 2 dependent variables (Figure 4). Two dependent variables are marketable fruit weight (MKWT) and cull fruit weight (CLWT). Dependent variables are recorded at 4 different times on same individual within a plot (MKWT1-4; CLWT1-4) and unit is ‘pounds/plot’. Missing values are represented by dot (‘.’). Figure 4. Screenshot of input sample data template consists of year (YR), location (LC), replication (RP), genotype (CL), and dependent variables (MKWT1-4; CLWT1-4) columns and top 25 rows. 3. Compute sum of longitudinal variables After importing data, SASGxE computes sum of longitudinal data (SUM function), rename genotype names (IF-ELSE-THEN statement), and drops those dependent variable on which stability statistics is not required (DROP statement). A dataset is longitudinal if it records the same type of information on the same individuals over a time period. In below program, dropped variables are highlighted and dependent variables values are converted from pounds/plot (lbs/plot) to megagram/hectare (Mg/ha). DATA TEMPA1 (RENAME=(CLT=CL)); SET TEMPA1; MKWT=SUM(MKWT1,MKWT2,MKWT3,MKWT4); /*SUM ACROSS THE DEPENDENT VARIABLES*/ CLWT=SUM(CLWT1,CLWT2,CLWT3,CLWT4); /*SUM ACROSS THE DEPENDENT VARIABLES*/ MKMGHA=MKWT*0.40751; /*CALCULATE YIELD MG/HA FOR 12 FT PLOT SIZE*/ CLMGHA=CLWT*0.40751; /*FACTOR 0.40751 CONVERTS LBS/PLOT TO MG/HA*/ IF CL=01 THEN CLT='Mountain Hoosier ELSE IF CL=02 THEN CLT='Hopi Red Flesh ELSE IF CL=03 THEN CLT='Early Arizona '; '; '; 4 SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output Interpretation Mahendra Dia, Todd C. Wehner and Consuelo Arellano ELSE ELSE ELSE ELSE ELSE ELSE ELSE IF IF IF IF IF IF IF CL=04 CL=05 CL=06 CL=07 CL=08 CL=09 CL=10 THEN THEN THEN THEN THEN THEN THEN CLT='Starbrite F1 '; CLT='Stone Mountain '; CLT='Stars-N-Stripes F1'; CLT='AU-Jubilant '; CLT='Calhoun Gray '; CLT='Big Crimson '; CLT='Legacy F1 '; DROP MKWT1 MKWT2 MKWT3 MKWT4 CLWT1 CLWT2 CLWT3 CLWT4 MKWT CLWT CL; RUN; Except YR, LC, RP and CL, SASGxE treats other column as a dependent variable and computes stability statistics on each of them. Therefore it is suggested to drop dependent variables in this step if stability statistics is not needed on them. This dataset step is an optional. User can remove or skip this step from program if dependent variable in input data does not required to be recreated. The SAS GEI code will not break by removing this step because dataset file names are same (TEMPA1) as previous step file name. 4. Defining macro Macro UNIVARIATE1 computes regression slope (bi), standard error of slope, deviation from regression (S2d), T-test and F-test on regression slope (H0: bi = 1) and deviation from regression (H0: S2d = 0), and level of significance using PROC GLM. Results of these statistics are captured in SLOPE&DEPVAR.SAS file, where &DEPVAR is a dependent variable name (Figure 5). The ‘.SAS’ file can be located from following path: SAS program window  Explorer panel  Explorer tab  Libraries  Work library. The level of significance of T-test and F-test at 0.05, 0.01 and 0.001 is represented by ‘*’, ‘**’, ‘***’; respectively. Figure 5. Screenshot of SLOPE&DEPVAR.SAS file showing analysis of regression slope (bi), standard error of slope, deviation from regression (S2d), T-test and F-test on regression slope (H0: bi = 1) and deviation from regression (H0: S2d = 0) and level of significance of T-test and F-test. Macro UNIVARIATE2 computes Wricke’s ecovalence (Wi), Shukla’s stability variance (σi2), Shukla’s squared hat (ŝi2), and Perkins and Jinks beta (βi) using PROC IML. Results of these statistics are captured in UNIVARIATE2&DEPVAR.SAS file (SAS window  Explorer panel  Explorer tab  Libraries  Work library), where &DEPVAR is a dependent variable name (Figure 6). 5 SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output Interpretation Mahendra Dia, Todd C. Wehner and Consuelo Arellano Figure 6. Screenshot of UNIVARIATE2&DEPVAR.SAS file showing analysis of Wricke’s ecovalence (Wi), Shukla’s stability variance (σi2), Shukla’s squared hat (ŝi2), and Perkins and Jinks beta (βi). The macro UNIVARIATE3 computes least square (LS) means, standard error of LS means, least significant difference (LSD) of mean, and Kang’s Yield-Stability statistics (YSi). Results of macro UNIVARIATE3 are captured in UNIVARIATE3&DEPVAR.SAS file (SAS window  Explorer panel  Results tab  Libraries  Work library), where &DEPVAR is a dependent variable name (Figure 7). LSD is used to compare trait means across genotypes. Figure 7. Screenshot of UNIVARIATE3&DEPVAR.SAS file showing analysis of least square (LS) means, standard error of LS means, least significant difference (LSD) of mean, and Kang’s YieldStability statistics (YSi). Macro LEVELOFSIG concatenates Spearman correlation value with level of significance. The level of significance at 0.05, 0.01 and 0.001 is represented by ‘*’, ‘**’, ‘***’; respectively. Macro OUTPUTEXCEL exports output files in Excel files (.xlsx only) and to same folder or location where input data file is placed. Similarly, macro OUTPUTCSV exports output files as comma separated value (CSV) files and to same folder or location where input data file is placed. These CSV files are loaded into ‘RStudio’ to analyze multivariate stability statistics using AMMI and GGE Biplot models. Macro GENOTYPE, ENVIRONMENT and LOCATION generate shorter name for ‘genotypes/cultivars’, ‘environment’ and ‘location’, respectively, so that visualization of AMMI and GGEBiplot output is legible. 5. Creating a macro variable during DATA step execution 6 SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output Interpretation Mahendra Dia, Todd C. Wehner and Consuelo Arellano SASGxE creates macro variables for total number of (&LAST_DEPVARIABLE) and each dependent variable (&DEPVARX) using SYMPUT routine in a DATA step. To improve program efficiency DATA _NULL_ statement was used. DATA _NULL_; SET START1 END=END_OF_DATASET; CALL SYMPUT ('DEPVARX'||TRIM(LEFT(_N_)), NAME); /*MACRO FOR DEPENDENT VARIABLE*/ IF END_OF_DATASET THEN CALL SYMPUT ('LAST_DEPVARIABLE', COMPRESS(_N_)); RUN; 6. Compute stability statistics of all dependent variable, simultaneously, using macro STABILITY Macro STABILITY computes different stability statistics for multiple dependent variables using iterative %DO and %END statements. During each iteration one dependent variable is analyzed. The %DO loop stops processing after stop value is equal to &LAST_DEPVARIABLE. Input data is quality checked for missing records and environment is defined in DATA TEMPA2 statement. SASGxE removes rows having missing records for location, year, replication or dependent variable. Environment (EN) is a combination of year and location. Descriptive statistics including means, sum, and coefficient of variance (CV) are computed using PROC MEANS and PROC SQL. Using PROC TRANSPOSE results of descriptive statistics are transposed in user friendly layout so that researchers can interpret them easily. SASGxE generates following descriptive statistics.  Genotype mean over environment and genotype (MEAN&DEPVAR.SAS),  Genotype sum over environment and genotype (SUM&DEPVAR.SAS),  Genotype mean over environment and replication (ENV&DEPVAR.SAS),  Genotype mean over year, location, and replication (M_&DEPVAR_CYLR.XLSX),  Genotype mean over year and location (M_&DEPVAR_CYL.XLSX),  Genotype mean over year (M_&DEPVAR_CY.XLSX),  Genotype mean over location (M_&DEPVAR_CL.XLSX),  Genotype CV over location (CV_&DEPVAR_CL.XLSX),  Genotype mean (M_&DEPVAR_C.XLSX),  Location mean (M_&DEPVAR_L.XLSX),  Location mean over year (M_&DEPVAR_LY.XLSX),  Genotype mean over environment (M_&DEPVAR_CE.XLSX),  Genotype mean over location and replication (M_&DEPVAR_CLR.XLSX), and  Genotype mean over environment and replication (M_&DEPVAR_CER.XLSX). The macro STABILITY automatically export ‘.xlsx’ output file of descriptive statistics to same location/folder where input data file is placed. User can view ‘.SAS’ output file of descriptive statistics at following location: SAS program window  Explorer panel  Explorer tab  Libraries  Work library. Analysis of variance (ANOVA) is computed using PROC GLM to determine the size and significance of genotype x environment interaction (GxE) of dependent variable. SASGxE considers genotype, year, location and replications as random effects. Therefore, all the factors are tested against an error term. An F-test is used to test the significance of each factor. The level of significance of F-test at 0.05, 0.01 and 7 SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output Interpretation Mahendra Dia, Todd C. Wehner and Consuelo Arellano 0.001 is represented by ‘*’, ‘**’, ‘***’; respectively. SASGxE computes both Type I and III Sums of Squares (Type I SS and Type II SS). However, ‘ANOVA_&DEPVAR.XLSX’ output file reports either ‘Type I SS’ or ‘Type III SS’ that user request in the beginning of program at %LET SUMOFSQR statement. The macro STABILITY calls macro UNIVARIATE1, UNIVARIATE2 and UNIVARIATE3 to compute univariate stability statistics. These univariate statistics include regression slope (bi); standard error of slope; deviation from regression (S2d); T-test on regression slope (H0: bi = 1); F-test deviation from regression (H0: S2d = 0); Wricke’s ecovalence (Wi); Shukla’s stability variance (σi2); Shukla’s squared hat (ŝi2); Perkins and Jinks beta (βi); least square (LS) means; standard error of LS means; least significant difference (LSD) of mean; and Kang’s Yield-Stability statistics (YSi). The major reasons for defining macro UNIVARIATE1, UNIVARIATE2 and UNIVARIATE3 outside of macro STABILITY so that nested macro is avoided and thus program is efficient and easy to debug. SASGxE assigns ranks to genotypes for each stability parameter. Spearman’s rank correlation is computed using PROC CORR on the ranks to measure the relationship between stability parameters. Genotypes are ranked in increasing order for decreased value of a dependent variable. However, for certain dependent variables such as disease, % cull fruits, etc.; where lower trait value is considered to be good. User is required to assign higher ordinal value to lower trait value of such dependent variable. Otherwise, Spearman’s rank correlation will give wrong output and mislead the user. Genotypes are ranked in increasing order for increased value of deviation from regression (S2d); Wricke’s ecovalence (Wi); Shukla’s stability variance (σi2); Shukla’s squared hat (ŝi2); and Kang’s Yield-Stability statistics (YSi). Regression slope (bi) approximating unity is considered to be stable, therefore genotypes are ranked in increasing order when bi > 1 and decreasing order when bi < 1. Similarly, Perkins and Jinks beta (βi) is stable near zero, therefore genotypes are ranked in increasing order when βi > 0 and decreasing order when βi < 0. The level of significance of correlation at 0.05, 0.01 and 0.001 is represented by ‘*’, ‘**’, ‘***’; respectively. The macro STABILITY invoked macro OUTPUTEXCEL and OUTPUTCSV to generate output files in ‘.xlsx’ and ‘.csv’ formats, respectively. These output files are auto sent to same location/folder where input data file is placed. Following are the output files generated by macro OUTPUTEXCEL.  Genotype mean over year, location, and replication (M_&DEPVAR_CYLR.XLSX),  Genotype mean over year and location (M_&DEPVAR_CYL.XLSX),  Genotype mean over year (M_&DEPVAR_CY.XLSX),  Genotype mean over location (M_&DEPVAR_CL.XLSX),  Genotype CV over location (CV_&DEPVAR_CL.XLSX),  Genotype mean (M_&DEPVAR_C.XLSX),  Location mean (M_&DEPVAR_L.XLSX),  Location mean over year (M_&DEPVAR_LY.XLSX),  Genotype mean over environment (M_&DEPVAR_CE.XLSX),  Genotype mean over environment and replication (M_&DEPVAR_CER.XLSX),  Genotype mean over location and replication (M_&DEPVAR_CLR.XLSX).  Analysis of variance (ANOVA_&DEPVAR.XLSX),  Univariate stability statistics (STAB_&DEPVAR.XLSX),  Spearman’s rank correlation (ANOVA_&DEPVAR.XLSX),  Legend for location used in AMMI (LOC_LEGEND_&DEPVAR.XLSX), 8 SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output Interpretation Mahendra Dia, Todd C. Wehner and Consuelo Arellano   Legend for genotype used in AMMI and GGEBiplot (GEN_LEGEND_&DEPVAR.XLSX), and Legend for environment used in AMMI and GGEBiplot (ENV_LEGEND_&DEPVAR.XLSX). Following are the output files generated by macro OUTPUTCSV.  Input file for ‘RStudio’ software for GGEBiplot (Genotype x Environment) analysis (BIPLOT_&DEPVAR_.CSV),  Input file for ‘RStudio’ software for GGEBiplot (Genotype x Location) analysis (BIPLOT2_&DEPVAR_.CSV),  Input file for ‘RStudio’ software for AMMI (Genotype x Environment) analysis (AMMI1_&DEPVAR_.CSV), and  Input file for ‘RStudio’ software for AMMI (Genotype x Location) analysis (AMMI2_&DEPVAR_.CSV). 7. Multivariate statistics SASGxE does not compute multivariate statistics (AMMI and GGEBiplot) for stability analysis per se. However, files (‘BIPLOT_&DEPVAR_.CSV’, BIPLOT2_&DEPVAR_.CSV ‘AMMI1_&DEPVAR_.CSV’, and ‘AMMI2_&DEPVAR_.CSV’) generated by SASGxE can be loaded into ‘RStudio’ software to compute multivariate stability statistics (RStudio, 2014). These files are ready to go in ‘RStudio’. However, the user needs to be cautious with case sensitivity of ‘R’ computing language. ‘RStudio’ is an integrated tool designed to help user more productive with ‘R’ computing software and it requires ‘R’ version 2.11.1 or higher. To improve the visuals of AMMI and GGEBiplot analysis, genotypes, locations and environment are abbreviated to G1-Gn, LOC1-LOCn, and ENV1-ENVn, respectively, where ‘n’ is a total number of entity. User can view the respective abbreviation for a corresponding genotype, location and environment in ‘GEN_LEGEND_&DEPVAR.XLSX’, ‘LOC_LEGEND_&DEPVAR.XLSX’, and ‘ENV_LEGEND_&DEPVAR.XLSX’ files, respectively. In order to analyze stability using AMMI model user need to select Agricolae package in system library window of ‘RStudio’ software (Figure 8). If Agricolae package is not found in system library then user can install it from CRAN repository (CRAN, 2014) (Figure 8). Then reference the path of folder where input data is located from ‘Session’ in ‘window tool bar’ (‘session’ in window tool bar  select ‘set work directory’  select ‘choose work directory’  select folder where data is kept) (Figure 9). User can also reference path in code in ‘Console’ or ‘R Script’ window. However, the user needs to be cautious with the requirement of forward slash in reference path in ‘R’ computing language. User can use below code in ‘Console’ or ‘R Script’ window to analyze AMMI model. The output files ‘AMMI1_&DEPVAR_.CSV’, and ‘AMMI2_&DEPVAR_.CSV’ generate genotype x environment and genotype x location analysis results, respectively. # COMMENT: USER NEEDS TO REPLACE INPUT DATA FILE PATH setwd("E:/PhD Research Work/PhD Articles/Articles for Publication/GxE SAS Prog/Sample Data") #COMMENT: USER NEEDS TO REPLACE FILE NAME (AMMI2_MKMGHA). IT IS A CASE SENSITIVE Data = read.csv(file="AMMI2_MKMGHA.csv", header = TRUE) #COMMENT: VIEW TOP 6 ROWS OF DATA head(Data) attach(Data) 9 SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output Interpretation Mahendra Dia, Todd C. Wehner and Consuelo Arellano #COMMENT: USER NEEDS TO REPLACE DEPEDENT VARIABLE NAME (MKMGHA). #IT IS A CASE SENSITIVE model<- AMMI(Locality, Genotype, Rep, MKMGHA, console=FALSE) model$ANOVA #COMMENT: VIEW MODEL OUPUT print(model) detach(Data) # COMMENT: BIPLOT plot(model) # COMMENT: TRIPLOT PC 1,2,3 plot(model, type=2, number=TRUE) # COMMENT: BIPLOT PC1 vs DEPENDENT VARIABLE plot(model, first=0,second=1, number=TRUE) Figure 8. Screenshot of ‘RStudio’ showing AMMI package Agricolae and installing AMMI package Agricolae from CRAN repository . in ‘System Library’ window 10 SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output Interpretation Mahendra Dia, Todd C. Wehner and Consuelo Arellano Figure 9. Screenshot of ‘RStudio’ showing referencing a path of folder where input data is located from ‘Session’ in ‘window tool bar’ (‘session’ in window tool bar  select ‘set work directory’  select ‘choose work directory’  select folder where data is kept) . Similarly, user needs to select GGEBiplotGUI package in system library window of ‘RStudio’ software to compute GGEBiplot model (Figure 10). If GGEBiplotGUI package is not found in system library then user can install it from CRAN repository (CRAN, 2014) (Figure 10). Then reference the path of folder where input data is located from ‘Session’ in ‘window tool bar’ (‘session’ in window tool bar  select ‘set work directory’  select ‘choose work directory’  select folder where data is kept) (Figure 9). User can also reference path in code in ‘Console’ or ‘R Script’ window. However, the user needs to be cautious with the requirement of forward slash in reference path in ‘R’ computing language, which is opposite of SAS. The GGEBiplotGUI package accepts input data where rows are labelled and no blank [NA] records. Therefore, system defined function rownames was used to label the rows. Similarly, user defined function na_check was used to replace blank records by trait ‘mean’ of genotype across locations [gge=na_check(gge,”Mean”)] or by ‘zero’ [gge=na_check(gge,”Zero”)]. The user has option to choose either ‘mean’ or ‘zero’ to replace blank [NA] record. If input data does not have missing records then program process the data per se. User can use below code in ‘Console’ or ‘R Script’ window to analyze GGEBiplot. Upon executing the code ‘Model Selection’ window opens and user is required to populate the dropdowns in ‘Model Selection’ window to generate appropriate biplots for mega-environment analysis, genotype evaluation, and test-environment evaluation (Yan et al. 2007). The output files ‘BIPLOT_&DEPVAR_.CSV’, and ‘BIPLOT2_&DEPVAR_.CSV’ generate genotype x environment and genotype x location analysis results, respectively. The detailed description and difference between AMMI and GGEBiplot model was 11 SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output Interpretation Mahendra Dia, Todd C. Wehner and Consuelo Arellano presented by Yan et al. (2007). Output of univariate and multivariate statistics of sample data can be found at http://cuke.hort.ncsu.edu/cucurbit/wehner/software.html. # COMMENT: USER NEEDS TO REPLACE INPUT DATA FILE PATH setwd("E:/PhD Research Work/PhD Articles/Articles for Publication/GxE SAS Prog/Sample Data") #COMMENT: USER NEED TO REPLACE FILE NAME. IT IS A CASE SENSITIVE gge = read.csv(file="BIPLOT2_MKMGHA.csv", header = TRUE) #COMMENT: VIEW TOP 6 ROWS OF DATA head(gge) #COMMENT: colnames() gives you column labels #COMMENT: rownames() gives the row labels rownames(gge) = gge[,1] gge = gge[,-1] #COMMENT: VIEW TOP 6 ROWS OF DATA head(gge) #Make a function to find all the NA (Blank) values and replace with either row_mean or zero na_check = function(dat,check) { for(i in 1:nrow(dat)) { for(h in 1:ncol(dat)) { if (is.na(dat[i,h])==T) { if (check=="Mean") { dat[i,h]=mean(na.omit(as.numeric(dat[i,]))) { if(check=="Zero") { dat[i,h]=0 } } } } } } return(dat) } #COMMENT: Replace blank record with mean or zero using user defined function na_check gge=na_check(gge,"Mean") #COMMENT: VIEW TOP 6 ROWS OF DATA head(gge) #COMMENT: GGEBIPLOT ANALYSIS GGEBiplot(Data = gge) 12 SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output Interpretation Mahendra Dia, Todd C. Wehner and Consuelo Arellano Figure 10. Screenshot of ‘RStudio’ showing GGEBiplot package GGEBiplotGUI in ‘System Library’ window and installing GGEBiplot package GGEBiplotGUI from CRAN repository . Figure 11. Screenshot of ‘Model Selection’ which pops-up after executing GGEBiplotGUI model in ‘RStudio’. The singular value partition (SVP) attribute of ‘model selection’ window have dropdown values of 1 and 2 (Figure 11). The ‘SVP = 1’ means singular values are partitioned into genotype scores and, thus, it enhances the suitability of the biplot for comparing genotypes. The ‘SVP = 2’ means singular 13 SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output Interpretation Mahendra Dia, Todd C. Wehner and Consuelo Arellano values are partitioned into environment scores and it enhances the suitability of the biplot for visualizing the relationships among environments (Yan et al. 2007). Similarly, ‘Centered By’ attribute of ‘model selection’ window have dropdown value of 0, 1, 2, and 3 (Figure 11). The ‘Centered By = 0’ means original data is used for visualization, and is effective for dataset whose grand mean is close to 0. It is useful in ‘QTL-by-environment’ interaction and ‘genotype covariate-by-environment’ interaction studies. The ‘Centered By = 1’ means grand mean is centered and useful when both row (genotype) and column (environment) main effects are of interest. The ‘Centered By = 2’ means the data are environment centered. The ‘Centered By = 3’ means data is double-centered and is desirable if GE is of sole interest and gene expression studies. The dropdown values of ‘Scaled By’ attribute of ‘model selection’ window include 0, 1, 2, and 3 (Figure 11). The purpose of scaling is to put the variables in comparable ranges by dividing with different parameters. Scaling is necessary if the variables are of different units. When ‘Scaled By = 0’ means the data are not divided by anything. The ‘Scaled By = 1’ means the data are rescaled by the withinenvironment standard deviation and assumes all environments to be equally important. The ‘Scaled By = 2’ means the data are rescaled by within-environment standard errors. It removes any heterogeneity among environments with regard to their experimental errors while retaining the information about environment’s discriminating ability (Yan et al., 2007). The ‘Scaled By = 3’ means data are rescaled by the environment means. It removes the differences in unit and data range among variables while retaining the discriminating ability of the environments (Yan et al., 2007). Stability Statistics Output Interpretation: Linear regression coefficient (bi) Regression coefficient (bi) of genotypes approximating unity (P<0.01) along with high trait mean is considered to stable across wide range of environment. When this is associated with low trait mean performance, genotypes are poorly adapted to all environments. A bi greater than unity describes genotypes with higher sensitivity to environmental change (below average stability), and greater specificity of adaptability to high yielding environments. Deviation from regression (S2d) Genotype is considered to be stable when deviation from regression (S2d) is not significantly different from zero. Perkins and Jinks (βi) Genotypes with slope βi values not significantly different from 0.0 were judged to be stable, whereas those with significant βi values were unstable. Shukla stability variance (σi2), Shukla’s squared hat (ŝi2) and Wricke's ecovalence (Wi2) Genotype with low Shukla stability variance (σi2), Shukla’s squared hat (ŝi2) and Wricke's ecovalence (Wi2) is regarded as stable. Kang stability statistic (YSi) According to this method, genotypes with YSi greater than the mean YSi are considered stable. 14 SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output Interpretation Mahendra Dia, Todd C. Wehner and Consuelo Arellano AMMI Biplot (Trait vs. PC1) The biplot abscissa and ordinate shows the trait main effect and first principal component (PC1) term, respectively. The horizontal and vertical lines that divide the graph into four quadrants indicate the interaction score of zero and trait gran mean, respectively. Displacement along the vertical axis indicates the interaction difference between genotypes and environments. Similarly, displacement along horizontal axis indicates the difference in genotype and environment main effect. The genotypes with PC1 scores close to zero indicate general adaptation across the environments, whereas with larger PC1 scores indicate specific adaptation of genotypes to environment which has same PC1 scores and sign. The relative magnitude and direction of genotypes along the abscissa and ordinate axis in biplot explain the response pattern of genotypes across the environments. The genotype with PC1 score close to zero and high trait mean is considered to be stable. The environment with PC1 scores close to zero indicates smaller variation in interaction and relative ranking of genotypes are stable at these locations. This biplot is also known as AMMI1 as it considers one PC into account. It is commonly used for genotypic evaluation. Biplot (PC1 vs. PC2) The biplot abscissa and ordinate showed the first and second multiplicative axis term (PC1 and PC2), respectively. Horizontal and vertical lines passing from the origin (0, 0) of biplot divide the biplot into four quadrants. The distances from the origin of biplot indicate the amount of interaction exhibited by genotypes either over environment or environment over genotypes. The angle between genotype and environment vectors determined the nature of interaction; that is, it is positive for acute angles, negligible for right angles, and negative for obtuse angle. The angle formed by the vectors two environments provided an estimate of their correlation. Similarly, the small angles between genotype vectors inside same quadrant are similar in genetic performance. Connecting extreme genotype on biplot forms a polygon and the perpendicular to the side of polygon form sectors of genotype and environment. The genotypes at vertex of polygon are the winners in the environments included in that sector. When location falls in same sector across years then location is considered as separate mega-environment for genotype evaluation and recommendation. Location close to biplot origin is less interactive location and considered to be location good for selection of genotypes with average adaptation. This biplot is also known as AMMI2 as it considers two PC into account and it is used for mega-environment evaluation. GGE Biplot ‘Which-won-where’ or polygon view The ‘which-won-where’ or polygon view of the GGE biplot is an effective visual tool in megaenvironment analysis. The perpendicular lines to the polygon sides divide the biplot into sector. If environments fall into different sectors, this suggests that different genotype won in different sector and thus genotype x environment interaction or crossover pattern exist. The winning genotype for a sector is the vertex genotype. Conversely, if all environments fall into a single sector, this indicates that a single genotype had the highest yield in all environments. Dividing target environment or location into megaenvironment is recommended if crossover patterns are repeatable across year. Locations fall within one sector is considered as one mega-environment. Mega-environment is set of location that consistently share the best set of genotypes across years. Mean vs. stability view The ‘mean vs. stability’ view of GGE biplot facilitates genotype comparisons based on mean performance and stability across environments within a mega-environment. An ‘ideal’ genotype should have both high mean and high stability performance within a mega-environment. An ideal genotype is represented by the head of arrow on AEC abscissa (horizontal axis). The arrow shown on the AEC abscissa points in the direction of higher mean performance of genotypes and, consequently ranks the 15 SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output Interpretation Mahendra Dia, Todd C. Wehner and Consuelo Arellano genotypes with respect to mean performance. The most stable genotype is located almost on the AEC abscissa (horizontal axis) and had a near-zero projections onto the AEC ordinate (vertical axis). It means, the rank of this genotype was highly consistent across the environment. Discriminative vs. representativeness view The ‘discriminating power vs. representative’ view of GGE biplot identifies test environments that effectively identify superior genotypes for mega-environment. An ‘ideal’ test environment is a virtual environment that has the most ability to discriminate the genotype and represent the mega-environment. An ideal test environment is represented by the head of arrow on AEC abscissa (horizontal axis). Test environment with longest vectors from biplot origin are more discriminating of the genotypes. If the test environment has a very short vector or is close to biplot origin, it means genotypes performed similar in it. A short vector could also mean that environment is not represented well by PC1 and PC2 if the biplot does not explain most of the data (Yan et al., 2007). Similarly, the test environment that has small angle on AEC abscissa is more representative of the mega-environment than those that have larger angles with it. References: CRAN. 2014. The comprehensive R archive network [Online]. Accessed at http://cran.rproject.org/web/packages/available_packages_by_name.html#available-packages-A (accessed 30 September 2014; verified 30 October 2014). Yan, W., M.S. Kang, B. Ma, S. Woods, and P.L. Cornelius. 2007. GGE biplot vs. AMMI analysis of genotype-by-environment data. Crop Science 47:643-653.SAS. 2009. SAS 9.2 macro language reference [Online]. Accessed at http://support.sas.com/documentation/cdl/en/mcrolref/61885/PDF/default/mcrolref.pdf (accessed 30 September 2014; verified 30 October 2014). SAS. 2014. SAS: Business analytics and business intelligence software [Online]. Accessed at http://www.sas.com/en_us/home.html (accessed 30 September 2014; verified 30 October 2014). 16

SASGxE Instructions (MS Word)

Related documents

Products

Support

SASGxE Instructions (MS Word)

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib