SASGxE Instructions (MS Word)

advertisement
SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output
Interpretation
Mahendra Dia, Todd C. Wehner and Consuelo Arellano
SASGxE Objective:
SASGxE is a user friendly, annotated, flexible and efficient SAS program that will allow user to
analyze stability statistics of multiple dependent variables, simultaneously, of multi-location trial data.
This program is intended for SAS-PC (version 9.3 or higher) under the Microsoft Windows operating
system. SASGxE computes univariate stability statistics, input files that are ready to use in existing R
packages for multivariate stability statistics, ANOVA, descriptive statistics and the correlation of stability
analysis methods.
SASGxE program, input sample data, output from sample data (including AMMI and GGE
biplots from R statistical software), SAS log, and input data file template are freely available at
http://cuke.hort.ncsu.edu/cucurbit/wehner/software.html.
SASGxE Functionality:
1.
2.
3.
4.
Options statement.
Import input data.
Compute sum of longitudinal variables.
Define macros.
 UNIVARIATE1
 UNIVARIATE2
 UNIVARIATE3
 LEVELOFSIG
 OUTPUTEXCEL
 GENOTYPE
 ENVIRONMENT
 OUTPUTCSV
5. Creating a macro variable during DATA step execution.
 Total number and kind of dependent variable
6. Compute stability statistics of all dependent variable, simultaneously, using macro STABILITY.
 Quality checking (QC)
 Compute means, sum and coefficient of variation (CV)
 Analysis of variance (ANOVA)
 Stability statistics - univariate
 Spearman rank correlation
 Output
o Univariate and Multivariate stability statistics
o Spearman Rank correlation
o Mean, Sum and CV
7. Multivariate stability statistics.
 AMMI and GGEBiplot
SASGxE Program Description:
1. Options statement
We decided to run SASGxE with certain options turned on or off so that program run efficiently
and is easy to debug. OPTIONS, such as MPRINT, SYMBOLGEN and MLOGIC, are very helpful at times
1
SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output
Interpretation
Mahendra Dia, Todd C. Wehner and Consuelo Arellano
for debugging. However, MLOGIC and SYMBOLGEN are turned off in production macro to improve
program efficiency (SAS, 2009). Similarly, OPTIONS LABEL was turned off so that auto-label can be
prevented (e.g. in PROC MEANS) so user will not be confused. Since, SASGxE is not intended to print
output in SAS Output window therefore OPTIONS DATE and NUMBER are turned off. OPTIONS
MPRINT is turned on so that user can view the text generated by macro execution in SAS Log window.
2. Import input data
SASGxE starts with user entered fields. User is required to feed input data file location, name,
sheet name, and sum of squares at %LET IPATH, %LET INAME, %LET ISHEETNAME1, %LET
SUMOFSQR statement, respectively. The value for Type I Sum of Squares and Type III Sum of Squares
are 1 and 2, respectively. SASGxE requires input data file in Excel (type equals ‘.xlsx’ only). Highlighted
fields are user entered in below code. The user input records are created into macro variables using %LET
statement.
%LET
%LET
%LET
%LET
IPATH =…………………; /*INPUT FILE PATH*/
INAME =…………………; /*INPUT FILE NAME*/
ISHEETNAME1 =…………………; /*INPUT FILE SHEET NAME*/
SUMOFSQR =…………………; /*1= TYPE 1 SS ; 2 = TYPE 3 SS*/
Input data file name, type, and location can be retrieved by right-clicking on input data file and
selecting ‘Properties’. Once in ‘Properties’ window, the user can see input data file details under
‘General’ tab. These details include file name , file type , and file location (Figure 1). Similarly,
input data sheet name can be found on left side of bottom bar on Excel file.
Figure 1. Screenshot of ‘Properties’ window of input Excel data file (Panel A) and example Excel file
(Panel B) represent file name , file type , file location and sheet name .
2
SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output
Interpretation
Mahendra Dia, Todd C. Wehner and Consuelo Arellano
Input file location can also be viewed by clicking on file path or address bar of folder where
input data is stored (Figure 2).
Figure 2. Screenshot of ‘folder’ where input Excel data file exist. File location
indicated by an arrow mark.
is enclosed in box and
In SAS, the input data file is required to have missing records represented by dot (‘.’) and not to
have blank first row. We have also provided an input data file template, which is freely available at
http://cuke.hort.ncsu.edu/cucurbit/wehner/software.html. Input data file template is comprised of column
names including YR (year), LC (location), RP (replication), CL (cultigen or genotype), and Dependent
Variables 1 to n (Figure 3). Hereafter, ‘genotype’ is used to indicate cultigen, cultivar, variety or
genotype. SASGxE is not sensitive with column positions. However, SASGxE requires the user not to
change the column names that are indicated in ‘bold’ and ‘capital case’. Dependent variables are indicated
in ‘proper case’ and ‘non bold’, and user is allowed to store multiple dependent variables. SASGxE is
capable of analyzing multiple dependent variables simultaneously. Since SAS programing language
requires dataset and variable name less than 32 characters long therefore it recommended storing short
names for dependent variable and input dataset (< 20 character is desired).
Figure 3. Screenshot of input data template. Required variable names for year (YR), location (LC),
replication (RP), and cultigen or genotype (GN) are represented in ‘bold’, ‘capital case’ and enclosed in
separate box.
3
SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output
Interpretation
Mahendra Dia, Todd C. Wehner and Consuelo Arellano
SASGxE imports input data using PROC
IMPORT. Input sample data
(SASGxE_PROG_INPUT_DATA.XLSX) consists of 3 years, 5 location, 2 replication, 10 genotypes and
2 dependent variables (Figure 4). Two dependent variables are marketable fruit weight (MKWT) and cull
fruit weight (CLWT). Dependent variables are recorded at 4 different times on same individual within a
plot (MKWT1-4; CLWT1-4) and unit is ‘pounds/plot’. Missing values are represented by dot (‘.’).
Figure 4. Screenshot of input sample data template consists of year (YR), location (LC), replication (RP),
genotype (CL), and dependent variables (MKWT1-4; CLWT1-4) columns and top 25 rows.
3. Compute sum of longitudinal variables
After importing data, SASGxE computes sum of longitudinal data (SUM function), rename
genotype names (IF-ELSE-THEN statement), and drops those dependent variable on which stability
statistics is not required (DROP statement). A dataset is longitudinal if it records the same type of
information on the same individuals over a time period. In below program, dropped variables are
highlighted and dependent variables values are converted from pounds/plot (lbs/plot) to
megagram/hectare (Mg/ha).
DATA TEMPA1 (RENAME=(CLT=CL));
SET TEMPA1;
MKWT=SUM(MKWT1,MKWT2,MKWT3,MKWT4); /*SUM ACROSS THE DEPENDENT VARIABLES*/
CLWT=SUM(CLWT1,CLWT2,CLWT3,CLWT4); /*SUM ACROSS THE DEPENDENT VARIABLES*/
MKMGHA=MKWT*0.40751; /*CALCULATE YIELD MG/HA FOR 12 FT PLOT SIZE*/
CLMGHA=CLWT*0.40751; /*FACTOR 0.40751 CONVERTS LBS/PLOT TO MG/HA*/
IF CL=01 THEN CLT='Mountain Hoosier
ELSE IF CL=02 THEN CLT='Hopi Red Flesh
ELSE IF CL=03 THEN CLT='Early Arizona
';
';
';
4
SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output
Interpretation
Mahendra Dia, Todd C. Wehner and Consuelo Arellano
ELSE
ELSE
ELSE
ELSE
ELSE
ELSE
ELSE
IF
IF
IF
IF
IF
IF
IF
CL=04
CL=05
CL=06
CL=07
CL=08
CL=09
CL=10
THEN
THEN
THEN
THEN
THEN
THEN
THEN
CLT='Starbrite F1
';
CLT='Stone Mountain
';
CLT='Stars-N-Stripes F1';
CLT='AU-Jubilant
';
CLT='Calhoun Gray
';
CLT='Big Crimson
';
CLT='Legacy F1
';
DROP MKWT1 MKWT2 MKWT3 MKWT4 CLWT1 CLWT2 CLWT3 CLWT4 MKWT CLWT CL;
RUN;
Except YR, LC, RP and CL, SASGxE treats other column as a dependent variable and computes
stability statistics on each of them. Therefore it is suggested to drop dependent variables in this step if
stability statistics is not needed on them. This dataset step is an optional. User can remove or skip this
step from program if dependent variable in input data does not required to be recreated. The SAS GEI
code will not break by removing this step because dataset file names are same (TEMPA1) as previous step
file name.
4. Defining macro
Macro UNIVARIATE1 computes regression slope (bi), standard error of slope, deviation from
regression (S2d), T-test and F-test on regression slope (H0: bi = 1) and deviation from regression (H0: S2d =
0), and level of significance using PROC GLM. Results of these statistics are captured in
SLOPE&DEPVAR.SAS file, where &DEPVAR is a dependent variable name (Figure 5). The ‘.SAS’ file
can be located from following path: SAS program window  Explorer panel  Explorer tab  Libraries
 Work library. The level of significance of T-test and F-test at 0.05, 0.01 and 0.001 is represented by
‘*’, ‘**’, ‘***’; respectively.
Figure 5. Screenshot of SLOPE&DEPVAR.SAS file showing analysis of regression slope (bi), standard
error of slope, deviation from regression (S2d), T-test and F-test on regression slope (H0: bi = 1) and
deviation from regression (H0: S2d = 0) and level of significance of T-test and F-test.
Macro UNIVARIATE2 computes Wricke’s ecovalence (Wi), Shukla’s stability variance (σi2),
Shukla’s squared hat (ŝi2), and Perkins and Jinks beta (βi) using PROC IML. Results of these statistics are
captured in UNIVARIATE2&DEPVAR.SAS file (SAS window  Explorer panel  Explorer tab 
Libraries  Work library), where &DEPVAR is a dependent variable name (Figure 6).
5
SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output
Interpretation
Mahendra Dia, Todd C. Wehner and Consuelo Arellano
Figure 6. Screenshot of UNIVARIATE2&DEPVAR.SAS file showing analysis of Wricke’s ecovalence
(Wi), Shukla’s stability variance (σi2), Shukla’s squared hat (ŝi2), and Perkins and Jinks beta (βi).
The macro UNIVARIATE3 computes least square (LS) means, standard error of LS means, least
significant difference (LSD) of mean, and Kang’s Yield-Stability statistics (YSi). Results of macro
UNIVARIATE3 are captured in UNIVARIATE3&DEPVAR.SAS file (SAS window  Explorer panel
 Results tab  Libraries  Work library), where &DEPVAR is a dependent variable name (Figure 7).
LSD is used to compare trait means across genotypes.
Figure 7. Screenshot of UNIVARIATE3&DEPVAR.SAS file showing analysis of least square (LS)
means, standard error of LS means, least significant difference (LSD) of mean, and Kang’s YieldStability statistics (YSi).
Macro LEVELOFSIG concatenates Spearman correlation value with level of significance. The
level of significance at 0.05, 0.01 and 0.001 is represented by ‘*’, ‘**’, ‘***’; respectively.
Macro OUTPUTEXCEL exports output files in Excel files (.xlsx only) and to same folder or
location where input data file is placed. Similarly, macro OUTPUTCSV exports output files as comma
separated value (CSV) files and to same folder or location where input data file is placed. These CSV
files are loaded into ‘RStudio’ to analyze multivariate stability statistics using AMMI and GGE Biplot
models. Macro GENOTYPE, ENVIRONMENT and LOCATION generate shorter name for
‘genotypes/cultivars’, ‘environment’ and ‘location’, respectively, so that visualization of AMMI and
GGEBiplot output is legible.
5. Creating a macro variable during DATA step execution
6
SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output
Interpretation
Mahendra Dia, Todd C. Wehner and Consuelo Arellano
SASGxE creates macro variables for total number of (&LAST_DEPVARIABLE) and each
dependent variable (&DEPVARX) using SYMPUT routine in a DATA step. To improve program
efficiency DATA _NULL_ statement was used.
DATA _NULL_;
SET START1 END=END_OF_DATASET;
CALL SYMPUT ('DEPVARX'||TRIM(LEFT(_N_)), NAME); /*MACRO FOR
DEPENDENT VARIABLE*/
IF END_OF_DATASET THEN CALL SYMPUT ('LAST_DEPVARIABLE',
COMPRESS(_N_));
RUN;
6. Compute stability statistics of all dependent variable, simultaneously, using macro STABILITY
Macro STABILITY computes different stability statistics for multiple dependent variables using
iterative %DO and %END statements. During each iteration one dependent variable is analyzed. The %DO
loop stops processing after stop value is equal to &LAST_DEPVARIABLE.
Input data is quality checked for missing records and environment is defined in DATA TEMPA2
statement. SASGxE removes rows having missing records for location, year, replication or dependent
variable. Environment (EN) is a combination of year and location. Descriptive statistics including means,
sum, and coefficient of variance (CV) are computed using PROC MEANS and PROC SQL. Using PROC
TRANSPOSE results of descriptive statistics are transposed in user friendly layout so that researchers can
interpret them easily. SASGxE generates following descriptive statistics.
 Genotype mean over environment and genotype (MEAN&DEPVAR.SAS),
 Genotype sum over environment and genotype (SUM&DEPVAR.SAS),
 Genotype mean over environment and replication (ENV&DEPVAR.SAS),
 Genotype mean over year, location, and replication (M_&DEPVAR_CYLR.XLSX),
 Genotype mean over year and location (M_&DEPVAR_CYL.XLSX),
 Genotype mean over year (M_&DEPVAR_CY.XLSX),
 Genotype mean over location (M_&DEPVAR_CL.XLSX),
 Genotype CV over location (CV_&DEPVAR_CL.XLSX),
 Genotype mean (M_&DEPVAR_C.XLSX),
 Location mean (M_&DEPVAR_L.XLSX),
 Location mean over year (M_&DEPVAR_LY.XLSX),
 Genotype mean over environment (M_&DEPVAR_CE.XLSX),
 Genotype mean over location and replication (M_&DEPVAR_CLR.XLSX), and
 Genotype mean over environment and replication (M_&DEPVAR_CER.XLSX).
The macro STABILITY automatically export ‘.xlsx’ output file of descriptive statistics to same
location/folder
where input data file is placed. User can view ‘.SAS’ output file of descriptive
statistics at following location: SAS program window  Explorer panel  Explorer tab  Libraries 
Work library.
Analysis of variance (ANOVA) is computed using PROC GLM to determine the size and significance
of genotype x environment interaction (GxE) of dependent variable. SASGxE considers genotype, year,
location and replications as random effects. Therefore, all the factors are tested against an error term. An
F-test is used to test the significance of each factor. The level of significance of F-test at 0.05, 0.01 and
7
SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output
Interpretation
Mahendra Dia, Todd C. Wehner and Consuelo Arellano
0.001 is represented by ‘*’, ‘**’, ‘***’; respectively. SASGxE computes both Type I and III Sums of
Squares (Type I SS and Type II SS). However, ‘ANOVA_&DEPVAR.XLSX’ output file reports either
‘Type I SS’ or ‘Type III SS’ that user request in the beginning of program at %LET SUMOFSQR
statement.
The macro STABILITY calls macro UNIVARIATE1, UNIVARIATE2 and UNIVARIATE3 to
compute univariate stability statistics. These univariate statistics include regression slope (bi); standard
error of slope; deviation from regression (S2d); T-test on regression slope (H0: bi = 1); F-test deviation
from regression (H0: S2d = 0); Wricke’s ecovalence (Wi); Shukla’s stability variance (σi2); Shukla’s
squared hat (ŝi2); Perkins and Jinks beta (βi); least square (LS) means; standard error of LS means; least
significant difference (LSD) of mean; and Kang’s Yield-Stability statistics (YSi). The major reasons for
defining macro UNIVARIATE1, UNIVARIATE2 and UNIVARIATE3 outside of macro STABILITY
so that nested macro is avoided and thus program is efficient and easy to debug.
SASGxE assigns ranks to genotypes for each stability parameter. Spearman’s rank correlation is
computed using PROC CORR on the ranks to measure the relationship between stability parameters.
Genotypes are ranked in increasing order for decreased value of a dependent variable. However, for
certain dependent variables such as disease, % cull fruits, etc.; where lower trait value is considered to be
good. User is required to assign higher ordinal value to lower trait value of such dependent variable.
Otherwise, Spearman’s rank correlation will give wrong output and mislead the user. Genotypes are
ranked in increasing order for increased value of deviation from regression (S2d); Wricke’s ecovalence
(Wi); Shukla’s stability variance (σi2); Shukla’s squared hat (ŝi2); and Kang’s Yield-Stability statistics
(YSi). Regression slope (bi) approximating unity is considered to be stable, therefore genotypes are ranked
in increasing order when bi > 1 and decreasing order when bi < 1. Similarly, Perkins and Jinks beta (βi) is
stable near zero, therefore genotypes are ranked in increasing order when βi > 0 and decreasing order
when βi < 0. The level of significance of correlation at 0.05, 0.01 and 0.001 is represented by ‘*’, ‘**’,
‘***’; respectively.
The macro STABILITY invoked macro OUTPUTEXCEL and OUTPUTCSV to generate output files
in ‘.xlsx’ and ‘.csv’ formats, respectively. These output files are auto sent to same location/folder
where input data file is placed. Following are the output files generated by macro OUTPUTEXCEL.
 Genotype mean over year, location, and replication (M_&DEPVAR_CYLR.XLSX),
 Genotype mean over year and location (M_&DEPVAR_CYL.XLSX),
 Genotype mean over year (M_&DEPVAR_CY.XLSX),
 Genotype mean over location (M_&DEPVAR_CL.XLSX),
 Genotype CV over location (CV_&DEPVAR_CL.XLSX),
 Genotype mean (M_&DEPVAR_C.XLSX),
 Location mean (M_&DEPVAR_L.XLSX),
 Location mean over year (M_&DEPVAR_LY.XLSX),
 Genotype mean over environment (M_&DEPVAR_CE.XLSX),
 Genotype mean over environment and replication (M_&DEPVAR_CER.XLSX),
 Genotype mean over location and replication (M_&DEPVAR_CLR.XLSX).
 Analysis of variance (ANOVA_&DEPVAR.XLSX),
 Univariate stability statistics (STAB_&DEPVAR.XLSX),
 Spearman’s rank correlation (ANOVA_&DEPVAR.XLSX),
 Legend for location used in AMMI (LOC_LEGEND_&DEPVAR.XLSX),
8
SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output
Interpretation
Mahendra Dia, Todd C. Wehner and Consuelo Arellano


Legend for genotype used in AMMI and GGEBiplot (GEN_LEGEND_&DEPVAR.XLSX), and
Legend for environment used in AMMI and GGEBiplot (ENV_LEGEND_&DEPVAR.XLSX).
Following are the output files generated by macro OUTPUTCSV.
 Input file for ‘RStudio’ software for GGEBiplot (Genotype x Environment) analysis
(BIPLOT_&DEPVAR_.CSV),
 Input file for ‘RStudio’ software for GGEBiplot (Genotype x Location) analysis
(BIPLOT2_&DEPVAR_.CSV),
 Input file for ‘RStudio’ software for AMMI (Genotype x Environment) analysis
(AMMI1_&DEPVAR_.CSV), and
 Input file for ‘RStudio’ software for AMMI (Genotype x Location) analysis
(AMMI2_&DEPVAR_.CSV).
7. Multivariate statistics
SASGxE does not compute multivariate statistics (AMMI and GGEBiplot) for stability analysis
per
se.
However,
files
(‘BIPLOT_&DEPVAR_.CSV’,
BIPLOT2_&DEPVAR_.CSV
‘AMMI1_&DEPVAR_.CSV’, and ‘AMMI2_&DEPVAR_.CSV’) generated by SASGxE can be loaded into
‘RStudio’ software to compute multivariate stability statistics (RStudio, 2014). These files are ready to go
in ‘RStudio’. However, the user needs to be cautious with case sensitivity of ‘R’ computing language.
‘RStudio’ is an integrated tool designed to help user more productive with ‘R’ computing software and it
requires ‘R’ version 2.11.1 or higher. To improve the visuals of AMMI and GGEBiplot analysis,
genotypes, locations and environment are abbreviated to G1-Gn, LOC1-LOCn, and ENV1-ENVn,
respectively, where ‘n’ is a total number of entity. User can view the respective abbreviation for a
corresponding genotype, location and environment in ‘GEN_LEGEND_&DEPVAR.XLSX’,
‘LOC_LEGEND_&DEPVAR.XLSX’, and ‘ENV_LEGEND_&DEPVAR.XLSX’ files, respectively.
In order to analyze stability using AMMI model user need to select Agricolae
package in
system library window of ‘RStudio’ software (Figure 8). If Agricolae package is not found in system
library then user can install it from CRAN repository (CRAN, 2014) (Figure 8). Then reference the
path of folder where input data is located from ‘Session’ in ‘window tool bar’ (‘session’ in window tool
bar  select ‘set work directory’  select ‘choose work directory’  select folder where data is kept)
(Figure 9). User can also reference path in code in ‘Console’ or ‘R Script’ window. However, the user
needs to be cautious with the requirement of forward slash in reference path in ‘R’ computing language.
User can use below code in ‘Console’ or ‘R Script’ window to analyze AMMI model. The output files
‘AMMI1_&DEPVAR_.CSV’, and ‘AMMI2_&DEPVAR_.CSV’ generate genotype x environment and
genotype x location analysis results, respectively.
# COMMENT: USER NEEDS TO REPLACE INPUT DATA FILE PATH
setwd("E:/PhD Research Work/PhD Articles/Articles for Publication/GxE
SAS Prog/Sample Data")
#COMMENT: USER NEEDS TO REPLACE FILE NAME (AMMI2_MKMGHA). IT IS A CASE
SENSITIVE
Data = read.csv(file="AMMI2_MKMGHA.csv", header = TRUE)
#COMMENT: VIEW TOP 6 ROWS OF DATA
head(Data)
attach(Data)
9
SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output
Interpretation
Mahendra Dia, Todd C. Wehner and Consuelo Arellano
#COMMENT: USER NEEDS TO REPLACE DEPEDENT VARIABLE NAME (MKMGHA). #IT IS
A CASE SENSITIVE
model<- AMMI(Locality, Genotype, Rep, MKMGHA, console=FALSE)
model$ANOVA
#COMMENT: VIEW MODEL OUPUT
print(model)
detach(Data)
# COMMENT: BIPLOT
plot(model)
# COMMENT: TRIPLOT PC 1,2,3
plot(model, type=2, number=TRUE)
# COMMENT: BIPLOT PC1 vs DEPENDENT VARIABLE
plot(model, first=0,second=1, number=TRUE)
Figure 8. Screenshot of ‘RStudio’ showing AMMI package Agricolae
and installing AMMI package Agricolae from CRAN repository .
in ‘System Library’ window
10
SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output
Interpretation
Mahendra Dia, Todd C. Wehner and Consuelo Arellano
Figure 9. Screenshot of ‘RStudio’ showing referencing a path of folder where input data is located from
‘Session’ in ‘window tool bar’ (‘session’ in window tool bar  select ‘set work directory’  select
‘choose work directory’  select folder where data is kept) .
Similarly, user needs to select GGEBiplotGUI
package in system library window of
‘RStudio’ software to compute GGEBiplot model (Figure 10). If GGEBiplotGUI package is not found
in system library then user can install it from CRAN repository
(CRAN, 2014) (Figure 10). Then
reference the path of folder where input data is located from ‘Session’ in ‘window tool bar’ (‘session’ in
window tool bar  select ‘set work directory’  select ‘choose work directory’  select folder where
data is kept)
(Figure 9). User can also reference path in code in ‘Console’ or ‘R Script’ window.
However, the user needs to be cautious with the requirement of forward slash in reference path in ‘R’
computing language, which is opposite of SAS. The GGEBiplotGUI package accepts input data where
rows are labelled and no blank [NA] records. Therefore, system defined function rownames was used to
label the rows. Similarly, user defined function na_check was used to replace blank records by trait
‘mean’ of genotype across locations [gge=na_check(gge,”Mean”)] or by ‘zero’
[gge=na_check(gge,”Zero”)]. The user has option to choose either ‘mean’ or ‘zero’ to replace
blank [NA] record. If input data does not have missing records then program process the data per se.
User can use below code in ‘Console’ or ‘R Script’ window to analyze GGEBiplot. Upon executing the
code ‘Model Selection’ window opens and user is required to populate the dropdowns in ‘Model
Selection’ window to generate appropriate biplots for mega-environment analysis, genotype evaluation,
and test-environment evaluation (Yan et al. 2007). The output files ‘BIPLOT_&DEPVAR_.CSV’, and
‘BIPLOT2_&DEPVAR_.CSV’ generate genotype x environment and genotype x location analysis results,
respectively. The detailed description and difference between AMMI and GGEBiplot model was
11
SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output
Interpretation
Mahendra Dia, Todd C. Wehner and Consuelo Arellano
presented by Yan et al. (2007). Output of univariate and multivariate statistics of sample data can be
found at http://cuke.hort.ncsu.edu/cucurbit/wehner/software.html.
# COMMENT: USER NEEDS TO REPLACE INPUT DATA FILE PATH
setwd("E:/PhD Research Work/PhD Articles/Articles for Publication/GxE
SAS Prog/Sample Data")
#COMMENT: USER NEED TO REPLACE FILE NAME. IT IS A CASE SENSITIVE
gge = read.csv(file="BIPLOT2_MKMGHA.csv", header = TRUE)
#COMMENT: VIEW TOP 6 ROWS OF DATA
head(gge)
#COMMENT: colnames() gives you column labels
#COMMENT: rownames() gives the row labels
rownames(gge) = gge[,1]
gge = gge[,-1]
#COMMENT: VIEW TOP 6 ROWS OF DATA
head(gge)
#Make a function to find all the NA (Blank) values and replace with
either row_mean or zero
na_check = function(dat,check)
{
for(i in 1:nrow(dat))
{
for(h in 1:ncol(dat))
{
if (is.na(dat[i,h])==T)
{
if (check=="Mean")
{
dat[i,h]=mean(na.omit(as.numeric(dat[i,])))
{
if(check=="Zero")
{
dat[i,h]=0
}
}
}
}
}
}
return(dat)
}
#COMMENT: Replace blank record with mean or zero using user defined
function na_check
gge=na_check(gge,"Mean")
#COMMENT: VIEW TOP 6 ROWS OF DATA
head(gge)
#COMMENT: GGEBIPLOT ANALYSIS
GGEBiplot(Data = gge)
12
SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output
Interpretation
Mahendra Dia, Todd C. Wehner and Consuelo Arellano
Figure 10. Screenshot of ‘RStudio’ showing GGEBiplot package GGEBiplotGUI
in ‘System
Library’ window and installing GGEBiplot package GGEBiplotGUI
from CRAN repository .
Figure 11. Screenshot of ‘Model Selection’ which pops-up after executing GGEBiplotGUI model in
‘RStudio’.
The singular value partition (SVP) attribute of ‘model selection’ window have dropdown values
of 1 and 2 (Figure 11). The ‘SVP = 1’ means singular values are partitioned into genotype scores and,
thus, it enhances the suitability of the biplot for comparing genotypes. The ‘SVP = 2’ means singular
13
SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output
Interpretation
Mahendra Dia, Todd C. Wehner and Consuelo Arellano
values are partitioned into environment scores and it enhances the suitability of the biplot for visualizing
the relationships among environments (Yan et al. 2007).
Similarly, ‘Centered By’ attribute of ‘model selection’ window have dropdown value of 0, 1, 2,
and 3 (Figure 11). The ‘Centered By = 0’ means original data is used for visualization, and is effective for
dataset whose grand mean is close to 0. It is useful in ‘QTL-by-environment’ interaction and ‘genotype
covariate-by-environment’ interaction studies. The ‘Centered By = 1’ means grand mean is centered and
useful when both row (genotype) and column (environment) main effects are of interest. The ‘Centered
By = 2’ means the data are environment centered. The ‘Centered By = 3’ means data is double-centered
and is desirable if GE is of sole interest and gene expression studies.
The dropdown values of ‘Scaled By’ attribute of ‘model selection’ window include 0, 1, 2, and 3
(Figure 11). The purpose of scaling is to put the variables in comparable ranges by dividing with different
parameters. Scaling is necessary if the variables are of different units. When ‘Scaled By = 0’ means the
data are not divided by anything. The ‘Scaled By = 1’ means the data are rescaled by the withinenvironment standard deviation and assumes all environments to be equally important. The ‘Scaled By =
2’ means the data are rescaled by within-environment standard errors. It removes any heterogeneity
among environments with regard to their experimental errors while retaining the information about
environment’s discriminating ability (Yan et al., 2007). The ‘Scaled By = 3’ means data are rescaled by
the environment means. It removes the differences in unit and data range among variables while retaining
the discriminating ability of the environments (Yan et al., 2007).
Stability Statistics Output Interpretation:
Linear regression coefficient (bi)
Regression coefficient (bi) of genotypes approximating unity (P<0.01) along with high trait mean
is considered to stable across wide range of environment. When this is associated with low trait mean
performance, genotypes are poorly adapted to all environments. A bi greater than unity describes
genotypes with higher sensitivity to environmental change (below average stability), and greater
specificity of adaptability to high yielding environments.
Deviation from regression (S2d)
Genotype is considered to be stable when deviation from regression (S2d) is not significantly
different from zero.
Perkins and Jinks (βi)
Genotypes with slope βi values not significantly different from 0.0 were judged to be stable,
whereas those with significant βi values were unstable.
Shukla stability variance (σi2), Shukla’s squared hat (ŝi2) and Wricke's ecovalence (Wi2)
Genotype with low Shukla stability variance (σi2), Shukla’s squared hat (ŝi2) and Wricke's
ecovalence (Wi2) is regarded as stable.
Kang stability statistic (YSi)
According to this method, genotypes with YSi greater than the mean YSi are considered stable.
14
SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output
Interpretation
Mahendra Dia, Todd C. Wehner and Consuelo Arellano
AMMI
Biplot (Trait vs. PC1)
The biplot abscissa and ordinate shows the trait main effect and first principal component (PC1)
term, respectively. The horizontal and vertical lines that divide the graph into four quadrants indicate the
interaction score of zero and trait gran mean, respectively. Displacement along the vertical axis indicates
the interaction difference between genotypes and environments. Similarly, displacement along horizontal
axis indicates the difference in genotype and environment main effect. The genotypes with PC1 scores
close to zero indicate general adaptation across the environments, whereas with larger PC1 scores indicate
specific adaptation of genotypes to environment which has same PC1 scores and sign. The relative
magnitude and direction of genotypes along the abscissa and ordinate axis in biplot explain the response
pattern of genotypes across the environments. The genotype with PC1 score close to zero and high trait
mean is considered to be stable. The environment with PC1 scores close to zero indicates smaller
variation in interaction and relative ranking of genotypes are stable at these locations. This biplot is also
known as AMMI1 as it considers one PC into account. It is commonly used for genotypic evaluation.
Biplot (PC1 vs. PC2)
The biplot abscissa and ordinate showed the first and second multiplicative axis term (PC1 and
PC2), respectively. Horizontal and vertical lines passing from the origin (0, 0) of biplot divide the biplot
into four quadrants. The distances from the origin of biplot indicate the amount of interaction exhibited by
genotypes either over environment or environment over genotypes. The angle between genotype and
environment vectors determined the nature of interaction; that is, it is positive for acute angles, negligible
for right angles, and negative for obtuse angle. The angle formed by the vectors two environments
provided an estimate of their correlation. Similarly, the small angles between genotype vectors inside
same quadrant are similar in genetic performance. Connecting extreme genotype on biplot forms a
polygon and the perpendicular to the side of polygon form sectors of genotype and environment. The
genotypes at vertex of polygon are the winners in the environments included in that sector. When location
falls in same sector across years then location is considered as separate mega-environment for genotype
evaluation and recommendation. Location close to biplot origin is less interactive location and considered
to be location good for selection of genotypes with average adaptation. This biplot is also known as
AMMI2 as it considers two PC into account and it is used for mega-environment evaluation.
GGE Biplot
‘Which-won-where’ or polygon view
The ‘which-won-where’ or polygon view of the GGE biplot is an effective visual tool in megaenvironment analysis. The perpendicular lines to the polygon sides divide the biplot into sector. If
environments fall into different sectors, this suggests that different genotype won in different sector and
thus genotype x environment interaction or crossover pattern exist. The winning genotype for a sector is
the vertex genotype. Conversely, if all environments fall into a single sector, this indicates that a single
genotype had the highest yield in all environments. Dividing target environment or location into megaenvironment is recommended if crossover patterns are repeatable across year. Locations fall within one
sector is considered as one mega-environment. Mega-environment is set of location that consistently
share the best set of genotypes across years.
Mean vs. stability view
The ‘mean vs. stability’ view of GGE biplot facilitates genotype comparisons based on mean
performance and stability across environments within a mega-environment. An ‘ideal’ genotype should
have both high mean and high stability performance within a mega-environment. An ideal genotype is
represented by the head of arrow on AEC abscissa (horizontal axis). The arrow shown on the AEC
abscissa points in the direction of higher mean performance of genotypes and, consequently ranks the
15
SASGxE- Genotype x Environment Interaction (GxE) Program Instructions and Output
Interpretation
Mahendra Dia, Todd C. Wehner and Consuelo Arellano
genotypes with respect to mean performance. The most stable genotype is located almost on the AEC
abscissa (horizontal axis) and had a near-zero projections onto the AEC ordinate (vertical axis). It means,
the rank of this genotype was highly consistent across the environment.
Discriminative vs. representativeness view
The ‘discriminating power vs. representative’ view of GGE biplot identifies test environments
that effectively identify superior genotypes for mega-environment. An ‘ideal’ test environment is a virtual
environment that has the most ability to discriminate the genotype and represent the mega-environment.
An ideal test environment is represented by the head of arrow on AEC abscissa (horizontal axis). Test
environment with longest vectors from biplot origin are more discriminating of the genotypes. If the test
environment has a very short vector or is close to biplot origin, it means genotypes performed similar in
it. A short vector could also mean that environment is not represented well by PC1 and PC2 if the biplot
does not explain most of the data (Yan et al., 2007). Similarly, the test environment that has small angle
on AEC abscissa is more representative of the mega-environment than those that have larger angles with
it.
References:
CRAN. 2014. The comprehensive R archive network [Online].
Accessed at http://cran.rproject.org/web/packages/available_packages_by_name.html#available-packages-A (accessed
30 September 2014; verified 30 October 2014).
Yan, W., M.S. Kang, B. Ma, S. Woods, and P.L. Cornelius. 2007. GGE biplot vs. AMMI analysis of
genotype-by-environment data. Crop Science 47:643-653.SAS. 2009.
SAS
9.2
macro
language
reference
[Online].
Accessed
at
http://support.sas.com/documentation/cdl/en/mcrolref/61885/PDF/default/mcrolref.pdf (accessed
30 September 2014; verified 30 October 2014).
SAS. 2014. SAS: Business analytics and business intelligence software [Online]. Accessed at
http://www.sas.com/en_us/home.html (accessed 30 September 2014; verified 30 October 2014).
16
Download