Supplementary Text - MORPH-R documentation

Ranking Candidate Genes for Membership in Pathways

Version      XXX
Date         XXX
Authors      Oren Tzfadia <Oren.Tzfadia@weizmann.ac.il>
             David Amar <Ddam.Am@gmail.com>
             Itziar Frades <Itziar.Frades@gmail.com>
             Erik Alexandersson <Erik.Alexandersson@slu.se>
             Ron Shamir <RShamir@tau.ac.il>
Maintainers  Dimitry Chazanov <Dimitry.Chazanov@weizmann.ac.il>
             Nati Ghatan <Nati.Ghatan@gmail.com>
Zip data     Yes
License      <LGPL>
Description  Functions designed to reveal unknown genes in biological pathways.
URL          http://biocourse.weizmann.ac.il/morph/

MORPH – R Implementation – User Manual

Table of Contents

1. Introduction
2. MORPH-R package content
3. Required Inputs & Valid Formats
   3.1. The MORPH object
   3.2. Using input files
      3.2.1. Configs.txt
      3.2.2. Pathway genes
      3.2.3. Gene expression data
      3.2.4. Clustering solution
4. Description of Functions
   4.1. General Functions
      4.1.1. getClusteringInformation
      4.1.2. getPathwayGenes
      4.1.3. getGeneExpression
   4.2. MORPH Functions
      4.2.1. prepareMorphObject
      4.2.2. rankGenes
      4.2.3. getNormalizedCorrelations
      4.2.4. internal_LOOCV
      4.2.5. AUSR
      4.2.6. MORPH
      4.2.7. removeAbsentPathwayGenes
      4.2.8. global_LOOCV
      4.2.9. getMorphResultBestConfig
      4.2.10. getMorphPredictions
      4.2.11. getScoresDistributionPlots

1. The MORPH Algorithm - Introduction

This package implements an algorithm called MORPH (for module-guided ranking of candidate pathway genes) for revealing unknown genes in biological pathways. The algorithm is explained in greater detail in the article [1] upon which this implementation is based. Briefly, this method receives as input a set of known genes from the target pathway, a collection of expression profiles, clustering solutions, and interaction and metabolic networks.
Using machine learning techniques, MORPH selects the best combination of data and analysis method and outputs a ranking of candidate genes predicted to belong to the target pathway.

The MORPH-R package presented here comprises all the scripts and functions (in the form of R code) required to run the MORPH process. A user who wishes to apply the MORPH algorithm to her own data must first construct the MORPH object (see section 3.1 – "The MORPH Object", or section 4.2.1 – "prepareMorphObject" for building it from input files).

Reference
1. Tzfadia O, Amar D, Bradbury LM, Wurtzel ET and Shamir R. (2012). The MORPH algorithm: ranking candidate genes for membership in Arabidopsis and tomato pathways. Plant Cell. 11:4389-406. doi: 10.1105.

2. MORPH-R package content

The MORPH package comprises several files. First and foremost, it contains the MORPH.R file, which holds all the functions described in section 4 – "Description of Functions". These functions are required to run the MORPH process, so any code that uses them must load this file with source (for more information, type ?source in the R console).

In addition, the package contains four datasets compressed in ZIP format: one for Arabidopsis thaliana (Data_AT.zip), one for Solanum lycopersicum (Data_SL.zip), and two for Solanum tuberosum (Data_StITAG.zip and Data_StPGSC.zip). The first two are the datasets used in the reference article; the latter two are new datasets added for the benefit of the potato research community. All files within these folders are arranged in the valid input format for the algorithm, as described in section 3 – "Required Inputs & Valid Formats".

Table S1.
Default data sets included in the current version of the MORPH-R package

    Species / Gene IDs                       Arabidopsis*   Tomato*         Potato ITAG**    Potato PGSC**
    Example ID                               AT5G17250      Solyc02g04650   Sotub01g031170   PGSC0003DMP400000015
    Gene Expression Profiles (# data sets)   213 (4)        53 (4)          343 (4)          343 (4)
    Clustering Solutions                     8              6               6                6

* See Supplementary Table 1 in Tzfadia et al., 2012 for the list of gene expression accession numbers.
** See Supplementary Table X for the list of potato gene expression accession numbers.

3. Required Inputs & Valid Formats

3.1 The MORPH Object

When the MORPH algorithm was published in 2012, it was designed to receive known pathway genes from the user and then rank candidate genes as potential members of this pathway in one of two organisms whose gene expression data and clustering solutions were available (either A. thaliana or S. lycopersicum). This initial setting notwithstanding, one may also embed the functions included in this package in one's own code and apply them to one's own data, in virtually any model organism.

In either case, the input to MORPH must be organized in a specific structure, which is referred to as the "MORPH object". In R terms, the MORPH object belongs to the class "morph_data" and is essentially a list that comprises several different data types. First, let us examine the object declaration in R syntax (the list members are named so that they can be accessed with $):

    > morph_obj = list(config = config, clustering_solutions = clustering_solutions, ge_datasets = ge_datasets, pathway_genes = pathway_genes)
    > class(morph_obj) = "morph_data"

Now let us examine the members of the list object:

a) config – A data frame containing the configurations on which the algorithm will run. The first column holds the names of the gene expression datasets. The second column holds the names of their corresponding clustering solutions. The columns may contain duplicates, since it is possible to run the same gene expression dataset against two (or more) different clustering solutions.
For example:

      V1        V2
    1 DataSet1  ClusteringSolution1
    2 DataSet1  ClusteringSolution2
    3 DataSet1  ClusteringSolution3
    4 DataSet2  ClusteringSolution1
    5 DataSet3  ClusteringSolution4

b) clustering_solutions – This member is a list that contains all clustering solutions. The name of each clustering solution should correspond to the appropriate name in the config field of the MORPH object. For example:

    > names(morph_obj$clustering_solutions)
    [1] "ClusteringSolution1" "ClusteringSolution2" "ClusteringSolution3"
    [4] "ClusteringSolution4"

Each clustering solution is a data frame that contains the partition of genes into their parent clusters (or "modules") in a sort of hash structure. The row names are the names of the genes, while the first and only column contains their parent cluster (which may be a number or a name). For example:

              ClusteringSolution3
    AT1G01050 1
    AT1G01090 1
    AT1G01120 2
    AT1G01140 2

Important notes:
- The clustering solution must not contain any genes that do not appear in the gene expression dataset.
- Every gene in the gene expression data must have a corresponding cluster.

c) ge_datasets – This member is a list that contains all gene expression datasets. Its structure resembles that of the clustering_solutions member, since the names of the members should also correspond to the appropriate names in the config field of the MORPH object. For example:

    > names(morph_obj$ge_datasets)
    [1] "DataSet1" "DataSet2" "DataSet3"

Each gene expression dataset is a data frame in which the row names are the names of the genes, the column names are the names of the conditions (e.g. time points, stress conditions, tissues), and the cells contain the expression level on a linear (i.e. non-log) scale. The example data available in this package are linear-scale expression values from microarray experiments.
For example:

    ID         A       B       C       D       E       F
    AT1G01010  -0.499  -0.499  2.213   -0.499  0.781   -0.499
    AT1G01040  -0.092  -0.345  -0.151  -1.296  -0.104  1.339
    AT1G01050  0.019   0.188   -0.175  -0.175  -2.012  0.679
    AT1G01060  1.546   -0.071  -0.12   -0.022  0.776   0.405
    AT1G01070  -0.932  1.581   0.651   1.292   -0.157  -0.499

d) pathway_genes – A character vector containing the names of all pathway genes (also known as "Genes of Interest" and abbreviated as "GOIs") to be tested. For example:

    > morph_obj$pathway_genes
    [1] "AT5G17230" "AT4G14210" "AT1G10830" "AT3G04870" "AT1G06820" "AT3G10230"

3.2 Using input files

Although the algorithm can be applied to one's own data (as described in section 3.1 – "The MORPH Object"), doing so requires the user to construct the MORPH object from scratch. An alternative is to use the prepareMorphObject function (described in section 4.2.1 – "prepareMorphObject"). This function reads files containing the data and creates a MORPH object. It also ensures the validity of the data by performing the following steps:

a) Eliminate duplicate genes from the clustering solution.
b) For each clustering solution, eliminate genes that do not appear in the gene expression data.
c) Detect genes in the gene expression data that do not exist in the clustering solution and place them in a separate cluster named "unclustered".

The prepareMorphObject function receives two strings indicating the paths to a "configs.txt" file and a pathway genes file. These files must be in the specific formats described below.

3.2.1 Configs.txt

The configs.txt file contains a list of all configurations to be tested in the MORPH run. It is a tab-delimited file with two columns: the first contains the names of the gene expression data files, whereas the second contains the corresponding names of the clustering solutions.
For example:

    ds1Data.txt         MatrixIsEnzyme.txt
    ds1Data.txt         ds1_ppi_matisse_0.4.txt
    ds1Data.txt         DS1_click.txt
    DataSet3Matrix.txt  MatrixIsEnzyme.txt
    DataSet3Matrix.txt  ds3_matisse_0.4.txt
    DataSet3Matrix.txt  ds3_ppi_matisse_0.4.txt
    DataSet3Matrix.txt  Seeds_click.txt

In other words, each row contains a single configuration ("data pair") to be tested. It is therefore valid for the same gene expression dataset and/or clustering solution to appear more than once in the file, as long as each row is a distinct configuration. Make sure that this file is accurate, since the MORPH algorithm follows it during the analysis.

3.2.2 Pathway genes

This file contains the names of the pathway genes and may be arranged in one of two formats:

a) Tab-delimited – The file has exactly one line, which contains the names separated by tabs. For example:

    AT5G17230    AT4G14210    AT1G10830    AT3G04870    AT1G06820

b) Newline – The gene names are separated by new lines, i.e. each line holds one gene name. For example:

    AT5G17230
    AT4G14210
    AT1G10830
    AT3G04870
    AT1G06820

3.2.3 Gene expression data

These are tab-delimited text files in which the row names are gene names and the column names are the conditions. For example:

    ID         npr1_mock  wrky18_mock  wt_BTH8h  npr1_BTH8h  wrky18_BTH8h
    AT1G01010  -0.499     -0.499       2.213     -0.499      0.781
    AT1G01040  -0.092     -0.345       -0.151    -1.296      -0.104
    AT1G01050  0.019      0.188        -0.175    -0.175      -2.012
    AT1G01080  1.229      1.157        -1.007    0.96        -0.731
    AT1G01090  1.626      0.637        -0.699    1.099       -0.203

Removing uninformative rows (e.g. rows containing missing data or more than 80% zero values) is recommended but not required.

3.2.4 Clustering solution

This tab-delimited text file contains two columns: the first holds the names of the genes and the second holds their parent clusters. In other words, each row contains a gene name and its corresponding cluster, separated by a tab.
For example:

    AT1G01120  1
    AT1G01140  1
    AT1G01240  2
    AT1G01725  3
    AT1G01730  3

Note that the cluster (or "module") name does not have to be a number; it may also be a string. For example:

    AT1G01050  Enzyme
    AT1G01090  Not_Enzyme

4. Description of Functions

4.1. General Functions

4.1.1. getClusteringInformation

Description
Receives a path to a clustering solution file and extracts the information into a variable.

Usage
    getClusteringInformation(ClusterFile)

Arguments
    ClusterFile   Path to a clustering solution file (String)

Output
A data frame where the row names are the names of the genes and the first and only column contains their corresponding clusters/modules.

Example
    solutionFile = "IsEnzyme.txt"   #Set path to clustering solution file
    clustering_solution = getClusteringInformation(solutionFile)
    print(unique(clustering_solution[,1]))   #Print unique list of clusters

4.1.2. getPathwayGenes

Description
Receives a path to a pathway genes file (a.k.a. Genes of Interest, abbreviated as "GOI") and extracts the information into a variable.

Usage
    getPathwayGenes(InputGOI)

Arguments
    InputGOI   Path to pathway genes file (String)

Output
Character vector containing names of pathway genes (Vector)

Example
    pathwayGenesFile = "Carotenoids.txt"   #Set path to pathway genes file
    pathway = getPathwayGenes(pathwayGenesFile)
    print(length(pathway))   #Print number of pathway genes

4.1.3. getGeneExpression

Description
Receives a path to a gene expression data file and extracts the information into a variable.

Usage
    getGeneExpression(InputGE)

Arguments
    InputGE   Path to gene expression data file (String)

Output
A data frame containing gene expression data (Data frame)

Example
    fileGE = "DataSet3Matrix.txt"   #Set path to gene expression data file
    GE = getGeneExpression(fileGE)
    print(nrow(GE))   #Print number of rows (genes) in data file

4.2. MORPH Functions

4.2.1.
prepareMorphObject

Description
Receives paths to both a configs.txt file and a file containing all pathway genes, and creates a MORPH object that may serve as an input for the MORPH algorithm.

Usage
    prepareMorphObject(InputConfig,InputGOI = NULL)

Arguments
    InputConfig   Path to configs.txt file (String)
    InputGOI      Path to file containing pathway genes (String)

Output
A MORPH object that may serve as a valid input to the MORPH algorithm.

Notes
a) The function assumes that all gene expression matrices contain the same gene set.
b) For each clustering solution, the function adds all unclustered genes into a separate new cluster.
c) The function removes genes that are not found in the gene expression data.

Example
    InputConfig = "Configs.txt"
    InputGOI = "Carotenoids.txt"
    morph_input = prepareMorphObject(InputConfig,InputGOI)
    print(names(morph_input))   #Print the members of the MORPH object (list).
    [1] "config" "clustering_solutions" "ge_datasets" "pathway_genes"
    print(morph_input$pathway_genes)   #Print the names of the pathway genes.
    [1] "AT5G17230" "AT4G14210" "AT1G10830" "AT3G04870" "AT1G06820" "AT3G10230"
        "AT5G57030" "AT4G25700" "AT5G52570" "AT1G31800" "AT3G53130" "AT5G67030"
        "AT1G08550"

4.2.2. rankGenes

Description
Receives gene expression data, clustering solution, and the names of the pathway genes, and then ranks the candidate (non-pathway) genes in terms of how plausible it is that a given gene is associated with the target pathway. The function will go through all clusters and will calculate the normalized correlation scores for each one.
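The cluster-by-cluster scoring described above can be illustrated with a short base R sketch. This is not the code in MORPH.R: the function name rank_genes_sketch and the toy data are hypothetical, and the sketch assumes that each candidate is scored by its average Pearson correlation with the pathway genes in its own cluster, and that the normalization is a z-score over the candidates of that cluster. Genes in clusters without pathway genes are simply skipped here rather than reported separately.

```r
# Hypothetical sketch (not MORPH.R code): score candidates cluster by cluster.
rank_genes_sketch <- function(G, C, GE) {
  corrs <- cor(t(as.matrix(GE)))   # gene-by-gene Pearson correlations
  scores <- c()
  for (cl in unique(C[, 1])) {
    cluster_genes <- rownames(C)[C[, 1] == cl]
    pathway <- intersect(G, cluster_genes)
    candidates <- setdiff(cluster_genes, pathway)
    # Skip clusters with no pathway gene or too few candidates to normalize
    if (length(pathway) == 0 || length(candidates) < 2) next
    # Average correlation of each candidate with the cluster's pathway genes
    avg <- rowMeans(corrs[candidates, pathway, drop = FALSE])
    # Assumed normalization: z-score within the cluster
    scores <- c(scores, (avg - mean(avg)) / sd(avg))
  }
  sort(scores, decreasing = TRUE)
}

# Toy data: 8 genes, 6 conditions, two clusters (all values are made up)
set.seed(1)
GE <- as.data.frame(matrix(rnorm(8 * 6), nrow = 8,
                           dimnames = list(paste0("g", 1:8), paste0("cond", 1:6))))
C <- data.frame(cluster = c(1, 1, 1, 1, 1, 2, 2, 2), row.names = paste0("g", 1:8))
G <- c("g1", "g2")   # pathway genes, both in cluster 1

scores <- rank_genes_sketch(G, C, GE)
print(scores)
```

In this toy run only g3, g4 and g5 receive scores, because they share a cluster with the pathway genes; g6 to g8 sit in a cluster with no pathway gene and are left out.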
Usage
    rankGenes(G,C,GE,corrs_mat = NULL)

Arguments
Mandatory
    G           Names of all GOIs (Vector)
    C           Clustering solution (Data frame)
    GE          Gene expression data for this clustering solution (Data frame)
Optional
    corrs_mat   A co-expression (covariance) matrix of all gene combinations (Matrix)

Output
A list of ordered vectors (decreasing values) containing normalized correlation scores of all clusters (List)

Example
    Pathway = morph_input$pathway_genes
    C = getClusteringInformation("MatrixIsEnzyme.txt")
    GE = getGeneExpression("ds1Data.txt")
    Ranking = rankGenes(Pathway,C,GE)   #Rank candidates vs. pathway genes (corrs_mat is optional and defaults to NULL)
    print(names(Ranking))
    [1] "Scored" "Rejected"
    print(head(Ranking$Scored))   #Print the highest validly calculated scores
    AT5G50250 AT5G54290 AT5G26760 AT5G42270 AT3G17930 AT3G08010
     4.025495  3.725712  3.700301  3.569788  3.550832  3.473446
    print(head(Ranking$Rejected))   #Print genes that were not clustered with any pathway gene (Score = NA)
    AT1G72940 AT1G72900 AT1G33590 AT2G41830 AT5G10290 AT1G67720
           NA        NA        NA        NA        NA        NA

4.2.3. getNormalizedCorrelations

Description
Receives (for a specific cluster) the names of both candidates and pathway genes, along with a co-expression (covariance) matrix of all gene combinations, and executes the following steps:
a) Calculation of Pearson correlation coefficients for each candidate vs. all pathway genes.
b) Calculation of the average of all correlation coefficients for each candidate.
c) Calculation of the mean and standard deviation of all the averages from step b.
d) Normalization of the correlation scores: each average from step b is standardized using the mean and standard deviation from step c, i.e. score = (average - mean) / standard deviation.

Usage
    getNormalizedCorrelations(corrs_mat, CurrentG, Candidates)

Arguments
Optional
    corrs_mat    A co-expression (covariance) matrix of all gene pairs (Matrix)
Mandatory
    Candidates   Names of candidates (Vector)
    CurrentG     Names of pathway genes (Vector)

Output
Ordered vector (descending values) of normalized correlation scores (Vector)

Example
    cluster = 2   #Currently analyzed cluster (arbitrary example)
    clusterGenes = rownames(clustering_solution)[clustering_solution[,1]==cluster]   #Extract all genes that belong to this cluster
    clusterPathway = intersect(pathway,clusterGenes)   #Obtain pathway genes in current cluster
    clusterCandidates = setdiff(clusterGenes,clusterPathway)   #Obtain candidate genes (i.e. all cluster genes that are non-pathway)
    #corrs_mat is the co-expression matrix of all gene pairs, assumed to be already computed
    NormalizedCorrelations = getNormalizedCorrelations(corrs_mat, clusterPathway, clusterCandidates)
    print(head(NormalizedCorrelations))
    AT5G50250 AT5G54290 AT5G26760 AT5G42270 AT3G17930 AT3G08010
     4.025495  3.725712  3.700301  3.569788  3.550832  3.473446

4.2.4. internal_LOOCV

Description
Receives the names of the pathway genes, expression data and a clustering solution. The function then repeatedly removes one gene from the target pathway (denoted as V), generates the ranking based on the remaining genes (the training set), and calculates the rank of the removed test gene, denoted as the self-rank of that gene.

Usage
    internal_LOOCV(G, C, GE, K, corrs_mat=NULL, NameGE=NULL, NameC=NULL)

Arguments
Required
    G           Names of all pathway genes (Vector).
    C           Clustering solution (Data frame).
    GE          Gene expression data (Data frame).
    K           Self-rank threshold (Integer).
Optional
    corrs_mat   Co-expression matrix of all gene combinations (Matrix).
    NameGE      Name of gene expression data file (for plotting purposes).
    NameC       Name of clustering solution file (for plotting purposes).
Output
A score in the range of 0 to 1 representing the area under the curve of the self-ranked genes, computed with the AUSR function described in section 4.2.5 – "AUSR" (Numeric).

Example
    G = morph_input$pathway_genes   #Get names of pathway genes
    NameC = names(morph_input$clustering_solutions)[1]   #Get name of first clustering solution ("MatrixIsEnzyme.txt")
    C = (morph_input$clustering_solutions)[[NameC]]   #Get first clustering solution
    NameGE = names(morph_input$ge_datasets)[1]   #Get name of first gene expression dataset ("ds1Data.txt")
    GE = (morph_input$ge_datasets)[[NameGE]]   #Get first gene expression dataset
    K = 1000   #Set self-rank threshold
    AUC = internal_LOOCV(G, C, GE, K, NULL, NameGE, NameC)
    print(AUC)
    [1] 0.6398462

Note
When both the NameGE and NameC parameters are provided, the function also plots the self-rank graph. In the above example the following plot is created:

[Figure: self-rank plot for the above example]

4.2.5. AUSR

Description
Receives a vector of self-ranks and a threshold K, and calculates, for every self-rank threshold in the interval 1 to K (in steps of 1), the fraction of pathway genes that were detected at that threshold when acting as test genes. The function then returns a score in the range of 0 to 1 representing the area under the curve of the self-ranked genes (AUSR).

Usage
    AUSR(SelfRanks, K, NameGE = NULL, NameC = NULL)

Arguments
Mandatory
    SelfRanks   Self-ranks, obtained by the internal_LOOCV procedure (Vector).
    K           Threshold / range (Integer).
Optional
    NameGE      Name of gene expression data file (for plotting purposes).
    NameC       Name of clustering solution file (for plotting purposes).

Output
Score in the range 0 to 1 representing the area under the curve of the self-ranked genes; AUSR (Numeric).
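The computation described above can be sketched in a few lines of base R. This is a minimal illustration rather than the package's implementation: the name ausr_sketch is hypothetical, and the sketch assumes that the curve is evaluated at every integer threshold from 1 to K and that the area is normalized by dividing by K.

```r
# Hypothetical helper (not part of MORPH.R): area under the self-rank curve.
ausr_sketch <- function(self_ranks, K) {
  thresholds <- seq_len(K)
  # Fraction of pathway genes whose self-rank is at or below each threshold
  detected <- vapply(thresholds, function(t) mean(self_ranks <= t), numeric(1))
  # Averaging over the K thresholds yields an area normalized to the range 0 to 1
  mean(detected)
}

self_ranks <- c(107, 65, 2000, 6, 1175, 43, 52, 49, 4861, 76, 95, 198, 1054)
auc <- ausr_sketch(self_ranks, K = 1000)
print(auc)
# [1] 0.6398462
```

Under these assumptions the sketch gives the same value that the Example for this function reports for the same self-ranks and threshold. Genes whose self-rank exceeds K (here 2000, 1175, 4861 and 1054) contribute nothing to the area.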
Example
    SelfRanks = c(107,65,2000,6,1175,43,52,49,4861,76,95,198,1054)   #Vector of self-ranks
    K = 1000   #Set self-rank threshold
    AUC = AUSR(SelfRanks,K,NULL,NULL)   #Run function & refrain from plotting
    print(AUC)
    [1] 0.6398462

4.2.6. MORPH

Description
Receives a MORPH object (see section 4.2.1 – "prepareMorphObject") and a threshold (for internal_LOOCV purposes), and runs the MORPH algorithm on it.

Usage
    MORPH(morph_obj, K=1000, view = FALSE)

Arguments
Mandatory
    morph_obj   A valid MORPH object containing all necessary information (MORPH Object).
    K           Threshold for internal_LOOCV purposes (Integer)
Optional
    view        If set to TRUE, a file named "Threshold_Plots.pdf" will be created in the working directory and will contain internal_LOOCV plots for all processed configurations.

Output
MORPH Results object (morph_results)

Notes
The MORPH Results object is a list-type variable containing sub-lists (one for each configuration), each of which holds:
- Output[[i]]$C - Name of clustering solution file of configuration.
- Output[[i]]$GE - Name of gene expression data file of configuration.
- Output[[i]]$AUSR - AUSR score of configuration.
- Output[[i]]$Ranking - Ranking scores of configuration, further divided into:
    o (Output[[i]]$Ranking)$Scored - All validly calculated scores.
    o (Output[[i]]$Ranking)$Rejected - All genes that were not clustered with any pathway gene (Score = NA).

Example
    InputConfig = "Configs.txt"
    InputGOI = "CarotenoidsHeader.txt"
    morph_input = prepareMorphObject(InputConfig,InputGOI)
    Scores = MORPH(morph_input, view = TRUE)

4.2.7. removeAbsentPathwayGenes

Description
Receives names of pathway genes, names of genes in gene expression data, and a clustering solution. The function then returns the pathway genes that appear in all three.

Usage
    removeAbsentPathwayGenes(G,C,GENames)

Arguments
    G         Names of pathway genes.
    C         Clustering solution.
    GENames   Names of genes in gene expression data.
Output
All pathway genes that appear both in the clustering solution and the gene expression data (Vector).

Example
    G = morph_input$pathway_genes   #Get pathway genes
    C = (morph_input$clustering_solutions)[[1]]   #Get clustering solution
    GE = (morph_input$ge_datasets)[[1]]   #Get gene expression dataset
    GENames = rownames(GE)   #Get names of genes in dataset
    Intersection = removeAbscentPathwayGenes(G,C,GENames)
    print(Intersection)
    [1] "AT5G17230" "AT4G14210" "AT1G10830" "AT3G04870" "AT1G06820" "AT5G57030" "AT4G25700" "AT5G52570" "AT1G31800" "AT3G53130" "AT5G67030" "AT3G10230"
    [13] "AT1G08550"

4.2.8. global_LOOCV

Description
Receives a MORPH object (see section 4.2.1 – "prepareMorphObject") and a threshold, and runs a LOOCV procedure for the MORPH algorithm.

Usage
    global_LOOCV(morph_obj,K=1000)

Arguments
    morph_obj   A valid MORPH object containing all necessary information (MORPH Object).
    K           Threshold for LOOCV purposes (Integer)

Output
AUSR score (in the range 0 to 1) for the MORPH algorithm (Numeric)

Example
    InputConfig = "Configs.txt"
    InputGOI = "CarotenoidsHeader.txt"
    morph_input = prepareMorphObject(InputConfig,InputGOI)
    LOOCV = global_LOOCV(morph_input)
    print(LOOCV)
    [1] 0.846

4.2.9. getMorphResultBestConfig

Description
Receives a MORPH Results object and returns the details for the best configuration.

Usage
    getMorphResultBestConfig(morph_res_obj)

Arguments
    morph_res_obj   MORPH Results object

Output
List containing details for best configuration.

Notes
Output is arranged as follows:
- Output$AUSR - AUSR score (between 0 and 1)
- Output$C - Name of clustering solution file of configuration.
- Output$GE - Name of gene expression data file of configuration.
- Output$Ranking - Ranking scores of configuration, further divided into:
    o (Output$Ranking)$Scored - All validly calculated scores.
    o (Output$Ranking)$Rejected - All genes that were not clustered with any pathway gene (Score = NA).
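The structure listed in the Notes above suggests how a best configuration might be selected from a MORPH Results object. The following is a sketch under stated assumptions, not the package's code: it assumes that "best" means the configuration with the highest AUSR score, and both the mock results list and the helper name pick_best_config are invented for illustration.

```r
# Mock MORPH Results object: one sub-list per configuration (illustrative values only)
mock_results <- list(
  list(C = "ClusteringSolution1", GE = "DataSet1", AUSR = 0.61,
       Ranking = list(Scored = c(geneA = 2.1, geneB = 1.4),
                      Rejected = c(geneC = NA_real_))),
  list(C = "ClusteringSolution2", GE = "DataSet1", AUSR = 0.78,
       Ranking = list(Scored = c(geneB = 3.0, geneA = 0.9),
                      Rejected = c(geneC = NA_real_)))
)

# Hypothetical helper: return the configuration with the highest AUSR (assumed criterion)
pick_best_config <- function(results) {
  ausr <- vapply(results, function(cfg) cfg$AUSR, numeric(1))
  results[[which.max(ausr)]]
}

best <- pick_best_config(mock_results)
print(best$C)                          # clustering solution of the selected configuration
print(names(best$Ranking$Scored)[1])   # top-ranked candidate in that configuration
```

Note that which.max returns the first maximum, so ties between configurations would be resolved by their order in the list.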
Example
    Scores = MORPH(morph_input, view = FALSE)
    BestConfig = getMorphResultBestConfig(Scores)
    print(names(BestConfig))
    [1] "AUSR" "Ranking" "C" "GE"
    print(BestConfig$AUSR)
    [1] 0.9163077

4.2.10. getMorphPredictions

Description
Receives a MORPH Results object and returns the corresponding predictions, i.e. the normalized correlation scores of the best configuration.

Usage
    getMorphPredictions(morph_res_obj)

Arguments
    morph_res_obj   MORPH Results object

Output
Ranking scores of configuration (Vector)

Example
    Scores = MORPH(morph_input, view = FALSE)
    Predictions = getMorphPredictions(Scores)
    print(head(Predictions))
    AT4G37760 AT3G63520 AT4G32770 AT1G17050 AT2G26800 AT2G41680
     2.600600  2.567530  2.464489  2.369235  2.366306  2.352861

4.2.11. getScoresDistributionPlots

Description
Receives a MORPH Results object and a path to an output PDF file, and plots the ranking scores distribution to the output file.

Usage
    getScoresDistributionPlots(Scores, OutputFile="ScoresDistribution.pdf", Color="blue", Type="b")

Arguments
Mandatory
    Scores       MORPH Results object
    OutputFile   Path to output PDF file (String)
Optional
    Color        R plot color (String)
    Type         R plot type (String)

Output
A PDF file containing the score distribution plots; a message with the path of the created file is printed.

Example
    Scores = MORPH(morph_input, view = FALSE)
    getScoresDistributionPlots(Scores)
    [1] "Distribution plot file created at: ScoresDistribution.pdf"

Notes
The resulting file will hold the score distribution plots for all configurations. For the example above, this is the distribution plot for the first configuration:

[Figure: score distribution plot for the first configuration]
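To make the idea of one plot per configuration in a single PDF concrete, the following base R sketch writes such a file. It is only an illustration under assumptions: the helper name plot_scores_sketch and the mock results are hypothetical, and the choice to draw the scored values against their rank is a guess, since this manual does not specify what the package draws on each axis.

```r
# Hypothetical sketch (not MORPH.R code): one score plot per configuration in a PDF.
plot_scores_sketch <- function(results, output_file, color = "blue", type = "b") {
  pdf(output_file)      # open a PDF graphics device
  on.exit(dev.off())    # always close the device, even if plotting fails
  for (cfg in results) {
    scored <- sort(cfg$Ranking$Scored, decreasing = TRUE)
    # Assumed layout: normalized score (y) against rank (x), one page per configuration
    plot(seq_along(scored), scored, col = color, type = type,
         xlab = "Rank", ylab = "Normalized score",
         main = paste(cfg$GE, "/", cfg$C))
  }
  invisible(output_file)
}

# Mock results with a single configuration (illustrative values only)
mock_results <- list(
  list(C = "ClusteringSolution1", GE = "DataSet1",
       Ranking = list(Scored = c(geneA = 2.1, geneB = 1.4, geneC = -0.3)))
)
out <- plot_scores_sketch(mock_results, file.path(tempdir(), "ScoresDistribution.pdf"))
print(file.exists(out))
```

The color and type arguments mirror the Color="blue" and Type="b" defaults shown in the Usage above and are passed straight to plot.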