ppl12326-sup-0001-AppendixS1

Supplementary Text - MORPH-R documentation
Ranking Candidate Genes for Membership in Pathways
Version: XXX
Date: XXX
Authors:
Oren Tzfadia <Oren.Tzfadia@weizmann.ac.il>
David Amar <Ddam.Am@gmail.com>
Itziar Frades <Itziar.Frades@gmail.com>
Erik Alexandersson <Erik.Alexandersson@slu.se>
Ron Shamir <RShamir@tau.ac.il>
Maintainers:
Dimitry Chazanov <Dimitry.Chazanov@weizmann.ac.il>
Nati Ghatan <Nati.Ghatan@gmail.com>
Zip data: Yes
License: LGPL
Description: Functions designed to reveal unknown genes in biological pathways.
URL: http://biocourse.weizmann.ac.il/morph/
MORPH – R Implementation – User Manual
Table of Contents
1. Introduction
2. MORPH-R package content
3. Required Inputs & Valid Formats
  3.1. The MORPH object
  3.2. Using input files
    3.2.1. Configs.txt
    3.2.2. Pathway genes
    3.2.3. Gene expression data
    3.2.4. Clustering solution
4. Description of Functions
  4.1. General Functions
    4.1.1. getClusteringInformation
    4.1.2. getPathwayGenes
    4.1.3. getGeneExpression
  4.2. MORPH Functions
    4.2.1. prepareMorphObject
    4.2.2. rankGenes
    4.2.3. getNormalizedCorrelations
    4.2.4. internal_LOOCV
    4.2.5. AUSR
    4.2.6. MORPH
    4.2.7. removeAbsentPathwayGenes
    4.2.8. global_LOOCV
    4.2.9. getMorphResultBestConfig
    4.2.10. getMorphPredictions
    4.2.11. getScoresDistributionPlots
1. The MORPH Algorithm - Introduction
This package implements an algorithm called MORPH (for module-guided ranking of
candidate pathway genes) for revealing unknown genes in biological pathways. The
algorithm is explained in greater detail in the article[1] upon which this implementation is
based. Briefly, this method receives as input a set of known genes from the target
pathway, a collection of expression profiles, clustering solutions, and interaction and
metabolic networks. Using machine learning techniques, MORPH selects the best
combination of data and analysis method and outputs a ranking of candidate genes
predicted to belong to the target pathway.
The MORPH-R package presented here comprises all the scripts and functions (in the
form of R code) that are required to run the MORPH process. A user who wishes
to apply the MORPH algorithm to her own data must first construct the MORPH
object, either from scratch (section 3.1 – "The MORPH Object") or with the
prepareMorphObject function (section 4.2.1 – "prepareMorphObject").
Reference
1. Tzfadia O, Amar D, Bradbury LM, Wurtzel ET and Shamir R. (2012). The MORPH
algorithm: ranking candidate genes for membership in Arabidopsis and tomato
pathways. Plant Cell. 11:4389-406. doi: 10.1105.
2. MORPH-R package content
The MORPH package comprises several files. First and foremost, it contains the MORPH.R
file, which defines all the functions described in section 4 – "Description of
Functions". These functions are required to run the MORPH process. Therefore,
any code that uses these functions must load this file with source() (for more
information, type ?source in the R console).
In addition, this package contains four datasets compressed in ZIP format: one for
Arabidopsis thaliana (Data_AT.zip), one for Solanum lycopersicum (Data_SL.zip), and two
for Solanum tuberosum (Data_StITAG.zip and Data_StPGSC.zip). The first two are the
same datasets that were used in the reference article; the latter two are new
datasets added for the benefit of the potato research community. All the
files within these archives are arranged in the valid input format for the algorithm, as
described in section 3 – "Required Inputs & Valid Formats".
Table S1. Default data sets included in the current version of the MORPH-R package

Species                                   Arabidopsis*   Tomato*         Potato ITAG**    Potato PGSC**
Gene IDs (example ID)                     AT5G17250      Solyc02g04650   Sotub01g031170   PGSC0003DMP400000015
Gene expression profiles (# data sets)    213 (4)        53 (4)          343 (4)          343 (4)
Clustering solutions                      8              6               6                6

* See Supplementary Table 1 in Tzfadia et al, 2012 for GE list of accession numbers
** See Supplementary Table X for potato GE list of accession numbers
3. Required Inputs & Valid Formats
3.1 The MORPH Object
Originally, when the MORPH algorithm was published in 2012, it was designed to receive
known pathway genes from the user and then rank candidate genes as potential members of this
pathway in one of two organisms whose gene expression data and clustering solutions
were available for use (either A. thaliana or S. lycopersicum). This initial setting
notwithstanding, one may also use the functions included in this package by embedding
them in one's own code and applying them to one's own data, in virtually any model
organism.
In either case, the input to MORPH must be organized in a certain structure, which is
referred to as the "MORPH object". In R terms, the MORPH object belongs to the class
"morph_data" and is essentially a list that comprises several different data types.
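First, let us examine the object declaration in R syntax (the list members are named
so that they can be accessed with $, as in the examples that follow):
> morph_obj = list(config = config, clustering_solutions = clustering_solutions,
+                  ge_datasets = ge_datasets, pathway_genes = pathway_genes)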
> class(morph_obj) = "morph_data"
Now let's examine the members of the list object:
a) config – A data frame containing the configuration on which the algorithm will
run. The first column holds the names of the gene expression datasets. The
second column holds the names of their corresponding clustering solutions. The
columns may contain duplicates, since it is possible to run the same gene
expression dataset against two (or more) different clustering solutions. For
example:
  V1        V2
1 DataSet1  ClusteringSolution1
2 DataSet1  ClusteringSolution2
3 DataSet1  ClusteringSolution3
4 DataSet2  ClusteringSolution1
5 DataSet3  ClusteringSolution4
b) clustering_solutions – This member is a list that contains all clustering solutions.
The name of each clustering solution should correspond to the appropriate
name in the config field of the MORPH object. For example:
> names(morph_obj$clustering_solutions)
[1] "ClusteringSolution1" "ClusteringSolution2" "ClusteringSolution3"
[4] "ClusteringSolution4"
Each clustering solution is a data frame that contains the partition of genes into
their parent clusters (or "modules") in a sort of hash structure. The row names
are the names of the genes, while the first and only column contains their parent
cluster (which may be a number or a name). For example:
          ClusteringSolution3
AT1G01050                   1
AT1G01090                   1
AT1G01120                   2
AT1G01140                   2
Important notes:
- The clustering solution must not contain any genes that do not appear in
  the gene expression dataset.
- Every gene in the gene expression data must have a corresponding
  cluster.
c) ge_datasets – This member is a list that contains all gene expression datasets. Its
structure somewhat resembles that of the clustering_solutions member since the
names of the members should also correspond to the appropriate names in the
config field of the MORPH object. For example:
> names(morph_obj$ge_datasets)
[1] "DataSet1" "DataSet2" "DataSet3"
Each gene expression dataset is a data frame, where the row names are the
names of the genes, the column names are the names of the conditions (e.g. time
points, stress conditions, tissues, etc.), and the cells contain the expression level
on a linear scale, i.e. non-log (the example data available in this package are
linear-scale expression values from microarray experiments). For example:
ID          A        B        C        D        E        F
AT1G01010   -0.499   -0.499   2.213    -0.499   0.781    -0.499
AT1G01040   -0.092   -0.345   -0.151   -1.296   -0.104   1.339
AT1G01050   0.019    0.188    -0.175   -0.175   -2.012   0.679
AT1G01060   1.546    -0.071   -0.12    -0.022   0.776    0.405
AT1G01070   -0.932   1.581    0.651    1.292    -0.157   -0.499
d) pathway_genes – A character vector containing the names of all pathway genes
(also known as "Genes of Interest" and abbreviated as "GOIs") to be tested. For
example:
> morph_obj$pathway_genes
[1] "AT5G17230" "AT4G14210" "AT1G10830" "AT3G04870" "AT1G06820" "AT3G10230"
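As a minimal sketch of the structure described in this section, the following code builds a small MORPH object by hand. The gene names, expression values, and the names DataSet1 and ClusteringSolution1 are toy values chosen for illustration only; the final checks mirror the "Important notes" given for the clustering solutions.

```r
# Toy data, invented for illustration only.
genes <- c("AT1G01010", "AT1G01040", "AT1G01050", "AT1G01060")

# Gene expression dataset: rows are genes, columns are conditions.
ge <- data.frame(A = c(-0.499, -0.092, 0.019, 1.546),
                 B = c(-0.499, -0.345, 0.188, -0.071),
                 row.names = genes)

# Clustering solution: row names are genes, the single column is the cluster.
cs <- data.frame(ClusteringSolution1 = c(1, 1, 2, 2), row.names = genes)

# Configuration: one row per (dataset, clustering solution) pair.
config <- data.frame(V1 = "DataSet1", V2 = "ClusteringSolution1",
                     stringsAsFactors = FALSE)

morph_obj <- list(config = config,
                  clustering_solutions = list(ClusteringSolution1 = cs),
                  ge_datasets = list(DataSet1 = ge),
                  pathway_genes = c("AT1G01010", "AT1G01040"))
class(morph_obj) <- "morph_data"

# Consistency checks: same gene set in both tables, and every name used
# in the configuration exists in the corresponding list.
stopifnot(setequal(rownames(cs), rownames(ge)))
stopifnot(all(config$V1 %in% names(morph_obj$ge_datasets)),
          all(config$V2 %in% names(morph_obj$clustering_solutions)))
```

Naming the list members explicitly is what makes accessors such as morph_obj$pathway_genes work.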
3.2 Using input files
Although the option of applying the algorithm on one's own data exists (as described in
section 3.1 – "The MORPH Object"), it requires the user to construct the MORPH object
from scratch. An alternative to this approach is using the prepareMorphObject function
(described in section 4.2.1 – "prepareMorphObject"). This function reads files containing
the data and creates a MORPH object. It also takes care of the data validity by performing
the following steps:
a) Eliminate duplicate genes from the clustering solution.
b) For each clustering solution, eliminate genes that do not appear in gene
expression data.
c) Detect genes in the gene expression data that do not exist in the clustering
solution and set them under a separate cluster named "unclustered".
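The three validity steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code of prepareMorphObject: the input table, the column names gene and cluster, and the choice to keep the first occurrence of a duplicated gene are all hypothetical.

```r
# Toy inputs, invented for illustration only.
ge_genes <- c("G1", "G2", "G3", "G4")
cs <- data.frame(gene = c("G1", "G2", "G2", "G9"),
                 cluster = c("1", "1", "2", "3"),
                 stringsAsFactors = FALSE)

# a) Drop duplicate genes (assumption: the first occurrence is kept).
cs <- cs[!duplicated(cs$gene), ]

# b) Drop genes that do not appear in the gene expression data.
cs <- cs[cs$gene %in% ge_genes, ]

# c) Put genes missing from the clustering solution into "unclustered".
missing_genes <- setdiff(ge_genes, cs$gene)
cs <- rbind(cs, data.frame(gene = missing_genes,
                           cluster = rep("unclustered", length(missing_genes)),
                           stringsAsFactors = FALSE))

# Convert to the layout used in the MORPH object: genes as row names.
clustering_solution <- data.frame(cluster = cs$cluster, row.names = cs$gene)
print(clustering_solution)
```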
The prepareMorphObject function receives two strings indicating the paths to a
"configs.txt" file and a pathway genes file. These files must be in a specific format that
we will now describe.
3.2.1 Configs.txt
The configs.txt file contains a list of all configurations to be tested in the MORPH run. It is
a tab-delimited file with two columns: the first contains the names of the gene
expression data files, whereas the second contains the corresponding names of the
clustering solutions. For example:
ds1Data.txt          MatrixIsEnzyme.txt
ds1Data.txt          ds1_ppi_matisse_0.4.txt
ds1Data.txt          DS1_click.txt
DataSet3Matrix.txt   MatrixIsEnzyme.txt
DataSet3Matrix.txt   ds3_matisse_0.4.txt
DataSet3Matrix.txt   ds3_ppi_matisse_0.4.txt
DataSet3Matrix.txt   Seeds_click.txt
In other words, each row contains a single configuration ("data pair") to be tested. It is
therefore valid for the same gene expression dataset and/or clustering solution to
appear in more than one row, as long as no pair is listed twice. Make sure that this
file is accurate, since the MORPH algorithm follows it during the analysis.
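One way to produce such a file from R, and to check it before a run, is sketched below. The file names are taken from the example above; the temporary directory and the duplicate-pair check are assumptions made for this illustration and are not part of the package.

```r
# Three configurations using file names from the example above.
config <- data.frame(
  V1 = c("ds1Data.txt", "ds1Data.txt", "DataSet3Matrix.txt"),
  V2 = c("MatrixIsEnzyme.txt", "DS1_click.txt", "MatrixIsEnzyme.txt"),
  stringsAsFactors = FALSE)

# Written to a temporary directory so the working directory is untouched.
config_path <- file.path(tempdir(), "configs.txt")
write.table(config, config_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the file back and check that it has two columns and no repeated pair.
config_read <- read.table(config_path, sep = "\t", header = FALSE,
                          stringsAsFactors = FALSE)
stopifnot(ncol(config_read) == 2, !any(duplicated(config_read)))
print(config_read)
```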
3.2.2 Pathway genes
This file contains the names of the pathway genes and may be arranged in one of two
formats:
a) Tab-delimited – The file has exactly one line, which contains the names
separated by tabs. For example:
AT5G17230	AT4G14210	AT1G10830	AT3G04870	AT1G06820
b) Newline – The gene names are separated by new lines, i.e. each line holds one
gene name. For example:
AT5G17230
AT4G14210
AT1G10830
AT3G04870
AT1G06820
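Both formats can be read with the same base R call, because scan() splits its input on any whitespace. The helper read_gene_names below is hypothetical and only illustrates the idea; it is not the package's getPathwayGenes function.

```r
# Hypothetical helper, not the package's getPathwayGenes.
read_gene_names <- function(path) {
  # scan() splits on any whitespace, so tabs and newlines both work.
  unique(scan(path, what = character(), quiet = TRUE))
}

genes <- c("AT5G17230", "AT4G14210", "AT1G10830")
tab_file <- tempfile(fileext = ".txt")
nl_file <- tempfile(fileext = ".txt")
writeLines(paste(genes, collapse = "\t"), tab_file)  # format a)
writeLines(genes, nl_file)                           # format b)

# Both files yield the same character vector.
stopifnot(identical(read_gene_names(tab_file), read_gene_names(nl_file)))
```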
3.2.3 Gene expression data
These are tab-delimited text files in which the row names are gene names and
the column names are the conditions. For example:
ID          npr1_mock   wrky18_mock   wt_BTH8h   npr1_BTH8h   wrky18_BTH8h
AT1G01010   -0.499      -0.499        2.213      -0.499       0.781
AT1G01040   -0.092      -0.345        -0.151     -1.296       -0.104
AT1G01050   0.019       0.188         -0.175     -0.175       -2.012
AT1G01080   1.229       1.157         -1.007     0.96         -0.731
AT1G01090   1.626       0.637         -0.699     1.099        -0.203
It is preferable, though not required, to remove any uninformative rows (e.g.
rows containing missing data or more than 80% zero values).
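A minimal sketch of such a filter in base R is shown below. The data frame is a toy example invented for illustration, and the 80% cut-off simply follows the figure mentioned above.

```r
# Toy expression matrix, invented for illustration only.
GE <- data.frame(c1 = c(1.2, 0, NA, 0.5),
                 c2 = c(0.7, 0, 0.3, 0),
                 c3 = c(2.1, 0, 0.9, 0),
                 c4 = c(0.4, 0, 1.1, 0.3),
                 c5 = c(1.9, 0, 0.2, 0.2),
                 row.names = c("G1", "G2", "G3", "G4"))

has_missing <- !complete.cases(GE)                # rows with any NA
zero_fraction <- rowMeans(GE == 0, na.rm = TRUE)  # share of zero values per row

# Keep rows with no missing data and at most 80% zeros.
GE_filtered <- GE[!has_missing & zero_fraction <= 0.8, , drop = FALSE]
print(rownames(GE_filtered))
```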
3.2.4 Clustering solution
This tab-delimited text file contains two columns: the first holds the name of the genes
and the other the parent clusters. In other words, each row contains a gene name and its
corresponding cluster, separated by tab. For example:
AT1G01120   1
AT1G01140   1
AT1G01240   2
AT1G01725   3
AT1G01730   3
One should note that the cluster name (or "module") does not necessarily have to be a
number, but may also be named by a string. For example:
AT1G01050   Enzyme
AT1G01090   Not_Enzyme
4. Description of Functions
4.1. General Functions
4.1.1. getClusteringInformation
Description
Receives a path to a clustering solution file and extracts the information into a
variable.
Usage
getClusteringInformation(ClusterFile)
Arguments
ClusterFile    Path to a clustering solution file (String)
Output
A data frame where the row names are the names of the genes and the first and
only column contains their corresponding clusters/modules.
Example
solutionFile = "IsEnzyme.txt" #Set path to clustering solution file
clustering_solution = getClusteringInformation(solutionFile)
print(unique(clustering_solution[,1])) #Print unique list of clusters
4.1.2. getPathwayGenes
Description
Receives a path to a pathway genes file (a.k.a Genes of Interest, abbreviated as
"GOI") and extracts the information into a variable.
Usage
getPathwayGenes(InputGOI)
Arguments
InputGOI    Path to pathway genes file (String)
Output
Character vector containing names of pathway genes (Vector)
Example
pathwayGenesFile = "Carotenoids.txt" #Set path to pathway genes file
pathway = getPathwayGenes(pathwayGenesFile)
print (length(pathway)) #Print amount of pathway genes
4.1.3. getGeneExpression
Description
Receives a path to a gene expression data file and extracts the information into a
variable.
Usage
getGeneExpression(InputGE)
Arguments
InputGE Path to gene expression data file (String)
Output
A data frame containing gene expression data (Data frame)
Example
fileGE = "DataSet3Matrix.txt" #Set path to gene expression data file
GE = getGeneExpression(fileGE)
print(nrow(GE)) #Print number of rows (genes) in data file
4.2. MORPH Functions
4.2.1. prepareMorphObject
Description
Receives paths to both a configs.txt file and a file containing all pathway genes,
and creates a MORPH object that may serve as an input for the MORPH
algorithm.
Usage
prepareMorphObject(InputConfig,InputGOI = NULL)
Arguments
InputConfig    Path to configs.txt file (String)
InputGOI       Path to file containing pathway genes (String)
Output
A MORPH object that may serve as a valid input to the MORPH algorithm.
Notes
a) The function assumes that all gene expression matrices contain the same gene
set.
b) For each clustering solution, the function adds all unclustered genes into a
separate new cluster.
c) The function removes genes that are not found in the gene expression data.
Example
InputConfig = "Configs.txt"
InputGOI = "Carotenoids.txt"
morph_input = prepareMorphObject(InputConfig,InputGOI)
print(names(morph_input)) #Print the members of the MORPH object (list).
[1] "config" "clustering_solutions" "ge_datasets" "pathway_genes"
print(morph_input$pathway_genes) #Print the names of the pathway genes.
[1] "AT5G17230" "AT4G14210" "AT1G10830" "AT3G04870" "AT1G06820" "AT3G10230"
"AT5G57030" "AT4G25700" "AT5G52570" "AT1G31800" "AT3G53130" "AT5G67030" "AT1G08550"
4.2.2. rankGenes
Description
Receives gene expression data, clustering solution, and the names of the
pathway genes, and then ranks the candidate (non-pathway) genes in terms of
how plausible it is that a given gene is associated with the target pathway. The
function will go through all clusters and will calculate the normalized correlation
scores for each one.
Usage
rankGenes(G,C,GE,corrs_mat = NULL)
Arguments
Mandatory
G            Names of all GOIs (Vector)
C            Clustering solution (Data frame)
GE           Gene expression data for this clustering solution (Data frame)
Optional
corrs_mat    A co-expression (covariance) matrix of all gene combinations (Matrix)
Output
A List of ordered vectors (decreasing values) containing normalized correlation
scores of all clusters (List)
Example
Pathway = morph_input$pathway_genes
C = getClusteringInformation("MatrixIsEnzyme.txt")
GE = getGeneExpression("ds1Data.txt")
Ranking = rankGenes(Pathway,C,GE) #Rank candidates vs. pathway genes (optional corrs_mat omitted)
print(names(Ranking))
[1] "Scored" "Rejected"
print(head(Ranking$Scored)) #Print the highest validly calculated scores
AT5G50250 AT5G54290 AT5G26760 AT5G42270 AT3G17930 AT3G08010
 4.025495  3.725712  3.700301  3.569788  3.550832  3.473446
print(head(Ranking$Rejected)) #Print the first genes that were not clustered with any pathway gene (Score = NA)
AT1G72940 AT1G72900 AT1G33590 AT2G41830 AT5G10290 AT1G67720
       NA        NA        NA        NA        NA        NA
4.2.3. getNormalizedCorrelations
Description
Receives (for a specific cluster) the names of both candidates and pathway
genes, along with a co-expression (covariance) matrix of all gene combinations
and executes the following steps:
a) Calculation of Pearson correlation coefficients for each candidate vs. all pathway
genes.
b) Calculation of the average for all correlation coefficients for each candidate.
c) Calculation of the mean and standard-deviation of all the averages from step b.
d) Normalization of the correlation scores, using the mean and standard deviation from step c.
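The formula for step d did not survive in this copy of the manual. The sketch below walks through steps a to d under the assumption that the normalization is a standard z-score of the per-candidate averages, which is what step c suggests. The function normalized_scores_sketch and the toy data are hypothetical and do not reproduce the package's getNormalizedCorrelations.

```r
# Hypothetical helper illustrating steps a) to d); not the package code.
# Assumes the normalization is a z-score of the per-candidate averages.
normalized_scores_sketch <- function(corrs_mat, pathway_genes, candidates) {
  # a) correlations of each candidate against all pathway genes
  sub <- corrs_mat[candidates, pathway_genes, drop = FALSE]
  # b) average correlation per candidate
  avg <- rowMeans(sub)
  # c) mean and standard deviation of the averages
  mu <- mean(avg)
  sigma <- sd(avg)
  # d) z-score normalization, sorted in decreasing order
  sort((avg - mu) / sigma, decreasing = TRUE)
}

# Toy expression data (rows are genes), invented for illustration only.
set.seed(1)
GE <- matrix(rnorm(6 * 8), nrow = 6,
             dimnames = list(paste0("G", 1:6), paste0("cond", 1:8)))
corrs_mat <- cor(t(GE))  # Pearson correlation between all gene pairs
scores <- normalized_scores_sketch(corrs_mat, c("G1", "G2"),
                                   c("G3", "G4", "G5", "G6"))
print(scores)
```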
Usage
getNormalizedCorrelations(corrs_mat, CurrentG, Candidates)
Arguments
Optional
corrs_mat     A co-expression (covariance) matrix of all gene pairs (Matrix)
Mandatory
Candidates    Names of candidates (Vector)
CurrentG      Names of pathway genes (Vector)
Output
Ordered vector (descending values) of normalized correlation scores (Vector)
Example
cluster = 2 #Currently analyzed cluster (arbitrary example)
#Extract all genes which belong to this cluster
clusterGenes = rownames(clustering_solution)[which(clustering_solution[,1]==cluster)]
clusterPathway = intersect(pathway,clusterGenes) #Obtain pathway genes in current cluster
#Obtain candidate genes (i.e. all cluster genes that are non-pathway)
clusterCandidates = setdiff(clusterGenes,clusterPathway)
#corrs_mat is assumed to already hold the co-expression matrix of all gene pairs
NormalizedCorrelations = getNormalizedCorrelations(corrs_mat, clusterPathway, clusterCandidates)
print(head(NormalizedCorrelations))
AT5G50250 AT5G54290 AT5G26760 AT5G42270 AT3G17930 AT3G08010
4.025495 3.725712 3.700301 3.569788 3.550832 3.473446
4.2.4. internal_LOOCV
Description
Receives the names of the pathway genes, expression data and a clustering solution. The
function then repeatedly removes one gene from the target pathway (denoted as V),
generates the ranking based on the remaining genes (the training set), and calculates the
rank of the test gene, denoted as the self-rank of that gene.
Usage
internal_LOOCV(G, C, GE, K, corrs_mat=NULL, NameGE=NULL, NameC=NULL)
Arguments
Required
G            Names of all pathway genes (Vector).
C            Clustering solution (Data frame).
GE           Gene expression data (Data frame).
K            Self-rank threshold (Integer).
Optional
corrs_mat    Co-expression matrix of all gene combinations (Matrix).
NameGE       Name of gene expression data file (for plotting purposes).
NameC        Name of clustering solution file (for plotting purposes).
Output
A score in the range of 0 to 1 representing the area under the curve of the
self-ranked genes, computed with the AUSR function described in section 4.2.5 – "AUSR"
(Numeric).
Example
G = morph_input$pathway_genes #Get names of pathway genes
NameC = names(morph_input$clustering_solutions)[1] #Get name of first clustering solution ("MatrixIsEnzyme.txt")
C = (morph_input$clustering_solutions)[[NameC]] #Get first clustering solution
NameGE = names(morph_input$ge_datasets)[1] #Get name of first gene expression dataset ("ds1Data.txt")
GE = (morph_input$ge_datasets)[[NameGE]] #Get first gene expression dataset
K = 1000 #Set self-rank threshold
AUC = internal_LOOCV(G, C, GE, K, NULL, NameGE, NameC)
print(AUC)
[1] 0.6398462
Note
When both the NameGE and NameC parameters are provided, the function also plots
the self-rank graph.
[Figure: self-rank plot created by the above example]
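The leave-one-out idea in the description of internal_LOOCV can be illustrated with a simplified sketch. It is hypothetical and deliberately ignores clusters: every gene is scored by its mean correlation with the training set, and the self-rank is the position of the held-out gene among all candidates. The package's own procedure ranks within clusters using normalized scores, so its self-ranks will generally differ.

```r
# Hypothetical, simplified illustration of leave-one-out self-ranking.
# It ignores clusters and scores genes by their mean correlation with the
# training set; the package's internal_LOOCV may differ in these details.
self_ranks_sketch <- function(G, corrs_mat) {
  all_genes <- rownames(corrs_mat)
  sapply(G, function(v) {
    training <- setdiff(G, v)                   # remaining pathway genes
    candidates <- setdiff(all_genes, training)  # includes the test gene v
    scores <- rowMeans(corrs_mat[candidates, training, drop = FALSE])
    # Rank 1 is the best score; report the position of the test gene.
    rank(-scores, ties.method = "min")[[v]]
  })
}

# Toy expression data (rows are genes), invented for illustration only.
set.seed(1)
GE <- matrix(rnorm(10 * 12), nrow = 10,
             dimnames = list(paste0("G", 1:10), paste0("cond", 1:12)))
corrs_mat <- cor(t(GE))
ranks <- self_ranks_sketch(c("G1", "G2", "G3"), corrs_mat)
print(ranks)
```

The resulting vector of self-ranks is the kind of input that the AUSR function expects.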
4.2.5. AUSR
Description
Receives a vector of self-ranks and a defined threshold, and then calculates for
every self-rank threshold in some predefined interval (1 to K by 1) the fraction of
pathway genes that were detected at the threshold when acting as test genes.
The function will then return a score in the range of 0 to 1 representing the area
under the curve of the self-ranked genes (AUSR).
Usage
AUSR(SelfRanks, K, NameGE = NULL, NameC = NULL)
Arguments
Mandatory
SelfRanks    Self-ranks, obtained by internal_LOOCV procedure (Vector).
K            Threshold / range (Integer).
Optional
NameGE       Name of gene expression data file (for plotting purposes).
NameC        Name of clustering solution file (for plotting purposes).
Output
Score in the range 0 to 1 representing the area under the curve of the self-ranked
genes; AUSR (Numeric).
Example
SelfRanks = c(107,65,2000,6,1175,43,52,49,4861,76,95,198,1054) #Vector of self-ranks
K = 1000 #Set self-rank threshold
AUC = AUSR(SelfRanks,K,NULL,NULL) #Run function & refrain from plotting
print(AUC)
[1] 0.6398462
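The score in this example can be reproduced with a few lines of base R that follow the description literally: for every threshold from 1 to K, take the fraction of pathway genes whose self-rank is at or below it, then average these fractions. The function ausr_sketch is a hypothetical re-implementation for illustration; the package's AUSR function may be written differently and also handles plotting.

```r
# Hypothetical re-implementation for illustration; not the package's AUSR.
ausr_sketch <- function(SelfRanks, K) {
  # For each threshold k = 1..K, the fraction of genes with self-rank <= k.
  fractions <- sapply(seq_len(K), function(k) mean(SelfRanks <= k))
  # The area under this curve, scaled to [0, 1], is the mean of the fractions.
  mean(fractions)
}

SelfRanks <- c(107, 65, 2000, 6, 1175, 43, 52, 49, 4861, 76, 95, 198, 1054)
print(ausr_sketch(SelfRanks, 1000))
```

Genes whose self-rank exceeds K (here 2000, 1175, 4861 and 1054) never count as detected, which is why the score stays well below 1.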
4.2.6. MORPH
Description
Receives a MORPH object (see section 4.2.1 – "prepareMorphObject") and a
threshold (for internal_LOOCV purposes), and runs the MORPH algorithm on it.
Usage
MORPH (morph_obj, K=1000, view = FALSE)
Arguments
Mandatory
morph_obj    A valid MORPH object containing all necessary information
             (MORPH Object).
K            Threshold for internal_LOOCV purposes (Integer)
Optional
view         If set to TRUE, a file named "Threshold_Plots.pdf" will be created
             in the working directory and will contain internal_LOOCV plots for
             all processed configurations.
Output
MORPH Results object (morph_results)
Notes
The MORPH Results object is a list-type variable containing sub-lists (one for each
configuration), each of which holds:
- Output[[i]]$C - Name of clustering solution file of configuration.
- Output[[i]]$GE - Name of gene expression data file of configuration.
- Output[[i]]$AUSR - AUSR score of configuration.
- Output[[i]]$Ranking - Ranking scores of configuration, further divided into:
  - (Output[[i]]$Ranking)$Scored - All validly calculated scores.
  - (Output[[i]]$Ranking)$Rejected - All genes that were not clustered with any
    pathway gene (Score = NA).
Example
InputConfig = "Configs.txt"
InputGOI = "CarotenoidsHeader.txt"
morph_input = prepareMorphObject(InputConfig,InputGOI)
Scores = MORPH(morph_input, view = TRUE)
4.2.7. removeAbsentPathwayGenes
Description
Receives names of pathway genes, names of genes in gene expression data, and
a clustering solution. The function will then return a list of the pathway genes
that appear in all three categories.
Usage
removeAbsentPathwayGenes(G,C,GENames)
Arguments
G          Names of pathway genes.
C          Clustering solution.
GENames    Names of genes in gene expression data.
Output
All pathway genes that appear both in the clustering solution and the gene
expression data (Vector).
Example
G = morph_input$pathway_genes #Get pathway genes
C = (morph_input$clustering_solutions)[[1]] #Get clustering solution
GE = (morph_input$ge_datasets)[[1]] #Get gene expression dataset
GENames = rownames(GE) #Get names of genes in dataset
Intersection = removeAbsentPathwayGenes(G,C,GENames)
print(Intersection)
 [1] "AT5G17230" "AT4G14210" "AT1G10830" "AT3G04870" "AT1G06820" "AT3G10230"
 [7] "AT5G57030" "AT4G25700" "AT5G52570" "AT1G31800" "AT3G53130" "AT5G67030"
[13] "AT1G08550"
4.2.8. global_LOOCV
Description
Receives a MORPH object (see section 4.2.1 – "prepareMorphObject") and a
threshold, and runs a leave-one-out cross-validation (LOOCV) procedure for the
MORPH algorithm.
Usage
global_LOOCV (morph_obj,K=1000)
Arguments
morph_obj    A valid MORPH object containing all necessary information
             (MORPH Object).
K            Threshold for LOOCV purposes (Integer)
Output
AUSR Score (in the range 0 to 1) for the MORPH algorithm (Numeric)
Example
InputConfig = "Configs.txt"
InputGOI = "CarotenoidsHeader.txt"
morph_input = prepareMorphObject(InputConfig,InputGOI)
LOOCV = global_LOOCV(morph_input)
print(LOOCV)
[1] 0.846
4.2.9. getMorphResultBestConfig
Description
Receives a MORPH Results object and returns the details for the best
configuration.
Usage
getMorphResultBestConfig(morph_res_obj)
Arguments
morph_res_obj MORPH Results object
Output
List containing details for best configuration.
Notes
Output is arranged as follows:
- Output$AUSR - AUSR score (between 0 and 1)
- Output$C - Name of clustering solution file of configuration.
- Output$GE - Name of gene expression data file of configuration.
- Output$Ranking - Ranking scores of configuration, further divided into:
  - (Output$Ranking)$Scored - All validly calculated scores.
  - (Output$Ranking)$Rejected - All genes that were not clustered with any
    pathway gene (Score = NA).
Example
Scores = MORPH(morph_input, view = FALSE)
BestConfig = getMorphResultBestConfig(Scores)
print(names(BestConfig))
[1] "AUSR" "Ranking" "C" "GE"
print(BestConfig$AUSR)
[1] 0.9163077
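The selection itself can be illustrated on a mock results list that follows the structure described in the notes of section 4.2.6. The sketch assumes that the "best" configuration is the one with the highest AUSR score; this manual does not state the criterion explicitly, and the gene names and values below are invented for illustration.

```r
# Mock results list following the documented structure.
# Names and values are invented for illustration only.
mock_results <- list(
  list(C = "MatrixIsEnzyme.txt", GE = "ds1Data.txt", AUSR = 0.64,
       Ranking = list(Scored = c(GeneA = 2.1, GeneB = 1.3),
                      Rejected = c(GeneC = NA))),
  list(C = "DS1_click.txt", GE = "ds1Data.txt", AUSR = 0.92,
       Ranking = list(Scored = c(GeneB = 3.0, GeneA = 0.4),
                      Rejected = c(GeneC = NA)))
)

# Assumption: the best configuration is the one with the highest AUSR score.
ausr_scores <- sapply(mock_results, function(res) res$AUSR)
best <- mock_results[[which.max(ausr_scores)]]
print(best$C)
```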
4.2.10. getMorphPredictions
Description
Receives a MORPH Results object and returns the corresponding predictions, i.e.
the normalized correlation scores of the best configuration.
Usage
getMorphPredictions(morph_res_obj)
Arguments
morph_res_obj MORPH Results object
Output
Ranking scores of configuration (Vector)
Example
Scores = MORPH(morph_input, view = FALSE)
Predictions = getMorphPredictions(Scores)
print(head(Predictions))
AT4G37760 AT3G63520 AT4G32770 AT1G17050 AT2G26800 AT2G41680
2.600600 2.567530 2.464489 2.369235 2.366306 2.352861
4.2.11. getScoresDistributionPlots
Description
Receives a MORPH Results object and a path to an output PDF file, and plots the
ranking scores distribution to the output file.
Usage
getScoresDistributionPlots(Scores,
OutputFile="ScoresDistribution.pdf",
Color="blue", Type="b")
Arguments
Mandatory
Scores        MORPH Results object
OutputFile    Path to output PDF file (String)
Optional
Color         R plot color (String)
Type          R plot type (String)
Output
Ranking scores of configuration (Vector)
Example
Scores = MORPH(morph_input, view = FALSE)
getScoresDistributionPlots(Scores)
[1] "Distribution plot file created at: ScoresDistribution.pdf"
Notes
The resulting file holds the score distribution plots for all configurations.
[Figure: score distribution plot for the first configuration of the above example]