OmicsViz: Tutorial and usage

1. Introduction
OmicsViz is a Cytoscape plugin (for Cytoscape 2.4, 2.5, and 2.6) that provides visualization
and integrated analysis of large-scale omics data. OmicsViz imports omics
data into Cytoscape and visualizes it on a graph according to changes in the experimental
values of genes (Figure 1). OmicsViz also provides a mapping function between two different species, or
between the probe set or experimental names in a data file and the node names in a network. For example, if
you load an Arabidopsis metabolic pathway in Cytoscape and you have a grape gene expression
file, OmicsViz can associate the file with the Arabidopsis pathway using a gene mapping file
that lists orthologous genes between the two species.
Figure 1. Overview of the OmicsViz tool windows.
1.1 Structure of this Document
Section 1. Information on getting started with the plugin and tutorial.
Section 2. Tutorial for loading expression data from the same species and across species.
Section 3. Tutorial on loading and viewing metabolomics data (not complete).
Section 4. Information on data file formats for loading your own data with the OmicsViz plugin.
1.2 Loading the OmicsViz Plugin
Before starting the tutorial, please make sure that you have downloaded and installed a current version of
Cytoscape (2.4 or higher) on your computer. Next, download the OmicsViz plugin from the MetNet for
Cytoscape plugin page. Put the file, OmicsViz1.x.jar, into the folder called plugins in the Cytoscape directory
on your computer. The directory on a Windows computer is shown in Figure 2.
Figure 2. Location of the plugins and sampleData folders in the Cytoscape application.
1.3 Loading the OmicsViz Sample Data Files
Before starting the tutorial, please make sure that you have downloaded and installed the OmicsViz sample
data files from the MetNet plugin page. Unzip the file, OmicsVizsampledata.zip, into the folder called
sampleData in the Cytoscape directory on your computer. The sampleData directory on a Windows computer
is shown in Figure 2. The contents of the OmicsViz sample data file are listed in Table I. The tutorial assumes
that the MetNet Sample data is in the Cytoscape sampleData folder. Once the plugin and data have been added
to the proper folders, open the Cytoscape application.
Table I. Contents of the OmicsViz sample data folder
File Name | File Type | Description
Acetyl-CoAbiotin.cys | Cytoscape session file | Acetyl-CoAbiotin pathway with
Acetyl-CoAbiotinwithATexp.cys | Cytoscape session file | Acetyl-CoAbiotin pathway with expression data attached for OmicsViz tutorials.
Expression_AT4rmameans.txt | Expression data | A sample gene expression data file of Arabidopsis from Plexdb.org.
VV4_mean.txt | Expression data | A sample gene expression data file of Vitis vinifera from Plexdb.org.
Mapping_AtProbesToAtLoci.txt | Mapping file | A sample mapping file which maps between Arabidopsis Locus ID and Arabidopsis Probe ID.
Mapping_VVProbestoATLoci.txt | Mapping file | A sample mapping file which maps between Arabidopsis Locus ID and Grape Probe ID.
MetabolitesMetMole.txt | Mapping file | A mapping file from MetNet ID to Metabolite ID in experiment.
Experiment EIE2.txt | Metabolomics data file | A sample metabolomics experiment data set from Arabidopsis, available from www.plantmetabolomics.org
2. Tutorial for Viewing Expression Data
1. Complete the instructions in sections 1.2 and 1.3 above before starting the tutorial to make
sure that the plugin and sample data are properly installed. The tutorial assumes that the MetNet
Sample data are in the Cytoscape sampleData folder.
2. Open the Cytoscape session file, Acetyl-CoAbiotin.cys, by selecting File -> Open on the top
Cytoscape menu and selecting the file. This session file contains a pathway file and an associated
visual mapping file. Your screen should look like Figure 3.
Figure 3. Cytoscape window upon opening the Acetyl-CoAbiotin.cys file. This session file contains pathway
data, visual attributes for the graph, and graph layout information. Clicking on the tab labeled VizMapper in
the Control Panel will allow exploration of the node colors.
If you are not familiar with Cytoscape, explore the network by zooming in using the magnifying
glass icon. Also, try selecting alternative graph layouts on the Layout menu at the top. For more
details on the basic use of Cytoscape, please look at the online tutorials available at the
Cytoscape website. Information about individual nodes can be listed in the Data Panel in the
lower right hand corner of the Cytoscape window.
Right-clicking on the icon circled in red in Figure 3 allows the user to select which node attributes
are displayed. For this tutorial, right-click on the icon and check the location, nodeType, and show
boxes. These give information about the subcellular location, the role of the node, and the name
of the node. Try highlighting a few nodes (they will turn pink) and looking at the information, as
shown in Figure 4. Important note: Network data from different sources will have different
attributes and names. This information only holds for data from the MetNet Arabidopsis pathway
database.
Figure 4. Information about the selected nodes, highlighted in pink, is displayed in the Data Panel in the
lower left corner.
3. Go to the Plugins menu in Cytoscape and select the OmicsViz option. Choose the “Name the
MappingID for Data File” option and select the Arabidopsis ProbeSet option. This option allows
the attribute mapping information to be labeled with a user-specified ID.
Figure 5. OmicsViz plugin Window
4. The next step is selecting the input experiment data file. Click on the select button circled in
Figure 6 and browse the sampleData folder to find the experimental data file,
Expression_AT4rmameans.txt. Data files can be tab-delimited text files or Excel
spreadsheets in which each row is a set of measurements for a probe set or compound and
each column is a hybridization or condition. The resulting window can be seen in Figure 7.
Figure 6. OmicsViz window before file selection
Figure 7. OmicsViz after the Experiment data file is selected. The ellipse highlights the next step of selecting
the mapping file.
5. Select the mapping file that maps Arabidopsis probe sets to Arabidopsis gene loci,
Mapping_AtProbesToAtLoci.txt, using the next selection box, which is
circled in Figure 7. Figure 8 shows the resulting OmicsViz interface.
The mapping file allows a user to map between probe set names and locus IDs. This step is
necessary because node names in the graph do not always correspond to the identifiers used in
the experimental dataset.
Important notice: always keep the current network node ID in the first column of the mapping file
and the ID that the node maps to in the data file in the second column.
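The column order can be checked before importing. The following Python sketch is not part of OmicsViz; it assumes the mapping file has already been read into (first column, second column) pairs and that the network node IDs are available as a list. The function names are hypothetical.

```python
def column_hit_rates(pairs, node_ids):
    """Return the fraction of first-column and second-column IDs found in the network."""
    known = set(node_ids)
    if not pairs:
        return 0.0, 0.0
    first = sum(1 for a, _ in pairs if a in known) / len(pairs)
    second = sum(1 for _, b in pairs if b in known) / len(pairs)
    return first, second


def columns_look_swapped(pairs, node_ids):
    """True when the second column matches the network better than the first."""
    first, second = column_hit_rates(pairs, node_ids)
    return second > first


# Illustrative IDs taken from the mapping example in Section 4.
pairs = [("AT1G08520", "1614655_at"), ("AT1G08540", "1615847_at")]
network_nodes = ["AT1G08520", "AT1G08540"]
print(column_hit_rates(pairs, network_nodes))      # (1.0, 0.0)
print(columns_look_swapped(pairs, network_nodes))  # False
```

A high hit rate for the second column and a low one for the first suggests that the two columns should be exchanged before the file is used.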
Figure 8. OmicsViz after the Mapping data file is selected. The ellipse highlights the next step of selecting the
mapping attribute.
6. Choose the node attribute to match against using “Choose An Attribute to Assign
Values to Node”. Set the value to “show” for this example, as shown in Figure 9. This tells the
plugin which attribute of the graph to use when matching names. The dropdown box
circled in Figure 8 lists the attributes of the graph currently displayed.
Important notice: To match node names using data files from MetNet, choose “show”. Other
pathway data sources may use different field names, so select the appropriate attribute.
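The matching that this step sets up can be pictured as a two-stage lookup: the value of the chosen node attribute is looked up in the mapping file, and the resulting IDs are looked up in the data file. The Python sketch below illustrates that idea only. How OmicsViz implements the matching internally is not described in this tutorial, and the data structures, the node names, and the function name match_nodes are assumptions made for illustration.

```python
def match_nodes(nodes, attribute, mapping, measured_ids):
    """Return {node name: [data-file IDs]} for nodes whose attribute value maps to measured IDs.

    nodes        -- {node name: {attribute name: value}} (assumed structure)
    mapping      -- {network-side ID: [data-file IDs]}
    measured_ids -- IDs that appear in the experiment data file
    """
    measured = set(measured_ids)
    matches = {}
    for name, attrs in nodes.items():
        key = attrs.get(attribute)
        hits = [data_id for data_id in mapping.get(key, []) if data_id in measured]
        if hits:
            matches[name] = hits
    return matches


# Hypothetical network with two nodes; IDs are placeholders for illustration.
nodes = {
    "node1": {"show": "AT1G08520", "nodeType": "gene"},
    "node2": {"show": "AT1G08540", "nodeType": "gene"},
}
mapping = {"AT1G08520": ["1614655_at"], "AT1G08540": ["1615847_at"]}
measured = ["1614655_at"]

print(match_nodes(nodes, "show", mapping, measured))      # {'node1': ['1614655_at']}
print(match_nodes(nodes, "nodeType", mapping, measured))  # {}
```

Choosing an attribute whose values do not occur in the first column of the mapping file, as with nodeType here, yields no matches, which is why the attribute must be selected to suit the pathway data source.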
Figure 9. OmicsViz with all input variables selected.
7. Once the required fields are filled out, the Import button becomes available. Click Import
to import your file into the current network, as shown in Figure 10. After the import,
a result window pops up reporting the number of nodes that matched those of the
current network and the number of data columns that were read in. Figure 11 shows
the results of successfully importing a data file.
Figure 10. The import button is circled.
Figure 11. After the data is imported and mapped, the status window shows the number of genes that
matched nodes in the graph and the number of conditions that were in the experiment data file.
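The two numbers reported in the status window can be reproduced outside Cytoscape as a sanity check. The Python sketch below is an illustration under assumed data structures, not the plugin's own code; import_summary is a hypothetical name and the sample values are placeholders.

```python
def import_summary(node_keys, mapping, conditions, expression):
    """Count network nodes that received data and the number of data columns.

    node_keys  -- attribute values of the network nodes (for example the "show" values)
    mapping    -- {network-side ID: [data-file IDs]}
    conditions -- column names from the data file header
    expression -- {data-file ID: [values]}
    """
    matched, unmatched = [], []
    for key in node_keys:
        has_data = any(data_id in expression for data_id in mapping.get(key, []))
        (matched if has_data else unmatched).append(key)
    return {
        "matched_nodes": len(matched),
        "data_columns": len(conditions),
        "unmatched": unmatched,
    }


# Placeholder inputs for illustration only.
summary = import_summary(
    node_keys=["AT1G08520", "AT1G08540"],
    mapping={"AT1G08520": ["1614655_at"]},
    conditions=["cond1", "cond2"],
    expression={"1614655_at": [1.0, 2.0]},
)
print(summary)
# {'matched_nodes': 1, 'data_columns': 2, 'unmatched': ['AT1G08540']}
```

Listing the unmatched keys is an addition of this sketch; it can help when the count in the status window is lower than expected.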
8. If anything goes wrong, click the Cancel button to start again. The Reset button resets all
values if you want to import another data file into the graph.
9. Click the SelectedNodes button on the bottom left. The nodes that have experimental data will be selected
(highlighted), as shown in Figure 12, and their node names will be
listed in the data panel (the window at the lower right of the Cytoscape window). To highlight a node in the graph,
left-click on its row in the data panel.
Additional node information, such as the node name and expression values, can be displayed in the data panel by
right-clicking on the data panel icon and checking the boxes for the properties that you want to see
(Figure 13).
Figure 12. The Cytoscape display after node selection. The red circle shows the attribute list icon. To see
selected attributes in the data panel, left-click on this icon to get a list of all attributes. Check the boxes for the
attributes you want to see. The blue circle shows the mapping that was applied when the data was read into
the graph. In this case, the column lists the Arabidopsis probeset IDs.
Figure 13. Closeup view of highlighted nodes and their properties.
10. Click the SavePvalsFile button to save the mapped data file, as shown in Figure 14. This can
save time and allows comparisons between different mappings. A pvals file for the current
network will be created in a directory you select. This file contains the expression data for the
matching nodes in the network.
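The exact layout of the pvals file written by the plugin is not described in this tutorial. As a rough illustration of saving only the rows that matched network nodes, the Python sketch below writes a tab-delimited table with a header line; that layout, the "Identifier" header label, and the function name are assumptions.

```python
import csv
import io


def write_matched_table(handle, conditions, expression, matched_ids):
    """Write the rows of matched IDs as tab-delimited text; return the number of rows written."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Identifier"] + list(conditions))  # assumed header layout
    written = 0
    for data_id in matched_ids:
        if data_id in expression:
            writer.writerow([data_id] + list(expression[data_id]))
            written += 1
    return written


# Placeholder values for illustration; an in-memory buffer stands in for a file.
buffer = io.StringIO()
count = write_matched_table(
    buffer,
    conditions=["cond1", "cond2"],
    expression={"1614655_at": [1.5, 2.5], "1615847_at": [3.0, 4.0]},
    matched_ids=["1614655_at"],
)
print(count)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open(path, "w", newline="") in place of the buffer.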
Figure 14. The SavePvalsFile option is highlighted.
11. To look at the expression data in a parallel coordinate plot, click the button
“PlotSelectNodes” highlighted in Figure 15. Figure 16 plots the expression values of the
selected nodes over all the input columns.
Figure 15. The PlotSelectNodes button is highlighted. Clicking this button will display parallel coordinate
plots of all the selected graph nodes which have expression data associated with them.
Figure 16. Parallel coordinate plot of gene expression data. The horizontal axis shows the experimental
conditions, in this case, sets of time courses for different mutants.
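A comparable plot can be drawn outside Cytoscape from a saved data table. The Python sketch below is one possible way to do it and is not the plugin's plotting code: it assumes the third-party matplotlib library is installed, and the function names and sample values are placeholders. Each selected ID becomes one line, with the experimental conditions along the horizontal axis.

```python
def polylines(conditions, expression, selected_ids):
    """Return {ID: [(condition index, value), ...]} for the selected IDs that have data."""
    xs = list(range(len(conditions)))
    return {
        data_id: list(zip(xs, expression[data_id]))
        for data_id in selected_ids
        if data_id in expression
    }


def plot_parallel(conditions, expression, selected_ids):
    """Draw one line per selected ID across all conditions and return the figure."""
    import matplotlib.pyplot as plt  # imported here so polylines() works without matplotlib

    fig, ax = plt.subplots()
    for data_id, points in polylines(conditions, expression, selected_ids).items():
        ax.plot([x for x, _ in points], [y for _, y in points], marker="o", label=data_id)
    ax.set_xticks(range(len(conditions)))
    ax.set_xticklabels(conditions, rotation=45, ha="right")
    ax.set_xlabel("Experimental condition")
    ax.set_ylabel("Expression value")
    ax.legend()
    return fig


# Placeholder data for illustration.
conditions = ["cond1", "cond2", "cond3"]
expression = {"probe_A": [5.0, 6.5, 6.0], "probe_B": [11.0, 10.5, 12.0]}
print(polylines(conditions, expression, ["probe_A"]))
# {'probe_A': [(0, 5.0), (1, 6.5), (2, 6.0)]}
```

Calling plot_parallel(conditions, expression, ["probe_A", "probe_B"]).savefig("plot.png") would write the figure to a file.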
Right-clicking on the plot window opens the window shown below, which allows the user to
control the appearance of the plot. Parameters such as color, axis ranges, and labels can be set.
12. Exploring the network in detail. Plots can be made that display only a few of the graph
nodes. To do this, select part of the pathway, then click the GraphVis button on the plugin.
Figure 17 shows an example where several peptides that form a protein complex have very
different expression levels.
Figure 17. Looking at expression levels for peptides in a single protein complex can show interesting details in
pathways.
13. Import multiple datasets
OmicsViz also provides a way to manage multiple imported datasets through the “ImportedData”
panel (Figure 18).
When you import multiple datasets, they may add different attributes to the node attribute
panel and may be associated with different networks. OmicsViz keeps them organized:
all the datasets belonging to the same network are shown in the “ImportedData”
panel. Several operations are available:
a. To show a dataset in Cytoscape after you import it, select
the dataset of interest in the “ImportedData” panel (it will be highlighted) and
click the “SelectExData” button. The “_Network_X” suffix of each dataset name tells you
which network the dataset belongs to. The attributes of the dataset shown in the
Cytoscape node attribute panel are updated automatically according to your
selection (Figure 19).
b. To switch to another dataset for the same network, deselect the current
dataset by clicking the “DeSelect” button, then follow step a.
c. If you have switched to a new network, click the “Reset” button to
refresh the “ImportedData” panel so that it lists the datasets belonging to that network. You
can then work with a specific dataset through step a or b.
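The “_Network_X” suffix convention can also be read programmatically, for example when reviewing a list of imported dataset names. The Python sketch below assumes that the suffix is appended directly to the dataset file name and that everything after the last “_Network_” identifies the network; the sample names and the function name are hypothetical.

```python
from collections import defaultdict


def group_by_network(dataset_names, marker="_Network_"):
    """Group dataset names by the network identifier that follows the marker."""
    groups = defaultdict(list)
    for name in dataset_names:
        base, found, network = name.rpartition(marker)
        if found:
            groups[network].append(base)
        else:
            groups[None].append(name)  # no suffix: network unknown
    return dict(groups)


# Hypothetical dataset names following the assumed naming pattern.
names = [
    "Expression_AT4rma.txt_Network_1",
    "VV1_RMA_new_salt.txt_Network_1",
    "Expression_AT4rma.txt_Network_2",
]
print(group_by_network(names))
# {'1': ['Expression_AT4rma.txt', 'VV1_RMA_new_salt.txt'], '2': ['Expression_AT4rma.txt']}
```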
Figure 18. An example of the “ImportedData” panel, which includes two datasets imported into the current network;
one of the datasets is selected and therefore highlighted.
Figure 19. Automatic update of the Cytoscape node attribute panel according to the user's dataset selection. There
are two datasets for the current network. When the user switches between them, the Cytoscape node attribute panel
is updated according to the selection, which avoids confusion between dataset attributes caused by the large number
of new attributes imported with datasets, particularly gene expression data. Here:
VV_Cont1_0h VV_Cont2_0h VV_Cont3_0h … … attributes belong to dataset VV1_RMA_new_salt.txt
Col-0-1 Col-0-2 Col-3-1 Col-3-2 Col-5-1 …… attributes belong to dataset Expression_AT4rma.txt
14. Ortholog group
OmicsViz also provides a homologous gene mapping function across species for importing a dataset that is not
from the species of the current network. When multiple gene homologs are mapped to one target gene,
we treat them as a group. All mapped groups are shown in the “Homolog Group” panel, to the left of the
network layout panel. You can browse groups of interest by clicking their IDs (see Figure 20).
Figure 20. An ortholog group is selected in the Homolog Group panel and highlighted in yellow in the network layout
panel. The “show” attribute displays the probe set names of the genes thought to be homologs of the target
gene AT5G13440.
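The grouping described in this step can be sketched in a few lines of Python. This is an illustration of the idea rather than the plugin's implementation: it assumes the mapping is available as (target gene, homolog ID) pairs, and the probe names used below are placeholders, not real probe set IDs.

```python
from collections import defaultdict


def homolog_groups(pairs):
    """Return {target gene: [homolog IDs]} for targets with more than one homolog."""
    by_target = defaultdict(list)
    for target, homolog in pairs:
        if homolog not in by_target[target]:  # ignore repeated lines
            by_target[target].append(homolog)
    return {target: ids for target, ids in by_target.items() if len(ids) > 1}


# Placeholder probe names; only the locus IDs come from this document.
pairs = [
    ("AT5G13440", "probe_A"),
    ("AT5G13440", "probe_B"),
    ("AT1G08520", "probe_C"),
]
print(homolog_groups(pairs))
# {'AT5G13440': ['probe_A', 'probe_B']}
```

Targets with a single homolog are left out of the result because they do not form a group.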
3. Tutorial: Loading Metabolomics Data
The process of loading metabolomics data is very similar to that of loading expression data. The only
suggested change is that metabolite names from experiments be matched to the database IDs for data mapping.
1. Open the Cytoscape session file, Cuticularwax_pathway.cys, by selecting File -> Open on the
top Cytoscape menu and selecting the file. This session file contains a pathway file and an
associated visual mapping file. Your screen should look like the window shown below.
Cytoscape window upon opening the Cuticular_wax.cys file. This session file contains pathway data, visual
attributes for the graph, and graph layout information. Clicking on the tab labeled VizMapper in the Control
Panel will allow exploration of the node colors.
Right-clicking on the icon circled in red in Figure 3 allows the user to select which node attributes
are displayed. For this tutorial, right-click on the icon and check the location, nodeType, and show
boxes. These give information about the subcellular location, the role of the node, and the name
of the node. Try highlighting a few nodes (they will turn pink) and looking at the information, as
shown in Figure 4. Important note: Network data from different sources will have different
attributes and names. This information only holds for data from the MetNet Arabidopsis pathway
database.
3. Go to the Plugins menu in Cytoscape and select the OmicsViz option. Fill out the fields in
the OmicsViz interface as follows: choose the “Select Type of Experiment Data File” option menu
and select the Metabolomics Data option. Next, choose the “Name the MappingID for Data File”
option and select the Typing the Name option. Enter the name “Exp Metabolite”.
The next step is selecting the input experiment data file. Click on the select button and browse
the sampleData folder to find the experimental data file, M1_cuticular_means_ratios.txt. Data
files can be tab-delimited text files or Excel spreadsheets in which each row is
a set of measurements for a probe set or compound and each column is a hybridization or
condition. Then select the mapping file that maps MetNet IDs to the metabolite IDs used in the experiment,
MetabolitesMetMole.txt, using the next selection box. The figure below
shows the resulting OmicsViz interface. The mapping file allows a user to
map between the metabolite names in the experiment and the IDs of the network nodes. This step is necessary because node names in the
graph do not always correspond to the names used in the experimental dataset.
Important notice: always keep the current network node ID in the first column of the mapping file
and the ID that the node maps to in the data file in the second column.
Choose the node attribute to match against using “Choose An Attribute to Assign
Values to Node”. Set the value to “molID” for this example as shown in Figure 9. This tells the
plugin which attribute of the graph to use when matching names. The dropdown box
circled in Figure 8 lists the attributes of the graph currently displayed.
Important notice: To match the node name using data files from MetNet, you should choose “show”. Other
pathway data sources may use different field names, so select the appropriate values. In the Cytoscape
desktop, under the Node Attribute Browser, click on the Select Attributes
button.
OmicsViz for metabolomics data with all input variables selected.
7. Once the required fields are filled out, the Import button becomes available. Click Import
to import your file into the current network, as shown in Figure 10. After the import,
a result window pops up reporting the number of nodes that matched those of the
current network and the number of data columns that were read in. Figure 11 shows
the results of successfully importing a data file.
4. Data File Formats
Omics data:
Gene expression values or other omics data values are specified over one or more
experiments using a text file. The file consists of a header line followed by space- or
tab-delimited fields, one line per gene, with the following format:
Identifier value1 value2 ... valueN
The first field identifies which Cytoscape graph node the data refers to. The header line gives
the names of the experimental conditions, and the remaining fields on each line are the values
measured under those conditions. This file can be in text format or in
Excel spreadsheet format.
Example:
              Bud_LD       Shoot_LD     Bud_SD    Shoot_SD
1621489_at    11.28406     10.84799667  11.40967  10.90334
1614168_at    12.05521333  11.60806     11.96729  11.37581
1617502_s_at  5.42482      5.37539      4.90175   5.4227
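A file in this format can be read with a few lines of Python, for example to inspect it before importing. The sketch below is a minimal reader under stated assumptions: fields are separated by whitespace, condition names contain no spaces, and the header may or may not include a label for the identifier column. The function name is hypothetical and the sample line is adapted from the example above.

```python
import io


def read_expression_table(handle):
    """Parse a whitespace-delimited omics table into (conditions, {identifier: values})."""
    rows = [line.split() for line in handle if line.strip()]
    if not rows:
        return [], {}
    header, data = rows[0], rows[1:]
    # Number of value columns, taken from the first data row when one exists.
    n_values = len(data[0]) - 1 if data else len(header)
    # Keep the last n_values header fields, so a leading identifier label is dropped.
    conditions = header[-n_values:] if n_values else []
    values = {fields[0]: [float(v) for v in fields[1:]] for fields in data}
    return conditions, values


sample = io.StringIO(
    "Identifier Bud_LD Shoot_LD\n"
    "1621489_at 11.28406 10.84799667\n"
)
conditions, values = read_expression_table(sample)
print(conditions)            # ['Bud_LD', 'Shoot_LD']
print(values["1621489_at"])  # [11.28406, 10.84799667]
```

For a file on disk, pass an open file object, for example open(path), instead of the in-memory sample.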
Mapping File
The mapping file takes the form
Cytoscape Node Name Probe Set Name;
For example, when mapping the Affymetrix Vitis vinifera chip to Arabidopsis gene loci,
the file has the form:
AT1G08520 1614655_at;
AT1G08540 1615847_at;
Note: Mappings are typically generated using various forms of reciprocal BLAST using
the probe set consensus or exemplar sequences. We have provided a set of mapping files
generated by our collaborators. A recognized problem occurs when multiple probe sets
map to the same gene locus. We are working to develop methods to display the output of
all possible matching probes.
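The mapping file format can be read as follows. This Python sketch assumes one pair per line, whitespace between the two columns, and an optional trailing semicolon, as in the example above; whether real mapping files start with a header line is not stated here, so skipping it is left as an option. The function name is hypothetical. Every probe set is kept for each node ID, so a locus with several probe sets is represented by a list.

```python
from collections import defaultdict


def read_mapping(lines, skip_header=False):
    """Return {network node ID: [data-file IDs]} from mapping-file lines."""
    mapping = defaultdict(list)
    for index, line in enumerate(lines):
        if skip_header and index == 0:
            continue
        fields = line.strip().rstrip(";").split()
        if len(fields) < 2:
            continue  # blank or incomplete line
        node_id, data_id = fields[0], fields[1]
        mapping[node_id].append(data_id)
    return dict(mapping)


example = [
    "AT1G08520 1614655_at;",
    "AT1G08540 1615847_at;",
]
print(read_mapping(example))
# {'AT1G08520': ['1614655_at'], 'AT1G08540': ['1615847_at']}
```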