OmicsViz: Tutorial and usage

1. Introduction

OmicsViz is a Cytoscape plugin (for Cytoscape 2.4, 2.5, and 2.6) that provides visualization and integrated analysis of large-scale omics data. OmicsViz imports omics data into Cytoscape and visualizes it on a graph according to the change in the experimental values of genes (Figure 1). OmicsViz also provides a mapping function between two different species, or between probe set or experimental names and the node names in a network. For example, when you load an Arabidopsis metabolic pathway in Cytoscape and you have a grape gene expression file, OmicsViz can associate the file with the Arabidopsis pathway using a gene mapping file that contains orthologous genes between the two species.

Figure 1. Overview of the OmicsViz tool windows.

1.1 Structure of this Document

Section 1. Information on getting started with the plugin and tutorial.
Section 2. Tutorial for loading expression data from the same species and across species.
Section 3. Tutorial on loading and viewing metabolomics data (not complete).
Section 4. Information on data file formats for loading your own data with the OmicsViz plugin.

1.2 Loading the OmicsViz Plugin

Before starting the tutorial, please make sure that you have downloaded and installed a current version of Cytoscape (2.4 or higher) on your computer. Next, download the OmicsViz plugin from the MetNet for Cytoscape plugin page. Put the file, OmicsViz1.x.jar, into the folder called plugins in the Cytoscape directory on your computer. The directory on a Windows computer is shown in Figure 2.

Figure 2. Location of the plugins and sampleData folders in the Cytoscape application.

1.3 Loading the OmicsViz Sample Data Files

Before starting the tutorial, please make sure that you have downloaded the OmicsViz sample data files from the MetNet plugin page. Unzip the file, OmicsVizsampledata.zip, into the folder called sampleData in the Cytoscape directory on your computer.
The sampleData directory on a Windows computer is shown in Figure 2. The contents of the OmicsViz sample data file are listed in Table I. The tutorial assumes that the MetNet sample data is in the Cytoscape sampleData folder. Once the plugin and data have been added to the proper folders, open the Cytoscape application.

Table I. Contents of the OmicsViz sample data folder

File Name | File Type | Description
Acetyl-CoAbiotin.cys | Cytoscape session file | Acetyl-CoAbiotin pathway.
Acetyl-CoAbiotinwithATexp.cys | Cytoscape session file | Acetyl-CoAbiotin pathway with expression data attached for OmicsViz tutorials.
Expression_AT4rmameans.txt | Expression data | A sample gene expression data file of Arabidopsis from Plexdb.org.
VV4_mean.txt | Expression data | A sample gene expression data file of Vitis vinifera from Plexdb.org.
Mapping_AtProbesToAtLoci.txt | Mapping file | A sample mapping file which maps between Arabidopsis Locus ID and Arabidopsis Probe ID.
Mapping_VVProbestoATLoci.txt | Mapping file | A sample mapping file which maps between Arabidopsis Locus ID and Grape Probe ID.
MetabolitesMetMole.txt | Mapping file | A mapping file from MetNet ID to Metabolite ID in experiment.
ExperimentEIE2.txt | Metabolomics data file | A sample metabolomics experiment data set from Arabidopsis, available from www.plantmetabolomics.org.

2. Tutorial for Viewing Expression Data

1. Complete the instructions in sections 1.2 and 1.3 above before starting the tutorial to make sure that the plugin and sample data are properly installed. The tutorial assumes that the MetNet sample data is in the Cytoscape sampleData folder.

2. Open the Cytoscape session file, Acetyl-CoAbiotin.cys, by selecting File -> Open on the top Cytoscape menu and selecting the file. This session file contains a pathway file and an associated visual mapping file. Your screen should look like Figure 3.

Figure 3. Cytoscape window upon opening the Acetyl-CoAbiotin.cys file.
This session file contains pathway data, visual attributes for the graph, and graph layout information. Clicking on the tab labeled VizMapper in the Control Panel will allow exploration of the node colors.

If you are not familiar with Cytoscape, explore the network by zooming in using the magnifying glass icon. Also, try selecting alternative graph layouts on the Layout menu at the top. For more details on the basic use of Cytoscape, please look at the online tutorials available at the Cytoscape website.

Information about individual nodes can be listed in the Data Panel in the lower right hand corner of the Cytoscape window. Right-clicking on the icon circled in red in Figure 3 allows the user to select which node attributes are shown. For this tutorial, right-click on the icon and check the location, nodeType, and show boxes. This gives information about the subcellular location, the role of the node, and the name of the node. Try highlighting a few nodes (they will turn pink) and looking at the information as shown in Figure 4.

Important note: Network data from different sources will have different attributes and names. This information only holds for data from the MetNet Arabidopsis pathway database.

Figure 4. Information about the selected nodes, highlighted in pink, is displayed in the Data Panel in the lower left corner.

3. Go to the Plugins menu in Cytoscape and select the OmicsViz option. Choose the “Name the MappingID for Data File” option and select the Arabidopsis ProbeSet option. This option allows the attribute mapping information to be labeled with a user-specified ID.

Figure 5. OmicsViz plugin window.

4. The next step is selecting the input experiment data file. Click on the select button circled in Figure 6 and browse the sampleData folder to find the experimental data file, Expression_AT4rmameans.txt.
Data files can be in the form of a tab-delimited text file or an Excel spreadsheet in which each row is a set of measurements for a probe set or compound and each column is a hybridization or condition. The resulting window can be seen in Figure 7.

Figure 6. OmicsViz window before file selection.

Figure 7. OmicsViz after the experiment data file is selected. The ellipse highlights the next step of selecting the mapping file.

5. Select the mapping file that maps Arabidopsis probe sets to Arabidopsis gene loci. Select the mapping file, Mapping_AtProbesToAtLoci.txt, using the next selection box, which is circled in Figure 7. Figure 8 shows the resulting OmicsViz interface. The mapping file allows a user to map between probe set names and locus IDs. This step is necessary because node names in the graph do not always correspond to the names used in the experimental dataset.

Important notice: Always keep the first column of the mapping file as the current network node ID and the second column as the ID the node maps to in the data file.

Figure 8. OmicsViz after the mapping data file is selected. The ellipse highlights the next step of selecting the mapping attribute.

6. Choose the current node attribute to associate with using “Choose An Attribute to Assign Values to Node”. Set the value to “show” for this example, as shown in Figure 9. This tells the plugin which field in your graph file to use to match the names. The dropdown box circled in Figure 8 gives a list of attributes for the graph currently displayed.

Important notice: To match the node name using data files from MetNet, choose “show”. Other pathway data sources may use different field names, so please select the proper value.

Figure 9. OmicsViz with all input variables selected.

7. Once the proper fields are filled out, the Import button becomes available. Click Import to import your file into the current network, as shown in Figure 10.
After the import, a result window pops up specifying the number of nodes that matched those of the current network and the number of columns of data that were read in. Figure 11 shows the results of successfully importing a data file.

Figure 10. The Import button is circled.

Figure 11. After the data is imported and mapped, the status window shows the number of genes that matched nodes in the graph and the number of conditions that were in the experiment data file.

8. If anything goes wrong, click the Cancel button to start again. The Reset button will reset all values if you want to import another data file into the graph.

9. Click the SelectedNodes button on the bottom left. The nodes that have experimental data will be selected (highlighted) as shown in Figure 12, and their node names will be listed in the data panel (the window at the lower right of the Cytoscape window). To highlight nodes in the graph, left-click on any row of data in the data panel. Additional node information, such as the node name and expression values, can be displayed in the data panel by right-clicking on the data panel icon and checking the boxes for the properties that you want to see (Figure 13).

Figure 12. The Cytoscape display after node selection. The red circle shows the attribute list icon. To see selected attributes in the data panel, left-click on this icon to get a list of all attributes. Check the boxes for the attributes you want to see. The blue circle shows the mapping that was applied when the data was read into the graph. In this case, the column lists the Arabidopsis probe set IDs.

Figure 13. Close-up view of highlighted nodes and their properties.

10. Click the SavePvalsFile button to save the mapped data file, as shown in Figure 14. This can save time and allow for comparisons between different mappings.
A pvals file for the current network will be created in a directory you select. This file contains the expression data for the matching nodes in the network.

Figure 14. The SavePvalsFile option is highlighted.

11. To look at the expression data in a parallel coordinate plot, click the “PlotSelectNodes” button highlighted in Figure 15. Figure 16 plots the expression values of the selected nodes over all the input columns.

Figure 15. The PlotSelectNodes button is highlighted. Clicking this button will display parallel coordinate plots of all the selected graph nodes that have expression data associated with them.

Figure 16. Parallel coordinate plot of gene expression data. The horizontal axis shows the experimental conditions, in this case sets of time courses for different mutants. Right-clicking on the plot window opens a window, shown below, that allows the user to control the appearance of the plot. Parameters such as color, axis ranges, and labels can be set.

12. Exploring the network in detail. Plots can be made that display only a few of the graph nodes. To do this, select part of the pathway, then click the GraphVis button on the plugin. Figure 17 shows an example where several peptides that form a protein complex have very different expression levels.

Figure 17. Looking at expression levels for peptides in a single protein complex can show interesting details in pathways.

13. Import multiple datasets. OmicsViz also provides a function to manage multiple imported datasets through the “ImportedData” panel (Figure 18). When a user imports multiple datasets, which may have different attributes shown in the node attribute panel and may be associated with different networks, OmicsViz provides an efficient way to manage them. All the datasets belonging to the same network are shown in the “ImportedData” panel. There are several operations:

a.
If you want to show a certain dataset in Cytoscape after importing it, select the dataset of interest in the “ImportedData” panel (it will then be highlighted) and click the “SelectExData” button. The suffix “_Network_X” on each dataset name tells you which network the dataset belongs to. The attributes of the dataset shown in the Cytoscape node attribute panel are updated automatically according to the user's selection (Figure 19).

b. If you want to switch to another dataset for the same network, deselect the current dataset by clicking the “DeSelect” button, then proceed as in step a.

c. If you have already switched to a new network, click the “Reset” button to refresh the “ImportedData” panel so that it lists the datasets that belong to that network. You can then work with a specific dataset through step a or b.

Figure 18. An example of the “ImportedData” panel, which includes two datasets imported into the current network; one of the datasets is selected and therefore highlighted.

Figure 19. Automatic update of the Cytoscape node attribute panel according to the user's dataset selection. There are two datasets for the current network. When the user switches between them, the Cytoscape node attribute panel is updated according to the selection. This avoids confusion among dataset attributes caused by the large number of new attributes imported with datasets, particularly gene expression data. Here:
VV_Cont1_0h, VV_Cont2_0h, VV_Cont3_0h, … are attributes belonging to dataset VV1_RMA_new_salt.txt
Col-0-1, Col-0-2, Col-3-1, Col-3-2, Col-5-1, … are attributes belonging to dataset Expression_AT4rma.txt

14. Ortholog group. OmicsViz also provides a homolog gene mapping function across species, for use when you import a dataset that is not from the species of the current network. When multiple gene homologs are mapped to one target gene, we characterize them as a group. All mapped groups are shown in the “Homolog Group” panel, immediately to the left of the network layout panel.
You can easily browse groups of interest by clicking their ID (see Figure 20).

Figure 20. An ortholog group is selected in the Homolog Group panel and highlighted in yellow in the network layout panel. The “show” attribute displays the probe set names of genes which are thought to be homologs of the target gene AT5G13440.

3. Tutorial: Loading Metabolomics Data

The process of loading metabolomics data is very similar to that of loading expression data. The only suggested change is that metabolite names from experiments be matched to the database IDs for data mapping.

1. Open the Cytoscape session file, Cuticularwax_pathway.cys, by selecting File -> Open on the top Cytoscape menu and selecting the file. This session file contains a pathway file and an associated visual mapping file. Your screen should look like the figure below.

Cytoscape window upon opening the Cuticular_wax.cys file. This session file contains pathway data, visual attributes for the graph, and graph layout information. Clicking on the tab labeled VizMapper in the Control Panel will allow exploration of the node colors.

Right-clicking on the icon circled in red in Figure 3 allows the user to select which node attributes are shown. For this tutorial, right-click on the icon and check the location, nodeType, and show boxes. This gives information about the subcellular location, the role of the node, and the name of the node. Try highlighting a few nodes (they will turn pink) and looking at the information as shown in Figure 4.

Important note: Network data from different sources will have different attributes and names. This information only holds for data from the MetNet Arabidopsis pathway database.

3. Go to the Plugins menu in Cytoscape and select the OmicsViz option. Fill out the fields in the OmicsViz interface as follows: Choose the “Select Type of Experiment Data File” option menu and select the Metabolomics Data option. Next, choose the “Name the MappingID for Data File” option and select the Typing the Name option.
Enter the name “Exp Metabolite”.

The next step is selecting the input experiment data file. Click on the select button and browse the sampleData folder to find the experimental data file, M1_cuticular_means_ratios.txt. Data files can be in the form of a tab-delimited text file or an Excel spreadsheet in which each row is a set of measurements for a probe set or compound and each column is a hybridization or condition.

Select the mapping file that maps MetNet IDs to the metabolite IDs used in the experiment. Select the mapping file, MetabolitesMetMole.txt, using the next selection box. Figure 8 shows the corresponding OmicsViz interface. The mapping file allows a user to map between network node IDs and the names used in the experimental data. This step is necessary because node names in the graph do not always correspond to the names used in the experimental dataset.

Important notice: Always keep the first column of the mapping file as the current network node ID and the second column as the ID the node maps to in the data file.

Choose the current node attribute to associate with using “Choose An Attribute to Assign Values to Node”. Set the value to “molID” for this example, as shown in Figure 9. This tells the plugin which field in your graph file to use to match the names. The dropdown box circled in Figure 8 gives a list of attributes for the graph currently displayed.

Important notice: To match the node name using data files from MetNet, choose “show”. Other pathway data sources may use different field names, so please select the proper value.

In the Cytoscape desktop, under the Node Attribute Browser, click on the Select Attributes button.

OmicsViz for metabolomics data with all input variables selected.

7. Once the proper fields are filled out, the Import button becomes available. Click Import to import your file into the current network, as shown in Figure 10.
After the import, a result window pops up specifying the number of nodes that matched those of the current network and the number of columns of data that were read in. Figure 11 shows the results of successfully importing a data file.

4. Data File Formats

Omics data: Gene expression values or other omics data values are specified over one or more experiments using a text file. The file consists of a header and a number of space- or tab-delimited fields, one line per gene, with the following format:

Identifier value1 value2 ... valueN

The header line gives the names of the experimental conditions. In each following line, the first field identifies which Cytoscape graph node the data refers to (directly or through a mapping file), and the remaining fields are the values measured under each condition. This file can be in text format or in Excel spreadsheet format.

Example:

              Bud_LD       Shoot_LD     Bud_SD    Shoot_SD
1621489_at    11.28406     10.84799667  11.40967  10.90334
1614168_at    12.05521333  11.60806     11.96729  11.37581
1617502_s_at  5.42482      5.37539      4.90175   5.4227

Mapping File

The mapping file takes the form

Cytoscape Node Name    Probe Set Name;

For example, when mapping the Affymetrix Vitis vinifera chip to Arabidopsis gene loci, the file has the form:

AT1G08520 1614655_at;
AT1G08540 1615847_at;

Note: Mappings are typically generated using various forms of reciprocal BLAST on the probe set consensus or exemplar sequences. We have provided a set of mapping files generated by our collaborators. A recognized problem occurs when multiple probe sets map to the same gene locus. We are working to develop methods to display the output of all possible matching probes.
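The two file formats above can be checked outside Cytoscape before importing. The following Python sketch is only an illustration under stated assumptions, not the plugin's own code: it reads a mapping file and an expression file in the layouts described in this section, joins them on the probe set name, and reports the same two counts as the import result window (matched nodes and data columns). The node names NODE_A and NODE_B, the probe 0000000_at, the header label ID, and the function names are hypothetical; the expression values are those of the example above. Trailing semicolons in the mapping file are assumed to be optional and are stripped.

```python
import io

# Expression data in the Section 4 layout. The header label "ID" is a
# hypothetical placeholder for the first column.
EXPRESSION_TEXT = """\
ID\tBud_LD\tShoot_LD\tBud_SD\tShoot_SD
1621489_at\t11.28406\t10.84799667\t11.40967\t10.90334
1614168_at\t12.05521333\t11.60806\t11.96729\t11.37581
1617502_s_at\t5.42482\t5.37539\t4.90175\t5.4227
"""

# Mapping data: first column is the network node ID, second is the ID used
# in the data file. NODE_A, NODE_B and 0000000_at are hypothetical.
MAPPING_TEXT = """\
NODE_A 1621489_at;
NODE_A 1614168_at;
NODE_B 0000000_at;
"""


def parse_mapping(handle):
    """Return {node_id: [probe_id, ...]} from 'node probe;' lines."""
    mapping = {}
    for line in handle:
        fields = line.replace(";", " ").split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        node_id, probe_id = fields[0], fields[1]
        mapping.setdefault(node_id, []).append(probe_id)
    return mapping


def parse_expression(handle):
    """Return (condition_names, {probe_id: [values]}) from a delimited file."""
    header = handle.readline().split()
    conditions = header[1:]  # assumes the first header field labels the ID column
    values = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        values[fields[0]] = [float(x) for x in fields[1:]]
    return conditions, values


def attach(mapping, values):
    """Keep only the nodes for which at least one mapped probe has data."""
    matched = {}
    for node_id, probes in mapping.items():
        hits = {p: values[p] for p in probes if p in values}
        if hits:
            matched[node_id] = hits
    return matched


mapping = parse_mapping(io.StringIO(MAPPING_TEXT))
conditions, values = parse_expression(io.StringIO(EXPRESSION_TEXT))
matched = attach(mapping, values)
print(f"{len(matched)} node(s) matched, {len(conditions)} condition column(s) read")
```

In this sketch every probe set that maps to a node is kept rather than reduced to a single value, so the case noted above, in which several probe sets map to the same locus, stays visible in the result.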