Visualizing multidimensional cancer genomics data

Michael P Schroeder¹, Abel Gonzalez-Perez¹ and Nuria Lopez-Bigas*¹,²

¹ Research Program on Biomedical Informatics – GRIB, Universitat Pompeu Fabra (UPF), Parc de Recerca Biomèdica de Barcelona (PRBB), Dr. Aiguader 88, E-08003 Barcelona, Spain.
² Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
*Correspondence: nuria.lopez@upf.edu

This additional file describes how to produce the graphs shown in Figure 3, which are explained in the case studies. Several data files needed to generate the figures are provided together with this document.

Figure 3a. Visualizing cancer drivers and cause–effect relationship in a heatmap

As explained in the case study ‘Visual exploration of cancer drivers’, a mutually exclusive pattern of alterations among the genes of a pathway is characteristic of cancer drivers. It is therefore helpful to visualize all the alterations recorded for the genes of a pathway across tumor samples, with the samples sorted by mutual exclusivity.

In Gitools, oncogenomics data from a cohort of tumor samples can be loaded and visualized as a matrix heatmap (follow the tutorials at http://help.gitools.org/Tutorials/#HCASESTUDY6:StudyingmultidimensionalcancerdatawithGitools for further help). The tdm file format can hold several values per cell, which makes it suitable for representing different somatic alterations of genes across samples. These may include under- or overexpression, copy number alterations or mutations, as shown in the Figure below, which illustrates the data of a tdm file as a spreadsheet.

Visualize mutual exclusivity alteration patterns

Follow the steps below to visualize alteration data sorted by mutual exclusivity. (The video tutorial at http://help.gitools.org/Tutorials/Tutorial62 contains further help.)

● Open Gitools and click on Open data in a heatmap in the welcome tab.
● Open the sample data file tcga-gbm.tp53.pathway.tdm.gz. (To see all files in the Open dialog, select ‘Multivalue data matrix (*.*)’ from the ‘Files of Type’ pull-down menu.)
● Once the data file is open in Gitools, select the Properties > Rows tab on the left. In this tab, click the Load … button in the Annotations section.
● Open the annotations file ensg.ensemblv62.tsv.gz, which contains further information about the genes, such as gene symbol and locus.
● After loading, click Add in the Headers section. Choose to add a Text Label header and click OK. In the next window, select the gene symbol field sym and click Next and Finish. A new row header showing the gene symbols appears in the heatmap. The row header called Text: ID can be removed.
● With the alterations and gene annotations loaded, open the menu Data > Sort > Sort by mutual exclusion. In the dialog, select Rows, click the Change button, select sym, and type the following four genes into the text box below, each on a new line: TP53, MDM2, MDM4 and CDKN2A. Gitools places these four genes at the top of the matrix and sorts the samples by the mutual exclusivity of their alterations.
● Gitools lets you change the size of the cells. Here we recommend resizing the cell width to 1 or 2 pixels to see the full matrix: select the Properties > Rows tab on the left and change the cell width to 1 or 2 pixels.

Visualize cause–effect relationship

After completing the steps above, follow the description below to visualize the cause–effect relationship between somatic genomic alterations and gene expression. (The video tutorial at http://help.gitools.org/Tutorials/Tutorial63 contains further help.)

The previously loaded file tcga-gbm.tp53.pathway.tdm.gz also contains expression data for the same genes in the same tumor samples.
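To make the loading and sorting steps concrete, the Python sketch below parses a tiny multi-value matrix, orders the samples by mutual exclusivity and median-centers one gene's expression. It rests on assumptions not taken from the Gitools documentation: the tdm layout is taken to be tab-separated, with a header whose first two fields (column, row) hold the sample and gene identifiers, followed by one field per value; the field names alteration and expr, the sample IDs and all values are hypothetical; any non-zero alteration value counts as altered. The lexicographic ordering is one common way to display mutual exclusivity, not necessarily the exact algorithm behind Sort by mutual exclusion in Gitools.

```python
import csv
import io
import statistics

# ASSUMED tdm layout: tab-separated, header "column<TAB>row<TAB>value fields...".
# Field names, sample IDs and values below are hypothetical toy data.
TOY_TDM = (
    "column\trow\talteration\texpr\n"
    "S1\tTP53\t1\t0.2\n"
    "S1\tMDM2\t0\t1.0\n"
    "S2\tTP53\t0\t0.5\n"
    "S2\tMDM2\t1\t3.0\n"
    "S3\tTP53\t0\t0.1\n"
    "S3\tMDM2\t0\t0.8\n"
    "S4\tTP53\t1\t0.0\n"
    "S4\tMDM2\t0\t1.2\n"
)


def read_tdm(text):
    """Return {(gene, sample): {field: float}} from text in the assumed layout."""
    cells = {}
    for rec in csv.DictReader(io.StringIO(text), delimiter="\t"):
        values = {k: float(v) for k, v in rec.items() if k not in ("column", "row")}
        cells[(rec["row"], rec["column"])] = values
    return cells


def mutual_exclusivity_order(cells, genes, samples, field="alteration"):
    """Sort samples by their alteration pattern over `genes`, in the given order.

    Samples altered in genes[0] come first, then those altered in genes[1], etc.
    Any non-zero value counts as altered (an assumption).
    """
    def pattern(sample):
        # 0 = altered, 1 = not altered, so altered samples sort first.
        return tuple(0 if cells[(g, sample)][field] != 0 else 1 for g in genes)

    return sorted(samples, key=pattern)


def median_center(cells, gene, samples, field="expr"):
    """Subtract the gene's median across samples from each of its values."""
    med = statistics.median(cells[(gene, s)][field] for s in samples)
    return {s: cells[(gene, s)][field] - med for s in samples}


if __name__ == "__main__":
    cells = read_tdm(TOY_TDM)
    samples = sorted({sample for _, sample in cells})
    print(mutual_exclusivity_order(cells, ["TP53", "MDM2"], samples))
    # ['S1', 'S4', 'S2', 'S3']
    print(median_center(cells, "MDM2", samples))
```

For the real gzip-compressed file, the text could be read with gzip.open(path, "rt").read(), provided its layout matches the assumed one. Because Python's sort is stable, samples with identical patterns keep their input order.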
Because the samples were sorted by mutual exclusivity in the steps above, tumor samples with somatic genomic alterations in the same gene are already grouped together.

● First, to change the value shown in the heatmap from genomic alterations to expression, select the Properties > Cells panel on the left. In the Scale section, find the drop-down field Value, which should currently display Alteration. Open the drop-down menu and select the value expr median-centered.
● Now the scale can be adjusted. We recommend setting the center color to white, so that subtle differences in expression values can be seen in greater detail.
● Since the samples appear in the same order as before, it is possible to observe that expression is very low in the samples where a gene is lost, whereas the opposite holds for gain events.
● To assess this difference in expression statistically, follow the instructions in the video tutorial mentioned above.

Figure 3b. Visualizing cancer-driver genes in a network

The cBio Cancer Genomics Portal allows browsing of the cancer genomics alteration datasets generated by TCGA. After selecting a set of genes, the user can visualize the interactions between them in a network browser.

● Point your web browser to http://www.cbioportal.org/public-portal/
● In the form displayed, select the data as follows and click Submit at the bottom.
  ○ Cancer Study: Glioblastoma (TCGA, Nature 2008)
  ○ Gene set: TP53, MDM2, MDM4 and CDKN2A (each on a separate line)
● The Summary tab, with the OncoPrint alterations heatmap, is displayed by default. The network viewer can then be accessed through the Network tab. The selected genes are displayed as nodes, together with their closest neighbors.
When a node is clicked, the percentages of samples in which that gene is altered by copy number alteration, mutation, or over- or underexpression are shown around the node. Hold down Shift to select multiple nodes. For more information, click Legends > Gene Legend above the network. The number of neighbors displayed can be adjusted with the slider on the right labeled Filter Neighbors by Alteration %. (Further information is available at http://www.slideshare.net/EthanCerami/network-view.)

Figures 3c, 3d and 3e. Visualizing the stratification of cancer patients

The case study ‘Visualizing cancer patient stratifications’ illustrates how to visualize the stratification of cancer patients. Follow the video tutorial http://help.gitools.org/Tutorials/Tutorial64 for further information.

Matrix heatmap

● Open Gitools and click on Open data in a heatmap in the welcome tab.
● Open the results data file glioblastoma-slea-kegg-results.tdm.gz. The tutorial http://help.gitools.org/Tutorials/Tutorial64 explains how to perform sample-level enrichment analysis (SLEA) with Gitools to generate these results.
● In the open heatmap, downregulated pathways appear blue-shifted, whereas upregulated pathways appear red-shifted.
● Now select the Properties > Columns tab on the left. In this tab, click the Load … button in the Annotations section.
● Open the sample annotation file glioblastoma-sample-annotation.txt, which contains clinical information about the samples, such as their molecular subtype.
● After loading the annotations, click Add in the Headers section. Select Colored labels from annotation and click OK. From the fields listed in the annotation options, choose subtype and click Next. In the next dialog window, enable Show cluster names and click Finish.
● From the top menus, select Data > Sort by label and sort the columns (the samples) by the label subtype, which is accessible through the Change button on the label row.
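The heatmap above holds one enrichment value per pathway and sample, and its columns are then grouped by subtype. Both ideas can be sketched in Python under stated assumptions. The per-sample score below is a z-score comparing the mean of a pathway's genes with the mean and spread of all genes in that sample; this formulation is assumed for illustration and is not necessarily the computation Gitools performs for SLEA (see the tutorial cited above). The expression values, gene and sample IDs, pathway membership, subtype assignments and the id column name of the annotation text are all hypothetical.

```python
import csv
import io
import math
import statistics

# Hypothetical normalized expression: sample -> gene -> value.
EXPR = {
    "S1": {"G1": 2.0, "G2": 2.0, "G3": 0.0, "G4": 0.0, "G5": 0.0, "G6": 0.0},
    "S2": {"G1": -1.0, "G2": -1.0, "G3": 0.5, "G4": 0.5, "G5": 0.5, "G6": 0.5},
}
PATHWAY = ["G1", "G2"]  # hypothetical pathway membership

# Hypothetical annotation text: tab-separated, "id" column holds sample IDs.
ANNOTATION_TXT = (
    "id\tsubtype\n"
    "S1\tProneural\n"
    "S2\tMesenchymal\n"
    "S3\tProneural\n"
    "S4\tClassical\n"
)


def pathway_zscore(sample_values, genes):
    """ASSUMED score: (mean of pathway genes - mean of all genes) / (sd / sqrt(n))."""
    values = list(sample_values.values())
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    members = [sample_values[g] for g in genes if g in sample_values]
    return (statistics.mean(members) - mu) / (sd / math.sqrt(len(members)))


def read_annotation(text):
    """Map each sample ID to its row of annotation fields."""
    return {row["id"]: row for row in csv.DictReader(io.StringIO(text), delimiter="\t")}


def sort_by_label(samples, annotation, label):
    """Group samples by a label; Python's stable sort keeps order within a group."""
    return sorted(samples, key=lambda s: annotation[s][label])


if __name__ == "__main__":
    scores = {s: pathway_zscore(v, PATHWAY) for s, v in EXPR.items()}
    print(scores)  # S1 close to 2.0 (pathway up), S2 close to -2.0 (pathway down)
    annotation = read_annotation(ANNOTATION_TXT)
    print(sort_by_label(["S1", "S2", "S3", "S4"], annotation, "subtype"))
    # ['S4', 'S2', 'S1', 'S3']
```

In this sketch, a positive score corresponds to the red-shifted (upregulated) cells described above and a negative score to the blue-shifted ones.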
To better visualize the sorted data, change the options in Properties > Cells as follows:

● Cells width: 3
● Cells height: 14
● Show rows grid: true
● Show columns grid: false

Circular heatmap

We used CircleMap, a command-line tool. A web server for it is being created at http://sysbio.soe.ucsc.edu/nets. To run it on your own computer, download a zip archive with the source code from GitHub: https://github.com/chkw/stuartlab-circleplotter-py. Follow the instructions in the README file to generate the example circular heatmaps. The second example creates the nodes shown in the figure as .png files.

Genome Browser

If the sample information is loaded into the genome browser IGV, the cancer patients can be stratified and particular groups and genes can be examined more closely.

● Start IGV and click File > Load from Server...
● In the pop-up dialog, load the Tutorial data: expand Tutorials > GBM Subtypes (Verhaak et al.), tick the boxes Sample Information, Segmented Copy Number and Expression, and click OK.
● The data will be loaded from the server into IGV. When loading is complete, type a gene of interest into the box at the top and press Enter. For this example, we selected EGFR.
● The values of all tracks are now shown for EGFR; some tracks contain copy number data and others contain expression data.
● To stratify the cancer samples, select the menu Tracks > Group Tracks > Subtype. This groups together all the tracks from the same molecular subtype.
● To better visualize all the data, reduce the track height by selecting Tracks > Set Track Height and entering 1 pixel.
● Define a region of interest: click the Define a region of interest icon at the top, then click twice in the tracks, at the left and right ends of the EGFR region, to set its boundaries. Further help is available at http://www.broadinstitute.org/igv/regionsofinterest. A red bar will appear just above the first track.
Right-clicking this red bar shows different ways to sort the data. Select Sort by amplification: the copy number tracks are sorted by their amplification status, and the expression tracks from the same samples are sorted in the same way.

Figure 3f. Visualizing global alteration profile patterns

The circular genome plots shown in Figure 3f were taken from the COSMIC web portal. The three samples shown are:

● Many alterations: http://cancer.sanger.ac.uk/cosmic/sample/overview?id=749716
● Few alterations: http://cancer.sanger.ac.uk/cosmic/sample/overview?id=1126516
● Chromothripsis on chr 17: http://cancer.sanger.ac.uk/cosmic/sample/overview?id=749716

As stated in the portal, the figures were plotted with Circos. Circos is a very flexible command-line tool; because of this flexibility, mastering it requires some investment of time on the user's part. We limit ourselves to pointing the reader to the extensive learning material on the official Circos website: http://circos.ca/tutorials/lessons/.
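Before a Circos plot can be configured, the data usually has to be converted into the plain-text format that Circos tracks read, with one feature per line given as chromosome, start, end and value. The Python sketch below writes copy number segments in that form. It assumes chromosome names with the hs prefix used by the human karyotype file distributed with Circos; the segment coordinates and values are hypothetical, and the Circos configuration that would reference the file is not shown.

```python
from pathlib import Path


def to_circos_lines(segments, prefix="hs"):
    """Convert (chromosome, start, end, value) tuples into Circos data lines.

    ASSUMPTION: the target karyotype names chromosomes 'hs1', 'hs17', 'hsX', ...
    """
    lines = []
    for chrom, start, end, value in segments:
        # Accept 'chr17' or '17' and emit 'hs17'.
        name = chrom[3:] if chrom.lower().startswith("chr") else chrom
        if start > end:
            raise ValueError(f"start > end for segment on {chrom}")
        lines.append(f"{prefix}{name} {start} {end} {value:g}")
    return lines


def write_circos_file(segments, path):
    """Write the converted segments to a text file, one feature per line."""
    Path(path).write_text("\n".join(to_circos_lines(segments)) + "\n")


if __name__ == "__main__":
    # Hypothetical segments: (chromosome, start, end, log2 copy number ratio).
    segments = [("chr17", 7_500_000, 7_700_000, -1.2), ("chr7", 55_000_000, 55_300_000, 2.5)]
    for line in to_circos_lines(segments):
        print(line)
    # hs17 7500000 7700000 -1.2
    # hs7 55000000 55300000 2.5
```

Keeping the conversion separate from the file writing makes it easy to check the generated lines before pointing a Circos track at the file.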