file - Genome Medicine

Visualizing multidimensional cancer genomics data
Michael P Schroeder1, Abel Gonzalez-Perez1 and Nuria Lopez-Bigas*1,2
1 Research Program on Biomedical Informatics – GRIB, Universitat Pompeu Fabra (UPF),
Parc de Recerca Biomèdica de Barcelona (PRBB), Dr. Aiguader 88, E-08003 Barcelona, Spain.
2 Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
*Correspondence: nuria.lopez@upf.edu
This additional file describes how to produce the graphs shown in Figure 3, which are explained
in the case studies. Several data files needed to generate the figures are provided together with
this document.
Figure 3a. Visualizing cancer drivers and cause–effect relationship in a heatmap
As explained in the case study ‘Visual exploration of cancer drivers’, a mutually exclusive
alteration pattern among the genes of a pathway is characteristic of cancer drivers. It is
therefore helpful to visualize all the alterations recorded for the genes within a pathway across
tumor samples sorted by their mutual exclusivity. In Gitools, it is possible to load oncogenomics
data from a cohort of tumor samples and to visualize them as a matrix heatmap (for further help,
see the tutorials at
http://help.gitools.org/Tutorials/#HCASESTUDY6:StudyingmultidimensionalcancerdatawithGitools).
The tdm file format can hold several values per cell, which makes it suitable for representing
different somatic alterations of genes across samples. These might include under- or
overexpression, copy number alterations or mutations, as shown in the Figure below, which
illustrates the data in a tdm file as a spreadsheet.
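To make the idea of several values per cell concrete, the following Python sketch parses a small tab-delimited table in which each line describes one cell (a sample and a gene) followed by its values. The column names (column, row, alteration, expression) and all values are made up for illustration. They are an assumption about the layout and are not taken from the tdm file provided with this document.

```python
import csv
import io

# Hypothetical excerpt of a multi-value matrix: one line per cell,
# identified by sample (column) and gene (row), followed by its values.
EXAMPLE_TDM = (
    "column\trow\talteration\texpression\n"
    "sample1\tTP53\t1\t-1.2\n"
    "sample1\tMDM2\t0\t0.1\n"
    "sample2\tTP53\t0\t0.3\n"
    "sample2\tMDM2\t1\t2.4\n"
)


def read_multivalue_matrix(text):
    """Return {(row_id, column_id): {value_name: float}} for a tab-delimited table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    cells = {}
    for record in reader:
        column_id = record.pop("column")
        row_id = record.pop("row")
        # Every remaining field is one named value of this cell.
        cells[(row_id, column_id)] = {name: float(v) for name, v in record.items()}
    return cells


cells = read_multivalue_matrix(EXAMPLE_TDM)
print(cells[("TP53", "sample1")])
```

Each cell keeps all of its values side by side, which is what allows a heatmap to switch between, for example, alteration status and expression without reloading the data.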
Visualize mutual exclusivity alteration patterns
Follow the steps below to visualize alteration data by mutual exclusivity. (The video tutorial at
http://help.gitools.org/Tutorials/Tutorial62 contains further help.)
● Open Gitools and click on open data in a heatmap in the welcome tab.
● Select to open the sample data file tcga-gbm.tp53.pathway.tdm.gz. (To visualize all files
in the Open window dialog, select ‘Multivalue data matrix (*.*)’ from the ‘Files of Type’
pulldown menu.)
● After the data file is open in Gitools, select at the left the tab Properties → Rows. In this
tab, select the Load … button in the Annotations section. Select to open the annotations
file ensg.ensemblv62.tsv.gz, which contains further information about the genes, such
as gene symbol and locus information.
● After loading, click on Add in the Headers section. Choose to add a Text Label header
and click OK. In the next window, select the gene symbol sym and click Next and
Finish. A new row header is visible in the heatmap showing the gene symbols. The row
header called Text: ID can be removed.
● Now that the alterations and gene annotations are loaded, open the menu Data → Sort
→ Sort by mutual exclusion. In the dialog, select Rows, click on the Change button and
select sym. Then type the following four genes in the text box below, each on a
new row: TP53, MDM2, MDM4 and CDKN2A.
● Gitools will sort these four selected genes by mutual exclusivity at the top of the
matrix.
● In Gitools you can change the size of the cells. In this case, we recommend that you
resize the width of the cells to 1 or 2 pixels to see the full matrix. Select at the left the tab
Properties → Rows and change the width of the cell to 1 or 2 pixels.
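The effect of the sorting step above can be sketched in a few lines of Python. This is a simplified illustration of one way to arrange a binary alteration matrix so that mutually exclusive patterns become visible. It is not Gitools' actual implementation, and the sample identifiers and alteration calls are invented.

```python
def sort_by_mutual_exclusion(matrix, genes, samples):
    """Order genes by alteration count, then samples by their alteration pattern.

    matrix maps (gene, sample) -> 1 if altered, 0 otherwise.
    """
    # Genes altered in the most samples go first.
    gene_order = sorted(
        genes,
        key=lambda g: sum(matrix[(g, s)] for s in samples),
        reverse=True,
    )
    # Samples are compared gene by gene in that order, so samples altered in
    # the first gene come first, then samples altered only in the second, etc.
    sample_order = sorted(
        samples,
        key=lambda s: tuple(matrix[(g, s)] for g in gene_order),
        reverse=True,
    )
    return gene_order, sample_order


genes = ["TP53", "MDM2", "MDM4", "CDKN2A"]
samples = ["s1", "s2", "s3", "s4", "s5", "s6"]  # hypothetical sample ids
# Hypothetical alteration calls (gene, sample).
altered = {("CDKN2A", "s1"), ("TP53", "s2"), ("CDKN2A", "s3"),
           ("MDM2", "s4"), ("CDKN2A", "s5"), ("TP53", "s6")}
matrix = {(g, s): int((g, s) in altered) for g in genes for s in samples}

gene_order, sample_order = sort_by_mutual_exclusion(matrix, genes, samples)
for g in gene_order:
    print(f"{g:7s}", "".join("#" if matrix[(g, s)] else "." for s in sample_order))
```

With mutually exclusive alterations, the printed rows form a staircase: each gene's altered samples sit to the right of those of the gene above it.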
Visualize cause–effect relationship
After having completed the steps above, read the description below in order to visualize the
cause–effect relationship between somatic genomic alterations and gene expression. (The
video tutorial at http://help.gitools.org/Tutorials/Tutorial63 contains further help.)
The previously loaded file tcga-gbm.tp53.pathway.tdm.gz also contains data on the expression
of the same genes obtained from the same tumor samples. By following the steps explained
above to visualize genomic alteration data in a mutually exclusive manner, tumor samples with
somatic genomic alterations in the same gene have been grouped together.
● First, to change the value visualized in the heatmap from genomic alterations to
expression, select at the left the panel Properties → Cells.
● In the Scale section find the drop-down field for Value, which currently should display
Alteration. Click on the drop-down menu and select the value expr median-centered.
● Now it is possible to adjust the scale. We recommend that you set the center color to
white, so that subtle differences in expression values can be seen with greater detail.
● Since the samples appear in the same order as before, it is possible to observe that the
expression value is very low in the samples where a gene is lost. The opposite is the
case of gain events.
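The value name expr median-centered suggests that each gene's expression has been shifted so that its median across samples is zero. That reading is our assumption, and the sketch below only illustrates the transformation on invented numbers. With a white center color, values near the gene's median then appear white.

```python
from statistics import median


def median_center(values):
    """Subtract the median across samples from each expression value."""
    m = median(values)
    return [v - m for v in values]


# Hypothetical expression values of one gene across five samples.
expression = {"s1": 7.5, "s2": 8.0, "s3": 11.0, "s4": 8.5, "s5": 5.0}
centered = dict(zip(expression, median_center(list(expression.values()))))
print(centered)
```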
To assess this difference in expression value statistically, follow the instructions in the video
tutorial mentioned above.
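For readers who want to check such a difference outside Gitools, one generic option is a permutation test on the difference of group means. The sketch below uses that technique on invented values. It is not necessarily the test used in the video tutorial, and the function name and parameters are our own.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Reassign samples to the two groups at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_permutations + 1)


# Made-up median-centered expression values for one gene.
with_loss = [-2.1, -1.8, -2.5, -1.9, -2.2]
without_loss = [0.1, -0.2, 0.4, 0.0, 0.3, -0.1]
p_value = permutation_test(with_loss, without_loss)
print(f"p = {p_value:.4f}")
```

A small p-value indicates that a difference at least as large as the observed one rarely arises when samples are assigned to the two groups at random.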
Figure 3b. Visualizing cancer-driver genes in a network
The cBio Cancer Genomics Portal allows the browsing of datasets of cancer genomics
alterations generated by TCGA. After selecting a set of genes, the user can visualize the
interactions between them using a network browser.
● Point your browser to the website http://www.cbioportal.org/public-portal/
● In the form displayed, select the data as follows and select Submit at the bottom.
○ Cancer Study: Glioblastoma (TCGA, Nature 2008)
○ Gene set: TP53, MDM2, MDM4 and CDKN2A (each on a separate line)
● The Summary tab will be displayed by default with the OncoPrint alterations heatmap.
Then, the network viewer can be accessed through the tab Network.
Selected genes will be displayed as nodes, together with their closest neighbors. When a node
is clicked, the percentages of samples in which it is altered by copy number alteration,
mutation or over- and underexpression are shown surrounding the node. Keep Shift pressed
down to select multiple nodes. For more information, click Legends → Gene Legend above the
network. The depth of neighbors displayed can be adjusted with the bar on the right
labeled Filter Neighbors by Alteration %.
(Further information is available at http://www.slideshare.net/EthanCerami/network-view.)
Figures 3c, 3d and 3e. Visualizing the stratification of cancer patients
The case study ‘Visualizing cancer patient stratifications’ illustrates how to visualize the
stratification of cancer patients. Follow the video tutorial
http://help.gitools.org/Tutorials/Tutorial64 for further information.
Matrix heatmap
● Open Gitools and click on open data in a heatmap in the welcome tab.
● Select glioblastoma-slea-kegg-results.tdm.gz to open the results data file. The tutorial
http://help.gitools.org/Tutorials/Tutorial64 explains how to perform sample level
enrichment analysis (SLEA) with Gitools to generate these results.
● In the open heatmap, downregulated pathways appear blue-shifted, whereas
upregulated pathways appear red-shifted. Now, select in the left tab Properties →
Columns. In this tab, select the Load … button in the Annotations section. Then, open
the sample annotation file glioblastoma-sample-annotation.txt, which contains clinical
information from the samples such as molecular subtype.
● After loading the annotations, click on Add in the Headers section. Choose to add
Colored labels from annotation and hit OK.
● From the fields listed in the annotation options choose subtype and click Next.
● In the next dialog window, enable Show cluster names and click on Finish.
● From the top menus, select Data → Sort → by label and choose to sort only the columns
(the samples) by the label subtype, which is accessible through the change button on
the label row.
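Sorting the columns by an annotation label, as in the last step above, amounts to ordering the sample identifiers by their subtype. A minimal Python sketch follows. The sample identifiers and their subtype assignments are hypothetical and are not read from glioblastoma-sample-annotation.txt.

```python
from itertools import groupby

# Hypothetical sample annotations (sample id -> subtype label).
annotations = {
    "s1": "Proneural",
    "s2": "Classical",
    "s3": "Mesenchymal",
    "s4": "Classical",
    "s5": "Neural",
}
columns = ["s1", "s2", "s3", "s4", "s5"]

# Sorting by the label places samples of the same subtype next to each other;
# samples with the same label keep their original relative order.
sorted_columns = sorted(columns, key=lambda s: annotations[s])

for subtype, members in groupby(sorted_columns, key=lambda s: annotations[s]):
    print(subtype, list(members))
```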
To better visualize the sorted data, change the options in Properties → Cells as follows:
● Cells width: 3
● Cells height: 14
● Show rows grid: true
● Show columns grid: false
Circular heatmap
We have used CircleMap, which is a command line tool. A web server for it is being created at
http://sysbio.soe.ucsc.edu/nets.
To run it on your computer you may download a zip archive with the source code from GitHub:
https://github.com/chkw/stuartlab-circleplotter-py.
Follow the instructions in the README file to generate the example circular heatmaps. The
second example creates the nodes from the figure as .png files.
Genome Browser
If the sample information is loaded into the genome browser IGV, it is possible to stratify the
cancer patients and to examine particular groups and genes more closely.
● Start up IGV and click File → Load from Server...
● In the pop-up dialog, load the Tutorial data. Then, unfold the options Tutorials → GBM
Subtypes (Verhaak et al.) and select the boxes Sample Information, Segmented Copy
Number and Expression and click OK.
● The data will be loaded from the Server into the IGV. When ready, select a gene of
interest and type it into the box at the top, and hit enter. For the example, we have
selected EGFR.
● We see now the values for all the tracks for EGFR; some contain copy number data and
others contain expression data. To stratify the different cancer samples, select the
menu Tracks → Group Tracks → Subtype. This will group together all the tracks from
the same molecular subtype.
● To better visualize all the data, reduce the track height by selecting: Tracks → set Track
height → 1 pixel.
● Define a region of interest by selecting the Define a region of interest icon on the top,
select a left and a right boundary by clicking into tracks in the EGFR region twice on the
extremes. You may find further help at
http://www.broadinstitute.org/igv/regionsofinterest
● A red bar will appear just above the first track. By making a right click into this red bar,
different ways to sort the information show up. Select sort by amplification and you will
see that the copy number tracks are sorted by their amplification status and the
expression tracks that come from the same samples are sorted in the same way.
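The last step can be pictured as follows: each sample receives a copy number score within the region of interest, the samples are ordered by that score, and the same order is applied to the expression tracks. The Python sketch below is a rough analogue under that assumption and is not IGV's implementation. Coordinates, segment values and expression values are invented.

```python
def mean_in_region(segments, start, end):
    """Length-weighted mean of segment values overlapping [start, end)."""
    total = 0.0
    covered = 0
    for seg_start, seg_end, value in segments:
        overlap = min(seg_end, end) - max(seg_start, start)
        if overlap > 0:
            total += value * overlap
            covered += overlap
    return total / covered if covered else 0.0


# Hypothetical region of interest and per-sample copy number segments
# given as (start, end, value).
region = (100, 200)
copy_number = {
    "s1": [(0, 150, 0.0), (150, 300, 2.0)],
    "s2": [(0, 300, 3.0)],
    "s3": [(0, 300, -0.5)],
}
# Hypothetical expression value of the gene in the same samples.
expression = {"s1": 0.4, "s2": 2.5, "s3": -0.3}

# Most amplified samples first.
order = sorted(
    copy_number,
    key=lambda s: mean_in_region(copy_number[s], *region),
    reverse=True,
)
# The expression tracks follow the order of the copy number tracks.
expression_in_order = [expression[s] for s in order]
print(order, expression_in_order)
```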
Figure 3f. Visualizing global alteration profile patterns
The circular genome plots shown in Figure 3f were taken from the Cosmic web portal. The
three samples shown are:
● Many alterations: http://cancer.sanger.ac.uk/cosmic/sample/overview?id=749716
● Few alterations: http://cancer.sanger.ac.uk/cosmic/sample/overview?id=1126516
● Chromothripsis on chr 17: http://cancer.sanger.ac.uk/cosmic/sample/overview?id=749716
As stated in the portal, the figures have been plotted with Circos. Circos is a very flexible
command line tool. Owing to this characteristic, it requires the investment of some time on the
user’s part to master it. We limit ourselves to pointing the reader to the extensive reading and
learning material for Circos at its official website: http://circos.ca/tutorials/lessons/.