Supplement – Materials and methods

1. Bladder tumour materials

FFPE archival UBC specimens obtained following TURBT were randomly selected from 2005 and 2010 out of the departmental database (University Hospitals Network, Toronto and Mount Sinai Hospital, Toronto). To compare FF and FFPE, we used 4 bladder tumour samples from 4 patients. For our cohort, we analyzed an additional 49 tumours from 47 patients, of whom 7 patients had multiple samples analyzed to study inter-tumour heterogeneity (Figure 1c-e). The study was carried out with institutional ethics board approval from both institutions (UHN: 11-0134-T and MSH: 11-0015-E). The oldest direct comparison between FF and FFPE was performed on a sample from 2007 (embedded in FFPE for 6 years). The oldest FFPE RNAseq was performed on samples from 2005 (embedded in FFPE for 6 years). Two years (2005 and 2010) were chosen to ensure quality control with older samples, and the samples passed the quality control criteria set for RNAseq analysis (see RNAseq quality control section). Cases were reviewed pathologically to confirm that they were unequivocally low or high grade and suitable for RNA extraction based on tumour size and tumour purity, e.g. absence of haemorrhage, divergent pathology, necrosis or inflammation. For each FFPE block/section, we extracted a minimum of 750 ng total RNA, and the smallest tumour sample we analyzed had a volume of approximately 0.8 mm³.

2. Tissue sectioning and RNA collection

The FFPE blocks were serially sectioned and the first section (5 µm; subsequent sections 10 µm) was stained (H & E) and assessed by a pathologist (TvdK) for tumour grade and stage. Viable tumour regions were then evaluated and marked before total RNA was collected and extracted.

3. RNA extraction

Total RNA was extracted using the RNeasy kit (QIAGEN). The yield and quality of total RNA were then assessed using Nanodrop (Thermo Fisher) and BioAnalyzer (Agilent), respectively.
Ribosomal RNA (rRNA) was removed using a bead-based hybridization kit, RiboZero (Epicentre, EPI-MRZG12324SP). cDNA libraries were then prepared using the Illumina TruSeq RNA sample preparation kit v2 (RS-122-2001) and loaded as 2 indexed samples per lane on an Illumina HiSeq 2000.

4. wRNASeq protocols and analytical pipeline

The raw sequencing reads in fastq format were obtained from the image files produced by the HiSeq 2000 using the standard Illumina CASAVA software (version 1.8). For each sample, around 150 million 101 bp paired-end reads were generated in fastq format and mapped onto the human genome (hg19) using TopHat 1.4.1, allowing for up to two mismatches [1]. After obtaining aligned BAM files, a custom Perl script was used to select unique reads that mapped to only one location on the genome. Reads mapped to multiple sites on the genome were discarded. These filtered BAM files were then analyzed using a custom R-based pipeline to calculate gene expression profiles using ENSEMBL annotation for coding genes and ENCODE annotations for lincRNAs [2]. To estimate the expression level of each gene, including both coding and non-coding genes, the number of reads mapped onto the gene was counted regardless of transcript isoform and normalized to total mapped reads to obtain transcript union Read Per Million total reads (truRPMs). For coding genes, reads mapped onto both exons and introns were all counted for truRPM calculations using a custom R script.

5. RNAseq quality control

The quality of total RNA extracted from FFPE sections was examined using Nanodrop (Thermo Fisher), and all RNA samples had an OD260/OD280 ratio >2, indicating high purity of RNA. Unlike total RNA extracted from fresh frozen samples, RNA molecules extracted from FFPE are highly fragmented; therefore, we did not need to perform the fragmentation step during the generation of cDNA sequencing libraries.
For RNA extracted from fresh frozen samples, we followed the Illumina protocol to fragment the RNA. Omitting the fragmentation step for FFPE RNA is a key modification of the protocol. More importantly, efficient removal of rRNA is another critical factor for successful cDNA library construction from FFPE RNA samples. The efficiency of rRNA removal was determined by calculating the ratio of GAPDH RNA to 18S rRNA using Taqman qPCR. The Taqman probes for human GAPDH (Cat. 4333764) and 18S (Cat. 4333760) were obtained from Life Technologies. All cDNA libraries displayed successful removal of rRNA (GAPDH:rRNA>1), indicating high quality. The quality of sequencing reads was assessed using a publicly available tool, FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). All samples that passed the quality control measurements of FastQC were then subjected to subsequent analysis. Poor-quality 5' or 3' read ends were trimmed using a custom Perl script before mapping onto the human genome. No sample contained more than 10% rRNA. The storage time of the FFPE samples had no effect on RNAseq data quality based on the above standards.

6. Statistical analyses

For heatmap and statistical analyses, publicly available R packages were used, while network presentations were prepared in Cytoscape. Analysis of network modularity was performed as reported previously [3]. Briefly, after removal of high and low abundance genes, expression levels of coding genes were log transformed and median centered. For each hub, the average correlation of co-expression with its interacting partners was calculated across LG samples and compared with that across the HG cluster. The hubs with altered correlation between LG and HG were selected and their significance was assessed using permutation. In total, 392 significant hubs (p<0.05) were identified. Of these, 302 hubs showed interactions and were used to construct the network in Fig. 2c.
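
The hub comparison described in the statistical analyses section can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the R pipeline used in the study: the function names (`preprocess`, `avg_hub_correlation`, `hub_permutation_test`), the log2 transform with a pseudocount of 1, the choice of Pearson correlation, the permutation of sample labels, and the default number of permutations are all hypothetical choices made for the example.

```python
import numpy as np


def preprocess(expr, pseudocount=1.0):
    """Log-transform and median-centre each gene (rows = genes, columns = samples).

    Assumption: log2 with a pseudocount; the source does not state the base.
    """
    logged = np.log2(np.asarray(expr, dtype=float) + pseudocount)
    return logged - np.median(logged, axis=1, keepdims=True)


def avg_hub_correlation(hub, partners):
    """Mean Pearson correlation between one hub and each interacting partner.

    hub: shape (n_samples,); partners: shape (n_partners, n_samples).
    Assumes no profile is constant across the samples supplied.
    """
    hub_c = hub - hub.mean()
    par_c = partners - partners.mean(axis=1, keepdims=True)
    denom = np.sqrt((hub_c ** 2).sum()) * np.sqrt((par_c ** 2).sum(axis=1))
    return float(np.mean(par_c @ hub_c / denom))


def hub_permutation_test(hub, partners, is_hg, n_perm=1000, seed=0):
    """Difference in average hub correlation (LG minus HG) and a permutation p-value.

    is_hg: boolean array marking the HG samples; the rest are treated as LG.
    Sample labels are shuffled to build the null distribution (an assumption).
    """
    rng = np.random.default_rng(seed)
    is_hg = np.asarray(is_hg, dtype=bool)

    def diff(labels):
        lg = avg_hub_correlation(hub[~labels], partners[:, ~labels])
        hg = avg_hub_correlation(hub[labels], partners[:, labels])
        return lg - hg

    observed = diff(is_hg)
    null = np.array([diff(rng.permutation(is_hg)) for _ in range(n_perm)])
    # Two-sided p-value; the +1 terms keep the estimate away from exactly zero.
    p_value = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_perm + 1)
    return observed, p_value
```

In this sketch a hub would be called significant when its p-value falls below 0.05, mirroring the threshold quoted above. Correcting for the number of hubs tested is not shown.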
Supplementary Results

Comparison with other similar studies using larger cohorts.

We compared our results with the Lindgren et al. study [4], which used microarrays to examine expression levels of 2508 coding genes between high grade and low grade bladder tumours. These authors identified 392 and 400 coding genes upregulated in high grade and low grade tumours, respectively. Compared with our DEGs, 80 and 32 coding genes overlapped with the high grade and low grade associated genes, respectively. Interestingly, the overlapping genes have more dynamic fold changes in our data than in theirs (Fig. S2a and S2b, the boxplots in the left panels). For the genes that appear only in the list by Lindgren et al. [4], our RNAseq data have a similar distribution to the microarray data (Fig. S2a and S2b, the boxplots in the right panels); however, due to our strict statistical standards, these genes are not included in our DEGs. When comparing our results to another similar study [5], among our 947 DEGs, 241 were also identified by microarray profiling of fresh frozen invasive (T1-T4) versus non-invasive (Ta) tumours. In summary, our results showed good concordance with previous studies and were in keeping with the known limitations when comparing different datasets of this nature [6].

Supplementary References

1. Trapnell, C., Pachter, L. & Salzberg, S.L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105-1111 (2009).
2. Djebali, S. et al. Landscape of transcription in human cells. Nature 489, 101-108 (2012).
3. Taylor, I.W. et al. Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nat Biotechnol 27, 199-204 (2009).
4. Lindgren, D. et al. Combined gene expression and genomic profiling define two intrinsic molecular subtypes of urothelial carcinoma and gene signatures for molecular grading and outcome. Cancer Research 70, 3463-3472 (2010).
5. He, X. et al. Differentiation of a highly tumorigenic basal cell compartment in urothelial carcinoma. Stem Cells 27, 1487-1495 (2009).
6. Lauss, M., Ringner, M. & Hoglund, M. Prediction of stage, grade, and survival in bladder cancer using genome-wide expression data: a validation study. Clinical Cancer Research 16, 4421-4433 (2010).