Supplementary Information

Histone modifications are associated with transcript isoform diversity in normal and cancer cells

Ondrej Podlaha1, Subhajyoti De2,3,4, Mithat Gonen5, and Franziska Michor1*

1 Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, and Department of Biostatistics, Harvard School of Public Health, Boston, MA 02215, USA.
2 Department of Medicine, University of Colorado School of Medicine, Aurora, CO 80045, USA.
3 Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO 80045, USA.
4 Molecular Oncology Program, University of Colorado Cancer Center, Aurora, CO 80045, USA.
5 Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA.

* Author for correspondence. Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA 02215, USA. Tel: 617 643 5045. Fax: 617 632 2444. Email: michor@jimmy.harvard.edu.

Table S1. Association between the transcription start site inclusion rate (TSSIR) of lincRNAs and histone modification enrichment in normal cell lines.
We analyzed six normal human cell lines (Gm12878, Hsmm, Huvec, H1hesc, Nhek, Nhlf) for associations between the transcription start site inclusion rate (TSSIR) and histone modification enrichment for lincRNAs. Values are averages of Fisher-transformed Spearman rank correlations, which makes them directly comparable. Coefficients are color-coded, with red representing increasingly negative and green representing increasingly positive correlation. The distance-from-exon categories denote the region, relative to a given exon, in which histone enrichment was measured: 0kb is the region within the exon boundaries, and 1kb, 2kb, and 5kb are regions extending from the exon boundary either upstream (negative) or downstream (positive).

lincRNA TSSIR: average of Fisher-transformed Spearman rank correlations for normal cells.
Marks: Ctcf, H2az, H3k4me1, H3k4me2, H3k4me3, H3k9ac, H3k9me1, H3k9me3, H3k27ac, H3k27me3, H3k36me3, H3k79me2, H4k20me1.
Each line below gives the values for one mark; the extracted text does not preserve which line belongs to which mark.

Distance from exon:
-5kb      -2kb      -1kb      0kb       +1kb      +2kb      +5kb
0.19414   0.20887   0.21417   0.20439   0.22092   0.2197    0.20944
0.1689    0.18625   0.19439   0.19874   0.19932   0.19623   0.18684
0.18261   0.19586   0.19913   0.196     0.18104   0.1809    0.18183
0.06013   0.02652   -0.0103   -0.0042   0.03215   0.0492    0.06678
0.18245   0.19155   0.20042   0.1997    0.18191   0.18299   0.17865
-0.0931   -0.0849   -0.0807   -0.0637   -0.067    -0.0749   -0.077
0.18285   0.17339   0.1458    0.11141   0.17047   0.18304   0.18827
0.10298   0.13896   0.15884   0.15922   0.15337   0.14385   0.1176
0.13219   0.06034   0.01543   0.00439   0.05415   0.09123   0.13604
0.0018    -0.0333   -0.0551   -0.0508   -0.053    -0.0506   -0.0458
-0.0936   -0.0884   -0.0834   -0.07     -0.0709   -0.0798   -0.0825
0.17302   0.1763    0.15372   0.12824   0.1946    0.19834   0.18888
0.10297   0.11261   0.11568   0.08318   0.08444   0.07837   0.06618

Table S2. Association between splicing exon inclusion rate (SEIR) of protein coding genes and histone modifications in cancer cell lines.
We analyzed three cancer cell lines (Hepg2, Helas3, K562) for associations between splicing exon inclusion rate (SEIR) and histone modification enrichment for protein coding genes. Values are averages of Fisher-transformed Spearman rank correlations, which makes them directly comparable. Coefficients are color-coded, with red representing increasingly negative and green representing increasingly positive correlation. The distance-from-exon categories denote the region, relative to a given exon, in which histone enrichment was measured: 0kb is the region within the exon boundaries, and 1kb, 2kb, and 5kb are regions extending from the exon boundary either upstream (negative) or downstream (positive).

Protein coding splicing: average of Fisher-transformed Spearman rank correlations for cancer cells.
Marks: Ctcf, H2az, H3k4me1, H3k4me2, H3k4me3, H3k9ac, H3k9me1, H3k9me3, H3k27ac, H3k27me3, H3k36me3, H3k79me2, H4k20me1.
Each line below gives the values for one mark; the extracted text does not preserve which line belongs to which mark.

Distance from exon:
-5kb      -2kb      -1kb      0kb       +1kb      +2kb      +5kb
-0.1264   -0.1204   -0.0688   -0.1399   -0.159    -0.1433   -0.0498
-0.2367   -0.2231   -0.1773   -0.2532   -0.2615   -0.2579   -0.1907
-0.2353   -0.223    -0.1839   -0.2437   -0.2501   -0.2515   -0.1958
-0.1345   -0.1065   -0.0591   -0.1475   -0.1351   -0.112    -0.174
0.15088   0.16649   0.17715   0.13425   0.13922   0.14339   0.06073
-0.0553   -0.0666   -0.0887   -0.0555   -0.0551   -0.0692   -0.0396
0.07855   0.09038   0.09863   0.01208   0.00976   0.01514   0.00168
-0.1433   -0.1157   -0.0679   -0.1531   -0.1376   -0.1094   -0.1384
-0.2172   -0.2291   -0.243    -0.2317   -0.2432   -0.2588   -0.1557
0.33112   0.32191   0.3049    0.38082   0.36847   0.33757   0.28785
0.02958   0.05967   0.10801   -0.0347   -0.0212   0.00058   -0.0229
-0.2238   -0.2192   -0.1921   -0.225    -0.2353   -0.2356   -0.2164
-0.1341   -0.117    -0.0894   -0.1223   -0.1076   -0.0802   -0.1099

Table S3. Association between transcription start site inclusion rate (TSSIR) of lincRNAs and histone modification enrichment in cancer cell lines.
We analyzed three cancer cell lines (Hepg2, Helas3, K562) for associations between transcription start site inclusion rate (TSSIR) and histone modification enrichment for lincRNAs. Values are averages of Fisher-transformed Spearman rank correlations, which makes them directly comparable. Coefficients are color-coded, with red representing increasingly negative and green representing increasingly positive correlation. The distance-from-exon categories denote the region, relative to a given exon, in which histone enrichment was measured: 0kb is the region within the exon boundaries, and 1kb, 2kb, and 5kb are regions extending from the exon boundary either upstream (negative) or downstream (positive).

lincRNA TSSIR: average of Fisher-transformed Spearman rank correlations for cancer cells.
Marks: Ctcf, H2az, H3k4me1, H3k4me2, H3k4me3, H3k9ac, H3k9me1, H3k9me3, H3k27ac, H3k27me3, H3k36me3, H3k79me2, H4k20me1.
Each line below gives the values for one mark; the extracted text does not preserve which line belongs to which mark.

Distance from exon:
-5kb      -2kb      -1kb      0kb       +1kb      +2kb      +5kb
0.18236   0.18136   0.15496   0.12096   0.19572   0.20292   0.20012
0.18413   0.19889   0.20463   0.20517   0.2186    0.21546   0.21656
0.18131   0.19575   0.20116   0.20588   0.20999   0.20723   0.20768
0.19036   0.20194   0.20348   0.20127   0.20923   0.20741   0.20573
0.03933   0.03041   -0.001    -0.0046   0.00853   0.0248    0.03642
-0.0621   -0.0472   -0.0427   -0.0218   -0.0474   -0.0656   -0.0629
0.0378    0.01576   -0.0178   -0.0172   -0.0354   -0.0328   -0.0132
0.20156   0.21458   0.21322   0.20345   0.21293   0.20882   0.20996
-0.0733   -0.0479   -0.0429   -0.0293   -0.0502   -0.0676   -0.0795
0.15204   0.0921    0.02366   0.01656   0.07038   0.11181   0.15723
0.17973   0.16921   0.15336   0.12869   0.17467   0.18386   0.19777
0.10571   0.14187   0.15004   0.14774   0.15045   0.13667   0.11818
0.1007    0.11348   0.1186    0.08401   0.08887   0.07812   0.07658

Table S4. Association between splicing exon inclusion rate (SEIR) of lincRNAs and histone modifications in cancer cell lines.
We analyzed three cancer cell lines (Hepg2, Helas3, K562) for associations between splicing exon inclusion rate (SEIR) and histone modification enrichment for lincRNAs. Values are averages of Fisher-transformed Spearman rank correlations, which makes them directly comparable. Coefficients are color-coded, with red representing increasingly negative and green representing increasingly positive correlation. The distance-from-exon categories denote the region, relative to a given exon, in which histone enrichment was measured: 0kb is the region within the exon boundaries, and 1kb, 2kb, and 5kb are regions extending from the exon boundary either upstream (negative) or downstream (positive).

lincRNA splicing: average of Fisher-transformed Spearman rank correlations for cancer cells.
Marks: Ctcf, H2az, H3k4me1, H3k4me2, H3k4me3, H3k9ac, H3k9me1, H3k9me3, H3k27ac, H3k27me3, H3k36me3, H3k79me2, H4k20me1.
Each line below gives the values for one mark; the extracted text does not preserve which line belongs to which mark.

Distance from exon:
-5kb      -2kb      -1kb      0kb       +1kb      +2kb      +5kb
0.07083   0.05181   0.04643   0.04437   0.04117   0.05316   0.05464
0.04938   0.02422   0.01278   0.01107   0.01633   0.02978   0.02587
0.04611   0.02366   0.00786   0.00734   0.0091    0.01871   0.02135
0.0469    0.02734   0.01699   0.00202   0.01646   0.03418   0.03263
-0.0376   -0.0216   -0.0043   -0.0154   -0.0073   0.00309   -0.0077
-0.0029   0.00367   0.00968   0.00065   -0.0015   0.00943   -8E-08
-0.0234   -0.0289   -0.0146   -0.0057   -0.0134   -0.0066   -0.0058
0.06169   0.03241   0.0202    0.03141   0.02087   0.03846   0.04028
-0.0438   -0.0438   -0.0382   -0.0344   -0.0485   -0.0411   -0.0425
0.06375   0.04163   0.05165   0.06258   0.05638   0.07357   0.07824
0.05791   0.04083   0.03466   0.0134    0.02072   0.04411   0.0518
0.02593   0.00101   -0.0096   -0.0391   -0.0205   -0.0006   -0.0043
0.01036   -0.0055   -0.0103   -0.0115   -0.0056   0.00556   0.01127

Table S5. Association between histone modification enrichment and transcription start site inclusion rate.
Correlations between transcription start site inclusion rate or splicing and enrichment of selected histone modifications in normal or cancer cell lines. The panels show: (A) transcription start site switching in normal cell lines, (B) splicing in normal cell lines, (C) transcription start site switching in cancer cell lines, and (D) splicing in cancer cell lines. Black dots represent median Spearman rank correlations between exon inclusion rate and the given histone marks. All correlation coefficients were transformed using Fisher's transformation before plotting. Notches were calculated as $\pm 1.58 \times \mathrm{IQR}/\sqrt{n}$, where IQR stands for interquartile range and n for sample size. Distances from exon represent genomic blocks of a given size from exon start (upstream regions) or exon end (downstream regions).

Table S6. Top 20 ontology categories enriched among 840 candidate genes that showed a significant association between splicing exon inclusion rates and histone modification enrichment.
GO term      Description                                        P-value    FDR q-value
GO:0048583   regulation of response to stimulus                 1.77E-10   2.11E-06
GO:0051716   cellular response to stimulus                      2.53E-10   1.51E-06
GO:0044763   single-organism cellular process                   2.66E-10   1.05E-06
GO:0009966   regulation of signal transduction                  5.79E-10   1.72E-06
GO:0050794   regulation of cellular process                     1.57E-09   3.74E-06
GO:0065007   biological regulation                              3.93E-09   7.79E-06
GO:0007165   signal transduction                                4.47E-09   7.60E-06
GO:0044767   single-organism developmental process              4.93E-09   7.33E-06
GO:0023051   regulation of signaling                            9.31E-09   1.23E-05
GO:0032502   developmental process                              1.13E-08   1.35E-05
GO:0010646   regulation of cell communication                   1.34E-08   1.45E-05
GO:0050789   regulation of biological process                   1.34E-08   1.33E-05
GO:0048522   positive regulation of cellular process            2.30E-08   2.10E-05
GO:1902531   regulation of intracellular signal transduction    4.11E-08   3.49E-05
GO:0051128   regulation of cellular component organization      6.87E-08   5.45E-05
GO:0019222   regulation of metabolic process                    7.50E-08   5.58E-05
GO:0044699   single-organism process                            1.69E-07   1.19E-04
GO:0006928   cellular component movement                        2.00E-07   1.33E-04
GO:0048518   positive regulation of biological process          2.37E-07   1.49E-04
GO:0009987   cellular process                                   2.57E-07   1.53E-04

Table S7. Leave-one-out cross validation summary statistics.

Validation cell line   Total number of genes   Genes assigned to categories   Significant genes   Significant cancer genes
Gm12878                29354                   2750                           686                 26
H1hesc                 29354                   2745                           660                 23
Hsmm                   29354                   2770                           665                 29
Huvec                  29354                   2696                           652                 25
Nhek                   29354                   2753                           667                 23
Nhlf                   29354                   2786                           687                 27
Helas3                 29354                   2357                           455                 15
Hepg2                  29354                   2720                           684                 21
K562                   29354                   2707                           672                 21
Mono                   29354                   2519                           654                 22
Hmec                   29354                   2792                           669                 29

Module S1. Epigenetically aberrant regions in three cancer cell lines are enriched for oncogenes.
Using the Cancer Gene Census from COSMIC, we tested for oncogene enrichment in epigenetically aberrant regions of three cancer cell lines (Helas3, Hepg2, and K562) with regard to specific histone marks using the hypergeometric test. All p values were corrected for multiple testing (FDR). Green highlights significantly enriched marks at the 5% FDR level.

Hypergeometric test p value, by cell line.
Marks: Ctcf, H2az, H3k4me1, H3k4me2, H3k4me3, H3k9ac, H3k9me3, H3k27ac, H3k27me3, H3k36me3, H3k79me2, H4k20me1.
Each line below gives the twelve p values for one cell line; the extracted text does not preserve which column belongs to which mark.

CellLine
Helas3   0.06855   0.0916   0.0058   0.0231   0.0231   0.0231   0.0054   0.0684   0.8893   0.0003   3E-06   0.0054
Hepg2    0.00181   0.3192   0.61     0.0085   0.3566   NA       0.0003   0.3216   0.644    0.0018   0.0002  0.2707
K562     0.63255   0.0922   0.2228   0.0052   0.0052   0.9498   0.0922   0.0922   0.6325   0.0922   4E-05   0.6325
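
The enrichment test described for Module S1 can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric test (probability of observing at least the given number of oncogenes among the genes in aberrant regions) followed by a Benjamini-Hochberg adjustment. The source states only "hypergeometric test" and "FDR", so the choice of tail and of the Benjamini-Hochberg procedure, as well as the function names and the toy numbers, are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(overlap, population, successes, draws):
    """One-sided p-value P(X >= overlap).

    population: total number of genes considered
    successes:  number of those genes that are oncogenes
    draws:      number of genes falling in aberrant regions
    overlap:    number of oncogenes among the drawn genes
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example (hypothetical counts, not taken from the study).
p = hypergeom_enrichment_p(overlap=2, population=10, successes=4, draws=3)
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A mark would then be called enriched when its adjusted value falls below 0.05, matching the 5% FDR level named in the caption. Entries reported as NA would need to be dropped before the adjustment.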
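
The captions of Tables S1 to S4 report averages of Fisher-transformed Spearman rank correlations. The sketch below shows one way such a value could be computed: a Spearman coefficient per cell line (Pearson correlation of average ranks), the Fisher transformation z = atanh(rho), and the mean of z across cell lines. The function names and the example inputs are hypothetical, and whether the authors back-transformed the mean is not stated in the source, so the sketch returns the mean on the z scale.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for j in range(start, end + 1):
            ranks[order[j]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_fisher_z(correlations):
    """Mean of atanh(rho); atanh is undefined for rho equal to +1 or -1."""
    return sum(math.atanh(r) for r in correlations) / len(correlations)


# Hypothetical inclusion rates and enrichment values for one cell line.
rho = spearman([1, 2, 3, 4], [10, 20, 30, 25])
# Hypothetical per-cell-line coefficients averaged on the z scale.
z_bar = mean_fisher_z([rho, 0.5, 0.2])
```

Applying `math.tanh` to the mean would return it to the correlation scale if that is preferred for reporting.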
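
The notch calculation mentioned in the Table S5 caption can be sketched as follows, assuming the common boxplot convention in which the notch spans the median plus or minus 1.58 times the interquartile range divided by the square root of the sample size. The square root and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) are assumptions; plotting software may define quartiles slightly differently.

```python
import math
import statistics


def notch_interval(values):
    """Return (lower, upper) notch bounds around the median.

    Half-width is 1.58 * IQR / sqrt(n), with quartiles taken by the
    inclusive method (an assumption; other conventions exist).
    """
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.58 * (q3 - q1) / math.sqrt(n)
    median = statistics.median(values)
    return median - half_width, median + half_width


# Hypothetical Fisher-transformed correlations for one histone mark.
lower, upper = notch_interval([0.12, 0.18, 0.20, 0.21, 0.25])
```

Two boxes whose notch intervals do not overlap are conventionally read as having different medians, which is how the panels of such a plot are usually compared.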