Supplementary Information

Gene expression-based classification of malignant gliomas correlates better with survival than histological classification

Catherine L. Nutt, D. R. Mani, Rebecca A. Betensky, Pablo Tamayo, J. Gregory Cairncross, Christine Ladd, Ute Pohl, Christian Hartmann, Margaret E. McLaughlin, Tracy T. Batchelor, Peter M. Black, Andreas von Deimling, Scott L. Pomeroy, Todd R. Golub and David N. Louis

Table of Contents:
High Grade Glioma Dataset
High Grade Glioma Class Markers
Features of the 20-feature k-NN Class Prediction Model
Features Used During Building of the Class Prediction Model
Summary of Training Sample Set Class Predictions
Summary of Test Sample Set Class Predictions
Survival Statistics for the High Grade Glioma Dataset
Survival curves - all glioblastomas and anaplastic oligodendrogliomas
High Grade Glioma Dataset

Dataset: 50 high grade gliomas
- 28/50 glioblastomas
  - 14/28 classic glioblastomas
  - 14/28 non-classic glioblastomas
- 22/50 anaplastic oligodendrogliomas
  - 7/22 classic anaplastic oligodendrogliomas
  - 15/22 non-classic anaplastic oligodendrogliomas

Sample Number  Sample Name                Sample Type
1-14           Brain_CG_1 to Brain_CG_14  Classic glioblastoma
15-28          Brain_NG_1 to Brain_NG_14  Non-classic glioblastoma
29-35          Brain_CO_1 to Brain_CO_7   Classic anaplastic oligodendroglioma
36-50          Brain_NO_1 to Brain_NO_15  Non-classic anaplastic oligodendroglioma

High Grade Glioma Class Markers

The table below lists the top 50 marker genes for each tumor class, together with the permutation test values. Genes were selected using the signal-to-noise metric after a variation filter that retained genes with max/min > 3 (3-fold) and max - min > 100 absolute units. Distance is each marker's signal-to-noise score; Perm 1%, Perm 5% and Perm 10% are the permutation test values at the 1%, 5% and 10% levels. GBM, glioblastoma; AO, anaplastic oligodendroglioma.

Distinction  Distance  Perm 1%  Perm 5%  Perm 10%  Feature     Description
GBM          1.3750    1.4928   1.3180   1.2629    34091_s_at  VIM: vimentin
GBM          1.1982    1.3190   1.1916   1.1067    630_at      DCTD: dCMP deaminase
GBM          1.1633    1.2929   1.1591   1.0328    631_g_at    DCTD: dCMP deaminase
GBM          1.0315    1.2698   1.1338   1.0064    39691_at    SH3GLB1: SH3-domain GRB2-like endophilin B1
GBM          0.9587    1.2638   1.0978   0.9705    160039_at   MAPK4: mitogen-activated protein kinase 4
GBM          0.9581    1.2557   1.0540   0.9446    35016_at    CD74: CD74 antigen (invariant polypeptide of major histocompatibility complex, class II antigen-associated)
GBM          0.9041    1.2353   1.0474   0.9378    38791_at    DDOST: dolichyl-diphosphooligosaccharide protein glycosyltransferase
GBM          0.9021    1.2041   1.0121   0.9331    1395_at     ARHC: ras homolog gene family, member C
GBM          0.8941    1.1966   0.9997   0.9047    37542_at    LHFPL2: lipoma HMGIC fusion partner-like 2
GBM          0.8838    1.1726   0.9956   0.8854    935_at      CAP: adenylyl cyclase-associated protein
GBM          0.8798    1.1645   0.9917   0.8842    34768_at    TXNDC: thioredoxin domain-containing
GBM          0.8716    1.1414   0.9758   0.8803    32749_s_at  DKFZp586K1720 protein
GBM          0.8617    1.1310   0.9428   0.8780    36678_at    TAGLN2: transgelin 2
GBM          0.8524    1.1295   0.9353   0.8489    40793_s_at  AQP4: aquaporin 4
GBM          0.8523    1.1208   0.9303   0.8415    37421_f_at  Human DNA sequence from clone RP3-377H14 on chromosome 6p21.32-22.1
GBM          0.8492    1.1185   0.9088   0.8316    1318_at     RBBP4: retinoblastoma-binding protein 4
GBM          0.8309    1.0880   0.9006   0.8216    37012_at    CAPZB: capping protein (actin filament) muscle Z-line, beta
GBM          0.8237    1.0874   0.8904   0.8212    388_at      PIK3R2: phosphoinositide-3-kinase, regulatory subunit, polypeptide 2 (p85 beta)
GBM          0.8128    1.0801   0.8858   0.8194    41624_r_at  FZR1: Fzr1 protein
GBM          0.8127    1.0709   0.8852   0.8165    34193_at    CHL1: cell adhesion molecule with homology to L1CAM (close homologue of L1)
GBM          0.8096    1.0586   0.8835   0.8112    40807_at    MUF1: MUF1 protein
GBM          0.7946    1.0563   0.8834   0.8098    31444_s_at  ANXA2P3: annexin A2 pseudogene 3
GBM          0.7882    1.0533   0.8722   0.8083    1860_at     TP53BP2: tumor protein p53-binding protein, 2
GBM          0.7871    1.0390   0.8692   0.7893    36150_at    KIAA0842 protein
GBM          0.7857    1.0325   0.8631   0.7843    40771_at    Human DNA sequence from clone 376D21 on chromosome Xq11.1-12
GBM          0.7828    1.0272   0.8607   0.7797    31342_at    GALNT2: UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 2 (GalNAc-T2)
GBM          0.7820    1.0209   0.8536   0.7783    39122_at    GPI: glucose phosphate isomerase
GBM          0.7762    1.0189   0.8527   0.7781    34822_at    TP53BP2: tumor protein p53-binding protein, 2
GBM          0.7691    1.0137   0.8508   0.7766    36921_at    TCTE1L: t-complex-associated-testis expressed 1-like
GBM          0.7524    1.0128   0.8438   0.7756    406_at      ITGB4: integrin, beta 4
GBM          0.7470    1.0051   0.8434   0.7741    36138_at    CAPNS1: calpain, small subunit 1
GBM          0.7452    1.0037   0.8432   0.7735    41485_at    LDHA: lactate dehydrogenase A
GBM          0.7383    0.9999   0.8421   0.7731    39694_at    Hypothetical protein MGC5508
GBM          0.7328    0.9931   0.8390   0.7628    36131_at    Homo sapiens genes encoding RNCC protein, DDAH protein, Ly6-C protein, Ly6-D protein and immunoglobulin receptor
GBM          0.7263    0.9888   0.8387   0.7599    769_s_at    ANXA2: annexin A2
GBM          0.7257    0.9825   0.8299   0.7598    33891_at    DKFZp564H182 protein
GBM          0.7207    0.9786   0.8262   0.7587    41549_s_at  AP1S2: adaptor-related protein complex 1, sigma 2 subunit
GBM          0.7152    0.9775   0.8239   0.7547    37759_at    LAPTM5: lysosomal-associated multispanning membrane protein-5
GBM          0.7150    0.9751   0.8225   0.7539    AFFX-HUMISGF3A/M97935_MA_at  STAT1: signal transducer and activator of transcription 1, 91kD
GBM          0.7121    0.9636   0.8217   0.7531    38650_at    IGFBP5: insulin-like growth factor binding protein 5
GBM          0.7112    0.9600   0.8166   0.7501    36950_at    HSGP25L2G: gp25L2 protein
GBM          0.7076    0.9569   0.8142   0.7499    40817_at    NUCB1: nucleobindin 1
GBM          0.7058    0.9542   0.8102   0.7467    38253_at    AGL: amylo-1,6-glucosidase, 4-alpha glucanotransferase (glycogen debranching enzyme, glycogen storage disease type III)
GBM          0.7022    0.9520   0.8067   0.7428    38812_at    BTN3A2: butyrophilin, subfamily 3, member A2
GBM          0.7018    0.9365   0.8051   0.7307    34224_at    LAMB2: laminin, beta 2 (laminin S)
GBM          0.6975    0.9359   0.8048   0.7290    39376_at    FADS3: fatty acid desaturase 3
GBM          0.6937    0.9251   0.7995   0.7235    37714_at    KIAA0630 protein
GBM          0.6930    0.9240   0.7991   0.7219    37628_at    GAP43: growth associated protein 43
GBM          0.6910    0.9228   0.7899   0.7216    1649_at     MAOB: monoamine oxidase B
GBM          0.6867    0.9210   0.7897   0.7204    38760_f_at  Human putative cyclin G1 interacting protein
AO           1.8499    1.6556   1.3782   1.2758    33619_at    RPS13: ribosomal protein S13
AO           1.6403    1.2785   1.2232   1.1896    34679_at    BCR: breakpoint cluster region
AO           1.4822    1.2658   1.1734   1.1376    37573_at    ANGPTL2: angiopoietin-like 2
AO           1.4652    1.2568   1.1395   1.0980    33677_at    RPL24: ribosomal protein L24
AO           1.4567    1.2146   1.1241   1.0762    326_i_at    RPS20: ribosomal protein S20
AO           1.4044    1.1938   1.0952   1.0535    41325_at    KCNK3: potassium channel, subfamily K, member 3 (TASK-1)
AO           1.4022    1.1925   1.0676   1.0309    38681_at    EIF3S6: eukaryotic translation initiation factor 3, subunit 6 (48kD)
AO           1.3203    1.1910   1.0460   0.9988    41792_at    ABCC8: ATP-binding cassette, sub-family C (CFTR/MRP), member 8
AO           1.3163    1.1745   1.0286   0.9905    37249_at    PDE8B: phosphodiesterase 8B
AO           1.2909    1.1718   1.0260   0.9804    37953_s_at  ACCN2: amiloride-sensitive cation channel 2, neuronal
AO           1.2866    1.1641   0.9905   0.9720    35125_at    RPS6: ribosomal protein S6
AO           1.2755    1.1622   0.9871   0.9388    40235_at    ACK1: activated p21cdc42Hs kinase
AO           1.2648    1.1595   0.9773   0.9360    41016_at    KIAA0510 protein
AO           1.2501    1.1584   0.9735   0.9222    40840_at    PPIF: peptidylprolyl isomerase F (cyclophilin F)
AO           1.2405    1.1535   0.9676   0.9109    34531_at    FLRT1: fibronectin leucine rich transmembrane protein 1
AO           1.2402    1.1335   0.9614   0.9060    37578_at    Homo sapiens clone-RES4-4
AO           1.2377    1.1315   0.9448   0.9014    1134_at     ACK1: activated p21cdc42Hs kinase
AO           1.2341    1.1073   0.9343   0.8900    41749_at    C21orf33: chromosome 21 open reading frame 33
AO           1.2237    1.1071   0.9339   0.8848    38340_at    KIAA0655 protein: huntingtin interacting protein-1-related
AO           1.2166    1.0978   0.9261   0.8840    36196_at    PFKM: phosphofructokinase, muscle
AO           1.1963    1.0825   0.9002   0.8708    39427_at    UQCRB: ubiquinol-cytochrome c reductase binding protein
AO           1.1878    1.0824   0.8999   0.8691    32341_f_at  RPL23A: ribosomal protein L23a
AO           1.1741    1.0807   0.8949   0.8678    36164_at    PDX1: pyruvate dehydrogenase complex, lipoyl containing component X; E3-binding protein
AO           1.1702    1.0749   0.8908   0.8666    39856_at    RPL36A: ribosomal protein L36a
AO           1.1691    1.0660   0.8810   0.8590    36617_at    ID1: inhibitor of DNA binding 1, dominant negative helix-loop-helix protein
AO           1.1661    1.0413   0.8809   0.8532    41250_at    JTV1: JTV1 gene
AO           1.1607    1.0398   0.8794   0.8489    32436_at    RPL27A: ribosomal protein L27a
AO           1.1570    1.0388   0.8718   0.8461    39572_at    GRIK2: glutamate receptor, ionotropic, kainate 2
AO           1.1488    1.0378   0.8656   0.8422    35852_at    CRY2: cryptochrome 2 (photolyase-like)
AO           1.1401    1.0196   0.8579   0.8375    36358_at    RPL9: ribosomal protein L9
AO           1.1154    1.0150   0.8528   0.8374    36027_at    POLR2F: polymerase (RNA) II (DNA directed) polypeptide F
AO           1.1128    1.0137   0.8467   0.8284    39864_at    CIRBP: cold inducible RNA-binding protein
AO           1.1057    1.0121   0.8441   0.8273    34184_at    APCL: adenomatous polyposis coli like
AO           1.1035    1.0097   0.8439   0.8253    32791_at    MAC30: hypothetical protein
AO           1.1016    1.0085   0.8367   0.8222    36618_g_at  ID1: inhibitor of DNA binding 1, dominant negative helix-loop-helix protein
AO           1.0957    1.0056   0.8361   0.8124    33485_at    RPL4: ribosomal protein L4
AO           1.0949    1.0028   0.8355   0.8099    32576_at    EIF3S5: eukaryotic translation initiation factor 3, subunit 5 (epsilon, 47kD)
AO           1.0877    1.0024   0.8306   0.8093    537_f_at    Human breakpoint cluster region (BCR) gene
AO           1.0870    1.0013   0.8268   0.8040    327_f_at    RPS20: ribosomal protein S20
AO           1.0854    0.9997   0.8222   0.8038    34345_at    TOM: putative mitochondrial outer membrane protein import receptor
AO           1.0740    0.9934   0.8172   0.8035    31708_at    RPL30: ribosomal protein L30
AO           1.0713    0.9926   0.8150   0.7994    41264_at    DKFZp586F1322 protein
AO           1.0620    0.9919   0.8101   0.7979    41269_r_at  API5L1: API5-like 1
AO           1.0609    0.9833   0.8047   0.7948    35848_at    DKFZp586J231 protein
AO           1.0593    0.9778   0.8034   0.7752    841_at      OLIG2: oligodendrocyte lineage transcription factor 2
AO           1.0567    0.9745   0.8012   0.7726    35633_at    ELMO1: engulfment and cell motility 1 (ced-12 homolog, C. elegans)
AO           1.0413    0.9717   0.8011   0.7627    32487_s_at  KPNA4: karyopherin alpha 4 (importin alpha 3)
AO           1.0364    0.9648   0.7998   0.7624    41289_at    NCAM1: neural cell adhesion molecule 1
AO           1.0355    0.9579   0.7992   0.7584    37697_s_at  Homo sapiens porin (por) mRNA
AO           1.0346    0.9550   0.7952   0.7558    35326_at    54TM: putative transmembrane protein; homolog of yeast Golgi membrane protein Yif1p (Yip1p-interacting factor)

Features of the 20-feature k-NN Class Prediction Model

The table below lists the feature numbers and gene identifications of the 20-feature k-NN class prediction model.
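The 20 features of this model are the ten highest-ranked signal-to-noise markers for each class in the class marker table. The sketch below shows how a model of this kind can be assembled: a variation filter, a signal-to-noise ranking using the commonly used score (mean_A - mean_B) / (sd_A + sd_B), selection of ten features per class, and a k-NN vote. It is a minimal sketch under stated assumptions, not the software used in the study: it assumes a genes x samples NumPy array of positive values, Euclidean distance and k = 3, and it omits the confidence calculation. All function names and the toy data are hypothetical.

```python
import numpy as np


def variation_filter(X, fold=3.0, min_delta=100.0):
    """Mask of genes (rows) with max/min > fold and max - min > min_delta.

    Assumes values were already floored to a positive minimum so max/min is defined.
    """
    gmax, gmin = X.max(axis=1), X.min(axis=1)
    return (gmax / gmin > fold) & (gmax - gmin > min_delta)


def signal_to_noise(X, is_a):
    """Per-gene (mean_A - mean_B) / (sd_A + sd_B); is_a is a boolean mask over samples."""
    a, b = X[:, is_a], X[:, ~is_a]
    return (a.mean(axis=1) - b.mean(axis=1)) / (a.std(axis=1) + b.std(axis=1))


def select_features(X, is_a, n_per_class=10):
    """Row indices of the n_per_class top-scoring (class A) and bottom-scoring (class B) genes."""
    order = np.argsort(signal_to_noise(X, is_a))
    return np.concatenate([order[::-1][:n_per_class], order[:n_per_class]])


def knn_call(X_train, is_a_train, x, k=3):
    """Majority vote of the k nearest training samples by Euclidean distance; True means class A."""
    dist = np.linalg.norm(X_train - x[:, None], axis=0)
    votes_a = int(is_a_train[np.argsort(dist)[:k]].sum())
    return votes_a * 2 > k


# Hypothetical toy data: 50 genes x 12 samples, the first 6 samples in class A.
rng = np.random.default_rng(0)
X = rng.uniform(100, 200, size=(50, 12))
is_a = np.arange(12) < 6
X[:2, is_a] += 1300    # genes 0-1 are high in class A
X[2:4, ~is_a] += 1300  # genes 2-3 are high in class B

keep = variation_filter(X)  # background genes vary less than 2-fold, so only genes 0-3 pass
Xf = X[keep]
feats = select_features(Xf, is_a, n_per_class=2)

new_sample = np.full(50, 150.0)
new_sample[:2] = 1450.0     # profile resembling class A
call_is_a = knn_call(Xf[feats], is_a, new_sample[keep][feats])
print("class A" if call_is_a else "class B")
```

With real data, X would hold the expression values of the training tumors and is_a would mark one tumor class (for example GBM); n_per_class=10 gives a 20-feature model.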
Class Correlation  Feature Number  Accession Number  Gene Description
GBM                34091_s_at      Z19554            VIM: vimentin
GBM                630_at          L39874            DCTD: dCMP deaminase
GBM                631_g_at        L39874            DCTD: dCMP deaminase
GBM                39691_at        AB007960          SH3GLB1: SH3-domain GRB2-like endophilin B1
GBM                160039_at       NM_002747         MAPK4: mitogen-activated protein kinase 4
GBM                35016_at        M13560            CD74: CD74 antigen (invariant polypeptide of major histocompatibility complex, class II antigen-associated)
GBM                38791_at        D29643            DDOST: dolichyl-diphosphooligosaccharide protein glycosyltransferase
GBM                1395_at         L25081            ARHC: ras homolog gene family, member C
GBM                37542_at        D86961            LHFPL2: lipoma HMGIC fusion partner-like 2
GBM                935_at          L12168            CAP: adenylyl cyclase-associated protein
AO                 33619_at        L01124            RPS13: ribosomal protein S13
AO                 34679_at        X02596            BCR: breakpoint cluster region
AO                 37573_at        AF007150          ANGPTL2: angiopoietin-like 2
AO                 33677_at        M94314            RPL24: ribosomal protein L24
AO                 326_i_at        HG1800-HT1823     RPS20: ribosomal protein S20
AO                 41325_at        AF006823          KCNK3: potassium channel, subfamily K, member 3 (TASK-1)
AO                 38681_at        U62962            EIF3S6: eukaryotic translation initiation factor 3, subunit 6 (48kD)
AO                 41792_at        L78207            ABCC8: ATP-binding cassette, sub-family C (CFTR/MRP), member 8
AO                 37249_at        AF079529          PDE8B: phosphodiesterase 8B
AO                 37953_s_at      U78181            ACCN2: amiloride-sensitive cation channel 2, neuronal

Features Used During Building of the Class Prediction Model

The figure below shows all features used to construct the 20-feature k-NN class prediction model during leave-one-out cross validation, together with the frequency of their use. The gene identifications of these feature numbers are listed after the figure.
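Because a separate 20-feature model is built in each leave-one-out fold, with the held-out sample excluded from feature selection, more than 20 distinct features can be used across the folds, each in a fraction of them. The sketch below illustrates that bookkeeping. It is a minimal sketch, assuming a genes x samples NumPy array, a signal-to-noise ranking with n features per class and Euclidean k-NN with k = 3; the function names and toy data are hypothetical, and the study's exact procedure may differ.

```python
from collections import Counter

import numpy as np


def signal_to_noise(X, is_a):
    """Per-gene (mean_A - mean_B) / (sd_A + sd_B)."""
    a, b = X[:, is_a], X[:, ~is_a]
    return (a.mean(axis=1) - b.mean(axis=1)) / (a.std(axis=1) + b.std(axis=1))


def select_features(X, is_a, n_per_class):
    """Top n_per_class genes for each class by signal-to-noise score."""
    order = np.argsort(signal_to_noise(X, is_a))
    return np.concatenate([order[::-1][:n_per_class], order[:n_per_class]])


def knn_call(X_train, is_a_train, x, k):
    """Euclidean k-NN majority vote; True means class A."""
    dist = np.linalg.norm(X_train - x[:, None], axis=0)
    return int(is_a_train[np.argsort(dist)[:k]].sum()) * 2 > k


def loocv(X, is_a, n_per_class=10, k=3):
    """Leave-one-out CV with feature selection repeated inside every fold.

    Returns the number of misclassified samples and, for each gene selected at
    least once, the percentage of folds in which it was selected.
    """
    n = X.shape[1]
    use = Counter()
    errors = 0
    for i in range(n):
        train = np.arange(n) != i  # hold out sample i before selecting features
        feats = select_features(X[:, train], is_a[train], n_per_class)
        use.update(feats.tolist())
        call = knn_call(X[feats][:, train], is_a[train], X[feats, i], k)
        errors += int(call != bool(is_a[i]))
    return errors, {gene: 100.0 * c / n for gene, c in use.items()}


# Hypothetical toy data: 50 genes x 12 samples; genes 0-3 separate the classes.
rng = np.random.default_rng(1)
X = rng.uniform(100, 200, size=(50, 12))
is_a = np.arange(12) < 6
X[:2, is_a] += 1300
X[2:4, ~is_a] += 1300

errors, feature_use = loocv(X, is_a, n_per_class=2, k=3)
print(errors, sorted(feature_use.items()))
```

In this toy case the same four genes win every fold, so each is used in 100% of folds; with noisier data, lower-ranked genes enter some folds and not others, which is what a fractional feature use plot records.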
[Figure: fractional feature use (%), on a scale of 0 to 100, for each of the 77 feature numbers used during leave-one-out cross validation.]

Feature Number  Gene Description
631_g_at        DCTD: dCMP deaminase
34679_at        BCR: breakpoint cluster region
33619_at        RPS13: ribosomal protein S13
630_at          DCTD: dCMP deaminase
34091_s_at      VIM: vimentin
35016_at        CD74: CD74 antigen (invariant polypeptide of major histocompatibility complex, class II antigen-associated)
37573_at        ANGPTL2: angiopoietin-like 2
160039_at       MAPK4: mitogen-activated protein kinase 4
33677_at        RPL24: ribosomal protein L24
38681_at        EIF3S6: eukaryotic translation initiation factor 3, subunit 6 (48kD)
41325_at        KCNK3: potassium channel, subfamily K, member 3 (TASK-1)
326_i_at        RPS20: ribosomal protein S20
39691_at        SH3GLB1: SH3-domain GRB2-like endophilin B1
1395_at         ARHC: ras homolog gene family, member C
38791_at        DDOST: dolichyl-diphosphooligosaccharide protein glycosyltransferase
37249_at        PDE8B: phosphodiesterase 8B
37542_at        LHFPL2: lipoma HMGIC fusion partner-like 2
37953_s_at      ACCN2: amiloride-sensitive cation channel 2, neuronal
1318_at         RBBP4: retinoblastoma-binding protein 4
36678_at        TAGLN2: transgelin 2
41792_at        ABCC8: ATP-binding cassette, sub-family C (CFTR/MRP), member 8
37578_at        Homo sapiens clone-RES4-4
35125_at        RPS6: ribosomal protein S6
41749_at        C21orf33: chromosome 21 open reading frame 33
38650_at        IGFBP5: insulin-like growth factor binding protein 5
935_at          CAP: adenylyl cyclase-associated protein
36358_at        RPL9: ribosomal protein L9
34768_at        TXNDC: thioredoxin domain-containing
41016_at        KIAA0510 protein
34531_at        FLRT1: fibronectin leucine rich transmembrane protein 1
32749_s_at      DKFZp586K1720 protein
388_at          PIK3R2: phosphoinositide-3-kinase, regulatory subunit, polypeptide 2 (p85 beta)
40235_at        ACK1: activated p21cdc42Hs kinase
40793_s_at      AQP4: aquaporin 4
37012_at        CAPZB: capping protein (actin filament) muscle Z-line, beta
34193_at        CHL1: cell adhesion molecule with homology to L1CAM (close homologue of L1)
38340_at        KIAA0655 protein: huntingtin interacting protein-1-related
36617_at        ID1: inhibitor of DNA binding 1, dominant negative helix-loop-helix protein
32576_at        EIF3S5: eukaryotic translation initiation factor 3, subunit 5 (epsilon, 47kD)
32819_at        H2BFA: H2B histone family, member A
36196_at        PFKM: phosphofructokinase, muscle
39856_at        RPL36A: ribosomal protein L36a
37680_at        AKAP12: A kinase (PRKA) anchor protein (gravin) 12
40840_at        PPIF: peptidylprolyl isomerase F (cyclophilin F)
31342_at        GALNT2: UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 2 (GalNAc-T2)
38391_at        CAPG: capping protein (actin filament), gelsolin-like
32436_at        RPL27A: ribosomal protein L27a
36927_at        GS3686: hypothetical protein, expressed in osteoblast
33485_at        RPL4: ribosomal protein L4
406_at          ITGB4: integrin, beta 4
32791_at        MAC30: hypothetical protein
39522_at        PFKFB3: 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 3
38545_at        INHBB: inhibin, beta B (activin AB beta polypeptide)
841_at          OLIG2: oligodendrocyte lineage transcription factor 2
41485_at        LDHA: lactate dehydrogenase A
41551_at        RER1: similar to S. cerevisiae RER1
40807_at        MUF1: MUF1 protein
39572_at        GRIK2: glutamate receptor, ionotropic, kainate 2
41250_at        JTV1: JTV1 gene
537_f_at        Human breakpoint cluster region (BCR) gene
35852_at        CRY2: cryptochrome 2 (photolyase-like)
32297_s_at      KLRC2: killer cell lectin-like receptor subfamily C, member 2
1704_at         VAV2: vav 2 oncogene
38338_at        RRAS: related RAS viral (r-ras) oncogene homolog
39694_at        Hypothetical protein MGC5508
36150_at        KIAA0842 protein
1860_at         TP53BP2: tumor protein p53-binding protein, 2
32852_at        TXN2: thioredoxin, mitochondrial
41624_r_at      FZR1: Fzr1 protein
39112_at        USF2: upstream transcription factor 2, c-fos interacting
37421_f_at      Human DNA sequence from clone RP3-377H14 on chromosome 6p21.32-22.1
36027_at        POLR2F: polymerase (RNA) II (DNA directed) polypeptide F
31481_s_at      TMSB10: thymosin, beta 10
39427_at        UQCRB: ubiquinol-cytochrome c reductase binding protein
39338_at        S100A10: S100 calcium-binding protein A10 (annexin II ligand, calpactin I, light polypeptide (p11))
32487_s_at      KPNA4: karyopherin alpha 4 (importin alpha 3)
1134_at         ACK1: activated p21cdc42Hs kinase

Summary of Training Sample Set Class Predictions

The table below summarizes the class predictions for the training sample set, which includes the 21 classic high grade gliomas. The “call” is the classification given by the 20-feature k-NN model during leave-one-out cross validation and appears with its confidence value. “Errors” (marked *) are tumors whose predicted class differed from the pathological classification. GBM, glioblastoma; AO, anaplastic oligodendroglioma.
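The error column can be checked by tallying each call against the pathological classification. In the sketch below, the rows of the training-set table are transcribed as (sample, call, pathology) tuples; the helper name is hypothetical.

```python
# (sample, call, pathology) rows transcribed from the training-set table.
TRAINING_CALLS = [
    ("Brain_CG_8", "GBM", "GBM"), ("Brain_CG_11", "GBM", "GBM"),
    ("Brain_CG_3", "GBM", "GBM"), ("Brain_CG_4", "GBM", "GBM"),
    ("Brain_CG_14", "GBM", "GBM"), ("Brain_CG_2", "GBM", "GBM"),
    ("Brain_CO_4", "GBM", "AO"), ("Brain_CG_1", "GBM", "GBM"),
    ("Brain_CG_9", "GBM", "GBM"), ("Brain_CG_6", "GBM", "GBM"),
    ("Brain_CG_13", "GBM", "GBM"), ("Brain_CG_12", "GBM", "GBM"),
    ("Brain_CG_7", "GBM", "GBM"),
    ("Brain_CO_5", "AO", "AO"), ("Brain_CO_1", "AO", "AO"),
    ("Brain_CO_6", "AO", "AO"), ("Brain_CO_2", "AO", "AO"),
    ("Brain_CO_7", "AO", "AO"), ("Brain_CG_5", "AO", "GBM"),
    ("Brain_CO_3", "AO", "AO"), ("Brain_CG_10", "AO", "GBM"),
]


def misclassified(rows):
    """Samples whose k-NN call differs from the pathological classification."""
    return [sample for sample, call, pathology in rows if call != pathology]


errors = misclassified(TRAINING_CALLS)
accuracy = 1 - len(errors) / len(TRAINING_CALLS)
print(errors, f"accuracy {accuracy:.1%}")
```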
Sample Name   Call  Confidence  Pathology  Error
GBM Calls
Brain_CG_8    GBM   0.677       GBM
Brain_CG_11   GBM   0.610       GBM
Brain_CG_3    GBM   0.558       GBM
Brain_CG_4    GBM   0.524       GBM
Brain_CG_14   GBM   0.455       GBM
Brain_CG_2    GBM   0.445       GBM
Brain_CO_4    GBM   0.224       AO         *
Brain_CG_1    GBM   0.182       GBM
Brain_CG_9    GBM   0.158       GBM
Brain_CG_6    GBM   0.101       GBM
Brain_CG_13   GBM   0.008       GBM
Brain_CG_12   GBM   0.006       GBM
Brain_CG_7    GBM   0.000       GBM
AO Calls
Brain_CO_5    AO    0.377       AO
Brain_CO_1    AO    0.234       AO
Brain_CO_6    AO    0.166       AO
Brain_CO_2    AO    0.143       AO
Brain_CO_7    AO    0.141       AO
Brain_CG_5    AO    0.028       GBM        *
Brain_CO_3    AO    0.023       AO
Brain_CG_10   AO    0.021       GBM        *

Summary of Test Sample Set Class Predictions

The table below summarizes the class predictions for the test sample set, which comprises the 29 remaining (non-classic) high grade gliomas that were not used in the training set. The “call” is the classification given by the 20-feature k-NN model and appears with its confidence value. “Errors” (marked *) are tumors whose predicted class differed from the pathological classification. GBM, glioblastoma; AO, anaplastic oligodendroglioma.
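Tabulating the test-set calls against pathology shows how the errors are distributed between the two classes. In the sketch below, the rows of the test-set table are transcribed as (sample, call, pathology) tuples and counted as (pathology, call) pairs.

```python
from collections import Counter

# (sample, call, pathology) rows transcribed from the test-set table.
TEST_CALLS = [
    ("Brain_NO_8", "GBM", "AO"), ("Brain_NG_10", "GBM", "GBM"),
    ("Brain_NO_6", "GBM", "AO"), ("Brain_NO_7", "GBM", "AO"),
    ("Brain_NG_14", "GBM", "GBM"), ("Brain_NO_14", "GBM", "AO"),
    ("Brain_NG_3", "GBM", "GBM"), ("Brain_NG_12", "GBM", "GBM"),
    ("Brain_NG_9", "GBM", "GBM"), ("Brain_NG_6", "GBM", "GBM"),
    ("Brain_NG_5", "GBM", "GBM"), ("Brain_NG_7", "GBM", "GBM"),
    ("Brain_NO_4", "GBM", "AO"), ("Brain_NG_8", "GBM", "GBM"),
    ("Brain_NG_4", "GBM", "GBM"), ("Brain_NO_10", "GBM", "AO"),
    ("Brain_NO_3", "GBM", "AO"), ("Brain_NO_5", "GBM", "AO"),
    ("Brain_NG_11", "GBM", "GBM"), ("Brain_NG_13", "GBM", "GBM"),
    ("Brain_NO_13", "GBM", "AO"), ("Brain_NO_12", "GBM", "AO"),
    ("Brain_NO_2", "AO", "AO"), ("Brain_NO_11", "AO", "AO"),
    ("Brain_NO_9", "AO", "AO"), ("Brain_NO_15", "AO", "AO"),
    ("Brain_NG_2", "AO", "GBM"), ("Brain_NO_1", "AO", "AO"),
    ("Brain_NG_1", "AO", "GBM"),
]

# Counts of (pathology, call) pairs; pairs whose members differ are the errors.
confusion = Counter((pathology, call) for _, call, pathology in TEST_CALLS)
n_errors = sum(c for (pathology, call), c in confusion.items() if pathology != call)

for (pathology, call), c in sorted(confusion.items()):
    print(f"pathology {pathology:<3} called {call:<3}: {c}")
print("errors:", n_errors, "of", len(TEST_CALLS))
```

With these rows, 10 of the 12 errors are tumors with AO pathology that the model called GBM.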
Sample Name   Call  Confidence  Pathology  Error
GBM Calls
Brain_NO_8    GBM   0.6998      AO         *
Brain_NG_10   GBM   0.6852      GBM
Brain_NO_6    GBM   0.5952      AO         *
Brain_NO_7    GBM   0.5889      AO         *
Brain_NG_14   GBM   0.5635      GBM
Brain_NO_14   GBM   0.5497      AO         *
Brain_NG_3    GBM   0.4755      GBM
Brain_NG_12   GBM   0.4352      GBM
Brain_NG_9    GBM   0.4086      GBM
Brain_NG_6    GBM   0.3103      GBM
Brain_NG_5    GBM   0.3027      GBM
Brain_NG_7    GBM   0.3024      GBM
Brain_NO_4    GBM   0.3002      AO         *
Brain_NG_8    GBM   0.2978      GBM
Brain_NG_4    GBM   0.2770      GBM
Brain_NO_10   GBM   0.2261      AO         *
Brain_NO_3    GBM   0.1414      AO         *
Brain_NO_5    GBM   0.0875      AO         *
Brain_NG_11   GBM   0.0718      GBM
Brain_NG_13   GBM   0.0536      GBM
Brain_NO_13   GBM   0.0204      AO         *
Brain_NO_12   GBM   0.0000      AO         *
AO Calls
Brain_NO_2    AO    0.5160      AO
Brain_NO_11   AO    0.4445      AO
Brain_NO_9    AO    0.3126      AO
Brain_NO_15   AO    0.2984      AO
Brain_NG_2    AO    0.2578      GBM        *
Brain_NO_1    AO    0.1871      AO
Brain_NG_1    AO    0.0108      GBM        *

Survival Statistics for the High Grade Glioma Dataset

The table below summarizes survival statistics for the entire high grade glioma dataset. Survival is given in days from the date of initial diagnosis for all patients; for living patients, it is given to the time of last follow-up.
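Because survival for living patients runs only to last follow-up, those times are censored. The sketch below transcribes the rows of the survival table as (days, died) pairs grouped by traditional pathology (GBM: Brain_CG and Brain_NG samples; AO: Brain_CO and Brain_NO samples) and computes a Kaplan-Meier (product-limit) estimate. It is an illustration only: the function name is hypothetical, and the document does not state how its survival curves were computed.

```python
# (days from initial diagnosis, died) rows transcribed from the survival table;
# died = 0 marks a living patient censored at last follow-up.
GBM = [  # Brain_CG_1 to Brain_CG_14, then Brain_NG_1 to Brain_NG_14
    (308, 1), (281, 1), (501, 1), (670, 1), (729, 0), (21, 1), (630, 0),
    (263, 1), (219, 1), (408, 1), (242, 1), (323, 1), (213, 1), (97, 1),
    (1375, 1), (1644, 0), (406, 1), (308, 1), (177, 1), (103, 1), (992, 0),
    (41, 1), (1354, 0), (276, 1), (519, 1), (368, 1), (157, 1), (1162, 1),
]
AO = [  # Brain_CO_1 to Brain_CO_7, then Brain_NO_1 to Brain_NO_15
    (231, 0), (1674, 0), (1604, 0), (215, 1), (359, 0), (171, 0), (272, 1),
    (63, 1), (585, 0), (1804, 0), (916, 1), (793, 1), (803, 1), (559, 1),
    (1137, 0), (1100, 0), (498, 1), (795, 0), (790, 1), (789, 1), (439, 0), (638, 0),
]


def kaplan_meier(records):
    """Product-limit estimate: list of (time, S(t)) at each distinct death time."""
    records = sorted(records)
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = removed = 0
        # Group every patient whose time equals t; censored patients count as at risk at t.
        while i < len(records) and records[i][0] == t:
            deaths += records[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


for label, group in (("GBM", GBM), ("AO", AO)):
    deaths = sum(died for _, died in group)
    curve = kaplan_meier(group)
    print(f"{label}: {len(group)} patients, {deaths} deaths, "
          f"S = {curve[-1][1]:.3f} after the last death at day {curve[-1][0]}")
```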
Sample Name    Vital Status  Survival from Date of Initial Diagnosis (Days)
Brain_CG_1     Dead          308
Brain_CG_2     Dead          281
Brain_CG_3     Dead          501
Brain_CG_4     Dead          670
Brain_CG_5     Alive         729
Brain_CG_6     Dead          21
Brain_CG_7     Alive         630
Brain_CG_8     Dead          263
Brain_CG_9     Dead          219
Brain_CG_10    Dead          408
Brain_CG_11    Dead          242
Brain_CG_12    Dead          323
Brain_CG_13    Dead          213
Brain_CG_14    Dead          97
Brain_NG_1     Dead          1375
Brain_NG_2     Alive         1644
Brain_NG_3     Dead          406
Brain_NG_4     Dead          308
Brain_NG_5     Dead          177
Brain_NG_6     Dead          103
Brain_NG_7     Alive         992
Brain_NG_8     Dead          41
Brain_NG_9     Alive         1354
Brain_NG_10    Dead          276
Brain_NG_11    Dead          519
Brain_NG_12    Dead          368
Brain_NG_13    Dead          157
Brain_NG_14    Dead          1162
Brain_CO_1     Alive         231
Brain_CO_2     Alive         1674
Brain_CO_3     Alive         1604
Brain_CO_4     Dead          215
Brain_CO_5     Alive         359
Brain_CO_6     Alive         171
Brain_CO_7     Dead          272
Brain_NO_1     Dead          63
Brain_NO_2     Alive         585
Brain_NO_3     Alive         1804
Brain_NO_4     Dead          916
Brain_NO_5     Dead          793
Brain_NO_6     Dead          803
Brain_NO_7     Dead          559
Brain_NO_8     Alive         1137
Brain_NO_9     Alive         1100
Brain_NO_10    Dead          498
Brain_NO_11    Alive         795
Brain_NO_12    Dead          790
Brain_NO_13    Dead          789
Brain_NO_14    Alive         439
Brain_NO_15    Alive         638

Survival curves - all glioblastomas and anaplastic oligodendrogliomas

The figure below shows a significant difference between the survival curves of all patients with glioblastomas and those with anaplastic oligodendrogliomas (p=0.009). Survival curves were plotted according to classifications based on traditional pathology. GBM, glioblastoma; AO, anaplastic oligodendroglioma.

[Figure: percent survival (0 to 100) versus time in months (0 to 70) for AO and GBM patients.]
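The survival comparison above reports p=0.009 without naming the test. A two-group log-rank test is a common choice for comparing survival curves and is assumed in the sketch below, which uses the survival table grouped by traditional pathology. The function name is hypothetical, and the printed value is not presented as a reproduction of the reported p-value.

```python
import math

# (days from initial diagnosis, died) rows from the survival table; died = 0 is censored.
GBM = [  # Brain_CG_1 to Brain_CG_14, then Brain_NG_1 to Brain_NG_14
    (308, 1), (281, 1), (501, 1), (670, 1), (729, 0), (21, 1), (630, 0),
    (263, 1), (219, 1), (408, 1), (242, 1), (323, 1), (213, 1), (97, 1),
    (1375, 1), (1644, 0), (406, 1), (308, 1), (177, 1), (103, 1), (992, 0),
    (41, 1), (1354, 0), (276, 1), (519, 1), (368, 1), (157, 1), (1162, 1),
]
AO = [  # Brain_CO_1 to Brain_CO_7, then Brain_NO_1 to Brain_NO_15
    (231, 0), (1674, 0), (1604, 0), (215, 1), (359, 0), (171, 0), (272, 1),
    (63, 1), (585, 0), (1804, 0), (916, 1), (793, 1), (803, 1), (559, 1),
    (1137, 0), (1100, 0), (498, 1), (795, 0), (790, 1), (789, 1), (439, 0), (638, 0),
]


def logrank(group_a, group_b):
    """Two-sample log-rank test on (time, died) records; returns (chi-square, p), 1 df."""
    pooled = [(t, d, 0) for t, d in group_a] + [(t, d, 1) for t, d in group_b]
    death_times = sorted({t for t, d, _ in pooled if d})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in death_times:
        at_risk = [g for tt, _, g in pooled if tt >= t]      # group labels still at risk
        n, n_a = len(at_risk), at_risk.count(0)
        deaths = [g for tt, d, g in pooled if tt == t and d]  # group labels of deaths at t
        d_all, d_a = len(deaths), deaths.count(0)
        observed_minus_expected += d_a - d_all * n_a / n
        if n > 1:  # hypergeometric variance of the group A death count
            variance += d_all * (n_a / n) * (1 - n_a / n) * (n - d_all) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


chi2, p = logrank(GBM, AO)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

Comparing a group with itself gives a chi-square of 0 and p = 1, which is a quick sanity check on the implementation.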