Supplementary information for the manuscript entitled "PROGgeneV2: Enhancements on the existing database."
Chirayu Pankaj Goswami and Harikrishna Nakshatri

Supplementary Table 1: Datasets introduced in PROGgeneV2. For more information about the datasets, look up the GSE IDs on GEO, or refer to the corresponding publications for TCGA data.

TISSUE | DATASET | DATASET DESCRIPTION | NO. OF SAMPLES | NO. OF GENES PROFILED
ADRENAL | GSE19776 | Adrenocortical Carcinoma Gene Expression Profiling | 22 | 21933
ADRENAL | GSE33371 | Beta-catenin status effects in human adrenocortical carcinomas (33), adenomas (22), and normal adrenal cortex (10) | 23 | 21212
BLADDER | BLCA_TCGA | TCGA BLADDER CARCINOMA DATA | 123 | 15448
BLADDER | GSE13507 | Predictive Value of Prognosis-Related Gene Expression Study in Primary Bladder Cancer | 165 | 24357
BLADDER | GSE19915 | Subtype classification, grading, and outcome prediction of urothelial carcinomas by combined mRNA profiling and aCGH | 144 | 2506
BLADDER | GSE31684 | Combination of a novel gene expression signature with a clinical nomogram improves the prediction of survival in high-risk bladder cancer | 93 | 21702
BLADDER | GSE48276 | Gene expression profiling of urothelial carcinoma | 73 | 20717
BLADDER | TCGA_BLCA | TCGA DATA BLADDER CANCER | 62 | 20502
BONE | GSE21257 | Genome-wide gene expression profiling on prechemotherapy biopsies of osteosarcoma patients | 53 | 24996
BRAIN | GSE13041_U133 | Gene expression analysis of glioblastomas identifies the major molecular basis for the prognostic benefit of younger age | 191 | 13480
BRAIN | GSE13041_U95v2 | Gene expression analysis of glioblastomas identifies the major molecular basis for the prognostic benefit of younger age (Affy U95 samples) | 49 | 9383
BRAIN | GSE16011 | Intrinsic Gene Expression Profiles of Gliomas are a Better Predictor of Survival than Histology | 262 | 21212
BRAIN | GSE16581 | Genomic landscape of meningiomas: gene expression | 67 | 21703
BRAIN | GSE2817 | Wavelet modelling of microarray data provides chromosomal pattern of expression which predicts survival in gliomas | 25 | 21703
BRAIN | GSE30074 | Expression data from 30 medulloblastomas | 30 | 21103
BRAIN | GSE37418 | Novel mutations target distinct subgroups of medulloblastoma | 75 | 21703
BRAIN | GSE42669 | Patient specific orthotopic glioblastoma xenograft models recapitulate the histopathology and biology of human glioblastomas in situ (gene expression) | 55 | 21092
BRAIN | GSE4271_U133A | Molecular subclasses of high-grade glioma: prognosis, disease progression, and neurogenesis | 77 | 13720
BRAIN | GSE4271_U133B | Molecular subclasses of high-grade glioma: prognosis, disease progression, and neurogenesis | 78 | 10688
BRAIN | GSE4412_U133A | freij-affy-human-91666 | 83 | 13720
BRAIN | GSE4412_U133B | freij-affy-human-91666 | 83 | 10688
BRAIN | GSE7696 | Glioblastoma from a homogenous cohort of patients treated within clinical trial | 77 | 21212
BRAIN | TCGA_GBM | TCGA DATA GLIOBLASTOMA MULTIFORME | 577 | 17813
BRAIN | TCGA_LGG | TCGA LOWER GRADE GLIOMA DATA | 206 | 16467
BREAST | GSE11121 | The humoral immune system has a key prognostic impact in node-negative breast cancer | 200 | 13480
BREAST | GSE12093 | The 76-gene Signature Defines High-Risk Patients that Benefit from Adjuvant Tamoxifen Therapy | 132 | 13476
BREAST | GSE1379 | breast cancer / tamoxifen monotherapy (whole tissue tumor biopsies) | 60 | 15369
BREAST | GSE1456_U133A | Gene expression of breast cancer tissue in a large population-based cohort of Swedish patients | 159 | 13480
BREAST | GSE1456_U133B | Gene expression of breast cancer tissue in a large population-based cohort of Swedish patients | 159 | 9856
BREAST | GSE17705 | Endocrine Sensitivity Index Validation Dataset | 298 | 13480
BREAST | GSE19615 | Integrated genomic and function characterization of the 8q22 gain | 115 | 21212
BREAST | GSE2034 | Breast cancer relapse free survival | 286 | 13480
BREAST | GSE2603 | Subpopulations of MDA-MB-231 and Primary Breast Cancers | 82 | 13480
BREAST | GSE2990 | Gene Expression Profiling in Breast Cancer: Understanding the Molecular Basis of Histologic Grade To Improve Prognosis | 101 | 13480
BREAST | GSE3494_U133A | An expression signature for p53 in breast cancer predicts mutation status, transcriptional effects, and patient survival | 236 | 13480
BREAST | GSE3494_U133B | An expression signature for p53 in breast cancer predicts mutation status, transcriptional effects, and patient survival | 236 | 9856
BREAST | GSE37751 | Molecular Profiles of Human Breast Cancer and Their Association with Tumor Subtypes and Disease Prognosis (Affymetrix) | 60 | 21093
BREAST | GSE42568 | Breast Cancer Gene Expression Analysis | 104 | 21703
BREAST | GSE48408 | Long non-coding RNA HOTAIR is an independent prognostic marker of metastasis in estrogen receptor positive primary breast cancer | 164 | 21793
BREAST | GSE4922_U133A | Genetic Reclassification of Histologic Grade Delineates New Clinical Subtypes of Breast Cancer | 249 | 13480
BREAST | GSE4922_U133B | Genetic Reclassification of Histologic Grade Delineates New Clinical Subtypes of Breast Cancer | 248 | 9856
BREAST | GSE5327 | Breast cancer relapse free survival and lung metastasis free survival | 58 | 13480
BREAST | GSE6532_U133_P2 | Definition of clinically distinct molecular subtypes in estrogen receptor positive breast carcinomas using genomic grade | 87 | 21212
BREAST | GSE6532_U133A | Definition of clinically distinct molecular subtypes in estrogen receptor positive breast carcinomas using genomic grade | 189 | 13480
BREAST | GSE6532_U133B | Definition of clinically distinct molecular subtypes in estrogen receptor positive breast carcinomas using genomic grade | 126 | 9856
BREAST | GSE7390 | Strong Time Dependence of the 76-Gene Prognostic Signature | 198 | 13480
BREAST | GSE9195 | Predicting prognosis using molecular profiling in estrogen receptor-positive breast cancer treated with tamoxifen | 77 | 21212
BREAST | NKI | A gene-expression signature as a predictor of survival in breast cancer | 295 | 11475
BREAST | TCGA | TCGA BREAST CANCER DATA | 596 | 17813
CERVICAL | GSE44001 | Genetic profiling to predict recurrence of early cervical cancer | 300 | 20669
COLON | GSE12945 | Expression data from colorectal cancers | 62 | 13720
COLON | GSE14333 | Expression data from 290 primary colorectal cancers | 187 | 21703
COLON | GSE16125 | Integrative approach for prioritizing cancer genes in sporadic colon cancer | 32 | 14984
COLON | GSE17536 | Metastasis Gene Expression Profile Predicts Recurrence and Death in Colon Cancer Patients | 174 | 21212
COLON | GSE17537 | Metastasis Gene Expression Profile Predicts Recurrence and Death in Colon Cancer Patients | 52 | 21212
COLON | GSE24551 | Exon level expression profiling of colorectal cancer tissue samples | 160 | 14984
COLON | GSE28722 | EMT is the dominant program in human colon cancer (Agilent) | 125 | 15218
COLON | GSE28814 | EMT is the dominant program in human colon cancer | 122 | 15240
COLON | GSE29621 | mRNA and microRNA profile in colon cancer [mRNA data] | 65 | 21703
COLON | GSE30378 | Gene level expression profiling of colorectal cancer tissue samples (test sample series) | 95 | 14984
COLON | GSE31595 | Gene Expression Profiles in Stage II and III Colon Cancer. Application of a 128-gene signature | 37 | 21703
COLON | GSE39582 | Gene expression Classification of Colon Cancer defines six molecular subtypes with distinct clinical, molecular and survival characteristics [Expression] | 566 | 21703
COLON | GSE41258 | Expression data from colorectal cancer patients | 182 | 13720
COLON | TCGA_COAD | TCGA DATA COLON ADENOCARCINOMA | 121 | 17813
ESOPHAGUS | GSE19417 | Human esophageal adenocarcinomas | 70 | 17367
EYE | GSE22138 | Expression Data from Uveal Melanoma primary tumors | 63 | 21703
EYE | GSE39717 | Gene expression analysis of uveal melanoma tumor tissue | 30 | 21454
HEME | GSE10846 | Prediction of survival in diffuse large B cell lymphoma treated with chemotherapy plus Rituximab | 414 | 21703
HEME | GSE12417_U133A | Prognostic gene signature for normal karyotype AML (Affy U133A samples) | 163 | 13480
HEME | GSE12417_U133B | Prognostic gene signature for normal karyotype AML (Affy U133B samples) | 162 | 9856
HEME | GSE12417_U133P2 | Prognostic gene signature for normal karyotype AML (Affy U133plus2 samples) | 79 | 21212
HEME | GSE16131_U133A | Differences Between Follicular Lymphoma With and Without Translocation t(14;18) | 180 | 13719
HEME | GSE16131_U133B | Differences Between Follicular Lymphoma With and Without Translocation t(14;18) | 180 | 10686
HEME | GSE22762_U133A | An eight-gene expression signature for the prediction of survival and time to treatment in chronic lymphocytic leukemia | 44 | 13719
HEME | GSE22762_U133B | An eight-gene expression signature for the prediction of survival and time to treatment in chronic lymphocytic leukemia | 44 | 10684
HEME | GSE22762_U133P2 | An eight-gene expression signature for the prediction of survival and time to treatment in chronic lymphocytic leukemia | 107 | 21703
HEME | GSE23501 | DNA methylation signatures define molecular subtypes of Diffuse Large B Cell Lymphoma | 69 | 21703
HEME | GSE2658 | Gene Expression Profiles of Multiple Myeloma | 546 | 21703
HEME | GSE4475 | A Biologic Definition of Burkitt's Lymphoma from Transcriptional and Genomic Profiling | 158 | 13719
HEME | TCGA_AML | TCGA DATA ACUTE MYELOID LEUKEMIA | 157 | 21212
HNC | E-MTAB-1328 | Methylome, transcriptome and miRNome profiling by array and high throughput sequencing of 89 patients with head and neck squamous cell carcinoma | 60 | 21703
HNC | GSE10300 | head and neck squamous cell carcinoma samples | 43 | 21703
HNC | HNSC_TCGA | TCGA HEAD AND NECK SQUAMOUS CELL CARCINOMA DATA | 291 | 17187
LIVER | GSE10141 | Gene Expression in Fixed Tissues and Outcome in Hepatocellular Carcinoma | 80 | 6100
LIVER | GSE17856 | Gene expression in nontumoral liver tissue and recurrence-free survival in hepatitis C virus-positive HCC | 43 | 14293
LIVER | GSE27150 | Transcriptional profile of human liver tissues: hepatocellular carcinoma vs. matched noncancerous liver tissue | 81 | 2456
LUNG | GSE11117 | Molecular Classification and Prediction of Survival in Non-Small-Cell Lung Cancer | 41 | 9601
LUNG | GSE11969 | Expression Profile-Defined Classification of Lung Adenocarcinoma | 90 | 16531
LUNG | GSE13213 | Relapse-related molecular signature in lung adenocarcinomas identifies patients with dismal prognosis | 117 | 30469
LUNG | GSE14814 | Prognostic and Predictive Gene Signature for Adjuvant Chemotherapy in Resected Non-Small Cell Lung Cancer | 88 | 13481
LUNG | GSE17710 | Human lung squamous cell carcinoma expression profiling | 56 | 17083
LUNG | GSE19188 | Expression data for early stage NSCLC | 82 | 21703
LUNG | GSE26939 | Human lung adenocarcinoma mRNA expression and gene mutations | 115 | 17108
LUNG | GSE30219 | Off-context gene expression in lung cancer identifies a group of metastatic-prone tumors | 282 | 21703
LUNG | GSE31210 | Gene expression data for pathological stage I-II lung adenocarcinomas | 226 | 21703
LUNG | GSE3141 | Lung Cancer Dataset | 111 | 21703
LUNG | GSE37745 | Biomarker discovery in non-small cell lung cancer: integrating gene expression profiling, meta-analysis and tissue microarray validation | 96 | 21703
LUNG | GSE41271 | Expression profiling of 275 lung cancer specimens | 275 | 25428
LUNG | GSE42127 | Expression data for non-small-cell lung cancer | 176 | 25428
LUNG | GSE4573 | Gene expression signatures for predicting prognosis of squamous cell lung carcinomas | 130 | 13480
LUNG | GSE50081 | Validation of a histology-independent prognostic gene signature for early stage, non-small cell lung cancer including stage IA patients | 181 | 21703
LUNG | GSE5843 | Expression profiling defines a recurrence signature in lung adenocarcinoma | 48 | 12089
LUNG | GSE8894 | Prediction of Recurrence-Free Survival in Postoperative NSCLC Patients - a Useful Prospective Clinical Practice | 138 | 21213
LUNG | TCGA_LUAD | TCGA DATA LUNG ADENOCARCINOMA | 150 | 20502
LUNG | TCGA_LUSC | TCGA DATA LUNG SQUAMOUS CELL CARCINOMA | 120 | 17813
OVARIAN | GSE13876 | Survival Related Profile, Pathways and Transcription Factors in Ovarian Cancer | 415 | 15971
OVARIAN | GSE14764 | A Prognostic Gene Expression Index in Ovarian Cancer | 79 | 13480
OVARIAN | GSE17260 | Prediction of progression-free survival in patients with advanced-stage serous ovarian cancer | 110 | 19566
OVARIAN | GSE18520 | Whole-genome oligonucleotide expression analysis of papillary serous ovarian adenocarcinomas | 53 | 21703
OVARIAN | GSE19829_U133P2 | A gene expression profile of BRCAness that is associated with outcome in ovarian cancer (Affy U133plus2 samples) | 26 | 21212
OVARIAN | GSE19829_U95V2 | A gene expression profile of BRCAness that is associated with outcome in ovarian cancer (Affy U95v2 samples) | 39 | 9383
OVARIAN | GSE23554 | Ovarian Cancer Dataset | 28 | 13719
OVARIAN | GSE26712 | A Gene Signature Predicting for Survival in Suboptimally Debulked Patients with Ovarian Cancer | 185 | 13480
OVARIAN | GSE30161 | Genomic Multivariate Predictors of Response to Adjuvant Chemotherapy in Ovarian Carcinoma: Predicting Platinum Resistance | 58 | 21703
OVARIAN | GSE31245 | Unique gene expression profile based upon pathologic response in epithelial ovarian cancer | 55 | 9651
OVARIAN | GSE32062 | Immune-activation as a therapeutic direction for patients with high-risk ovarian cancer based on gene expression signature | 260 | 19595
OVARIAN | GSE32063 | Immune-activation as a therapeutic direction for patients with high-risk ovarian cancer based on gene expression signature (2) | 40 | 19566
OVARIAN | GSE49997 | Validating the Impact of a Molecular Subtype in Epithelial Ovarian Cancer (EOC) on Progression Free and Overall Survival | 194 | 16726
OVARIAN | GSE8842 | ANALYSIS OF GENE EXPRESSION IN EARLY-STAGE OVARIAN CANCER | 78 | 5631
OVARIAN | GSE9891 | Expression profile of 285 ovarian tumour samples | 276 | 21212
OVARIAN | TCGA_OVAD | TCGA DATA OVARIAN ADENOCARCINOMA | 578 | 12042
PANCREAS | GSE21501 | A six-gene signature predicts survival of patients with localized pancreatic ductal adenocarcinoma | 102 | 19680
PANCREAS | GSE28735 | Microarray gene-expression profiles of 45 matching pairs of pancreatic tumor and adjacent non-tumor tissues from 45 patients with pancreatic ductal adenocarcinoma | 42 | 21096
PANCREAS | TCGA | TCGA PAAD | 61 | 20502
PROSTATE | GSE16560 | Molecular Sampling of Prostate Cancer: a dilemma for predicting disease progression | 281 | 6100
PROSTATE | GSE40272 | Gene-expression profiling of prostate tumors | 67 | 9371
RECTUM | TCGA_READ | TCGA DATA RECTAL ADENOCARCINOMA | 42 | 17728
RENAL | GSE29609 | Clear-cell renal cell carcinomas tumors | 39 | 18841
RENAL | GSE33371 | Beta-catenin status effects in human adrenocortical carcinomas (33), adenomas (22), and normal adrenal cortex (10) | 23 | 21703
RENAL | TCGA_KIRC | TCGA DATA KIDNEY CLEAR CELL CARCINOMA | 528 | 20502
SKIN | GSE19234 | Immune profile and mitotic index of metastatic melanoma lesions enhance clinical staging in predicting patient survival | 44 | 21703
SKIN | GSE22153 | Gene Expression Profiling-Based Identification of Molecular Subtypes in Stage IV Melanoma with Different Clinical Outcome (test set) | 57 | 24614
SKIN | GSE53118 | BRAF Mutation, NRAS Mutation, and the Absence of an Immune-Related Expressed Gene Profile Predict Poor Outcome in Patients with Stage III Melanoma | 79 | 17617
SKIN | SKCM_TCGA | TCGA SKIN CUTANEOUS MELANOMA | 163 | 15341
STOMACH | TCGA_STAD | TCGA DATA STOMACH ADENOCARCINOMA | 18 | 20512
UTERUS | TCGA_UTED | TCGA DATA UTERINE ENDOMETRIAL CARCINOMA | 54 | 17813

Supplementary Table 2: Survival variables and covariates (if any) available for the datasets added to the PROGgeneV2 database.
SURVIVAL VARIABLES OVERALL OVERALL OVERALL OVERALL OVERALL OVERALL OVERALL OVERALL OVERALL DATASET GSE16581 GSE2817 GSE30074 GSE37418 GSE42669 GSE4271_U133A GSE4271_U133B GSE4412_U133A GSE4412_U133B TISSUE BRAIN BRAIN BRAIN BRAIN BRAIN BRAIN BRAIN BRAIN BRAIN GSE12945 GSE14333 GSE16125 GSE24551 COLON COLON COLON COLON GSE28722 GSE30378 GSE31595 GSE41258 GSE19417 GSE22138 GSE39717 GSE10846 GSE16131_U133A GSE16131_U133B GSE22762_U133A GSE22762_U133B GSE22762_U133P2 COLON COLON COLON COLON ESOPHAGUS EYE EYE HEME HEME HEME HEME HEME HEME GSE23501 GSE2658 HEME HEME OVERALL RELAPSE FREE OVERALL OVERALL OVERALL , METASTASIS FREE OVERALL RELAPSE FREE OVERALL OVERALL METASTASIS FREE METASTASIS FREE OVERALL OVERALL OVERALL OVERALL OVERALL OVERALL OVERALL , RELAPSE FREE OVERALL GSE4475 E-MTAB-1328 GSE10300 GSE17856 HEME HNC HNC LIVER OVERALL METASTASIS FREE RELAPSE FREE RELAPSE FREE GSE13213 LUNG OVERALL COVARIATES AGE, GENDER AGE, GENDER AGE, GENDER GENDER, STAGE AGE, GENDER AGE, GENDER, GRADE AGE, GENDER, GRADE AGE, GENDER, GRADE AGE, GENDER, GRADE AGE, GENDER, TNMSTAGE, GRADE, UICC_STAGE AGE, GENDER, STAGE
AGE, GENDER, STAGE STAGE AGE, STAGE STAGE AGE, GENDER, STAGE, CHEMOTHERAPY AGE, GENDER, STAGE, TNM_STAGE GENDER AGE, GENDER AGE, GENDER AGE, GENDER, STAGE, CHEMOTHERAPY STAGE STAGE AGE, GENDER AGE, GENDER, STAGE, CHEMOTHERAPY, RADIOTHERAPY AGE, GENDER, STAGE AGE, GENDER, STAGE, TNM_STAGE, EGFR_MUTATION, KRAS_MUTATION, P53_MUTATION GSE17710 GSE30219 LUNG LUNG GSE31210 GSE3141 LUNG LUNG GSE37745 GSE42127 GSE19188 LUNG LUNG LUNG GSE17260 GSE18520 GSE23554 OVARIAN OVARIAN OVARIAN GSE30161 GSE31245 OVARIAN OVARIAN GSE32063 GSE28735 OVARIAN PANCREAS OVERALL , RELAPSE FREE OVERALL OVERALL , RELAPSE FREE OVERALL OVERALL , RELAPSE FREE OVERALL OVERALL OVERALL , RELAPSE FREE OVERALL OVERALL OVERALL , RELAPSE FREE OVERALL OVERALL , RELAPSE FREE OVERALL TCGA GSE16560 GSE33371 GSE19234 GSE22153 GSE19776 GSE48276 PANCREAS PROSTATE RENAL SKIN SKIN ADRENAL BLADDER OVERALL OVERALL OVERALL OVERALL OVERALL OVERALL OVERALL GSE37751 BREAST GSE42568 GSE48408 GSE44001 GSE29621 BREAST BREAST CERVICAL COLON OVERALL OVERALL , RELAPSE FREE METASTASIS FREE RELAPSE FREE OVERALL GSE39582 COLON GSE41271 LUNG GSE50081 GSE49997 LUNG OVARIAN RELAPSE FREE OVERALL , RELAPSE FREE OVERALL , RELAPSE FREE OVERALL , AGE, GENDER, STAGE, GRADE AGE, GENDER AGE, GENDER, STAGE AGE, GENDER, STAGE, CHEMOTHERAPY AGE, GENDER, STAGE, CHEMOTHERAPY GENDER STAGE, GRADE GRADE AGE, STAGE, GRADE STAGE, GRADE AGE, GENDER, STAGE, GRADE, RADIOTHERAPY AGE, GENDER, STAGE AGE, GENDER, STAGE AGE, GENDER, STAGE AGE, GENDER AGE, STAGE, GRADE, ER, TRIPLE_NEG, CHEMOTHERAPY, HORMONAL_THERAPY AGE, GRADE, ER AGE, GRADE, ER STAGE GENDER, STAGE, GRADE AGE, GENDER, CHEMOTHERAPY, BRAF_MUTATION, KRAS_MUTATION, P53_MUTATION GENDER, STAGE AGE, GENDER, STAGE AGE, GRADE METASTASIS FREE GSE53118 SKIN OVERALL AGE, GENDER, STAGE

Supplementary Figure 1: Kaplan-Meier (KM) plot created with PROGgeneV2 for the WNT/CTNNB1 pathway in a high-risk ovarian cancer cohort (GSE32062 [1]).

References
1. Yoshihara K, Tsunoda T, Shigemizu D, Fujiwara H, et al. High-risk ovarian cancer based on 126-gene expression signature is uniquely characterized by downregulation of antigen presentation pathway. Clin Cancer Res 2012 Mar 1;18(5):1374-85.