A comparative study of survival models for breast cancer prognostication based on microarray data: a single gene beat them all? B. Haibe-Kains, C. Desmedt, C. Sotiriou and G. Bontempi BIOINFORMATICS Vol. 24 no. 19 2008, pages 2200-2208 Outline • Introduction to microarray • Introduction – – – – Motivation Purpose Results Difficulties • Risk prediction methods • Performance assessment • Results Introduction to microarray • http://www.bio.davidson.edu/courses/geno mics/chip/chip.html • 清華大學 郝旭昶 Introduction - Motivation • Survival prediction of breast cancer (BC) patients, independently of treatment, also known as prognostication, is a complex task. – Clinically similar breast tumors and molecularly heterogeneous – Several clinical and pathological indicators such as histological grade, tumor size and lymph node have been used for the survival prediction of breast cancer. – Although BC prognostication has been the object of intense research, a still open challenge is how to detect patient who needs adjuvant systemic therapy. Introduction - Motivation • The advent of array-based technology and the sequencing of the human genome brought new insights into breast cancer biology and prognosis. • Several research teams conducted comprehensive genome-wide assessments of gene expression profiling and identified prognostic gene expression signatures. • With respect to clinical guidelines, these signatures were shown to correctly identify a larger group of low-risk patient not requiring treatment. Introduction - Motivation • In fact, clinicians encounter problems when confronted with patient with intermediate-grade tumors (Grade 2). These tumors, which represent 30-60% of cases, are a major source of interobserver discrepancy and may display intermediate phenotype and survival, making treatment decisions for the patients a great challenge, with subsequent under- or over-treatment. Introduction - Purpose • The aim of this work is to assess quantitatively the accuracy of prediction obtained with state-ofthe-art data analysis techniques for BC microarray data through an independent and thorough framework. • Compare the prediction accuracy of these methods in several BC microarray prognostication tasks to elucidate the key characteristics of a successful risk prediction method and to bring additional insights into BC biology. Introduction - Results • Complex prediction methods are highly exposed to performance degradation despite the use of cross-validation techniques for the following reasons: – The large number of variables – The reduced amount of samples – The high degree of noise • Our analysis shows that the most complex methods are not significantly better than the simplest one, a univariate model relying on a single proliferation gene. Introduction - Results • This result suggests the proliferation might be the most relevant biological process for BC prognostication. • The loss of interpretability deriving from the use of overcomplex methods may be not sufficiently counterbalabed by an improvement of the quality of prediction. Introduction - Difficulties • Censored information cannot be exploited by traditional supervised classification the regression methods, but demands the adoption of specific survival analysis techniques, like the semi-parametric Cox’s proportional hazards model. • When the number of explanatory variables exceeds by far the number of petients in the sample cohort, overfitting of naively applied data mining methods and overoptimistic performance assessment lie in wait. At the same time, it is very difficult to select the most relevant variables for prediction, because of their interdependency. Introduction - Difficulties • The lack of standards in performance assessment for risk prediction models. • The validation and the comparison of BC microarray prognostication methods are made difficult due to the lack of independent data. Risk prediction methods • In this work, we compare the performance of 13 risk prediction methods on more than 1000 patients. • The first risk prediction method is also the simplest one and defines the risk score as the expression of a single proliferation gene (AURKA). • The following 10 methods (from 2 to 11) are characterized by the type of observed genotype (input data), the dimension reduction strategy, the structure of the model, the learning algorithm and the predicted phenotype (outcome variable). Risk prediction methods Risk prediction methods • Genotype: – It can be the expression of • Single proliferation gene (AURKA) • Biologically driven selection of genes of interest (BD) • The whole genome (GW) – AURKA and the small set of genes in BD were selected to represent several biological processes in BC. The selected genes were AURKA, PLAU, STAT1, VEFG, GASP3, ESR1 and ERBB2, representing the proliferation, tumor invasion/metastasis, immune response, agiogenesis, apoptosis phenotypes and the ER and HER2 signaling, respectively. Risk prediction methods • Dimension reduction strategy: – A simple univariate ranking (RANK) of the k most relevant features – A selection of the first k principal components (PCA) • Structure of the model: – Multivariate (MULTIV) model – a linear combination of univariate modes (COMBUNIV) Risk prediction methods • Learning algorithm: – The linear combination of gene expressions weighted by the significance computed from the Wilconxon rank sum test (WILCOSON) – The multivariate linear regression model (LM) – The linear combination of gene expressions weighted by the significance computed from the univariate Cox’s proportional hazards model (COX) – The multivariate Cox’s model with L1 regularization (RCOX) Risk prediction methods • Phenotype: – The binary class defined by histological grades 1 and 3 (HG) – The censored survival data (SURA) – The time of events (TOE), i.e, the times from diagnosis until the patient experienced an event. Risk prediction methods • The last two model : – GENE76 (Wang et al., 2005), and GGI(Sotiriou et al., 2006b) – The GENE76 model is defined as hierarchical model using two linear combinations of the top gene expressions with respect to a ranking based on Cox’s proportional hazards mode. – The GGI model consists of a linear combination of the expressions of the top probes ranked according to their standardized mean difference between patients with histological grades 1 and 3 tumors. Performance assessment • In order to assess the performance of the risk prediction methods, we used five accuracy measures: – Time-dependent ROC Curve: • A standard technique for assessing the performance of a continuous variable for binary classification. – Sensitivity and specificity: • A widely used performance criterion for a clinical test is the pair {sensitivity, specificity} • For risk score prediction, we estimated the specificity for a sensitivity of 90% in accordance with the St Gallen and National Institutes of Health. • The larger the sensitivity and the specificity, the better is the predictability of time to event (TTE). Performance assessment – Concordance index: • The concordance index (C-index) computes the probability that, for a pair of randomly chosen comparable patients, the patient with the higher risk prediction will experience an event before the lower risk patient. • ri and rj stand for the risk predictions of the i-th and the j-th patient, respectively. • is the set of all the pairs of patient {i,j} who meet one of the following conditions: – Both patient i and j experienced an event and time ti<tj – Only patient I experienced an event and ti< tj. • The larger C-index, the better is the predictability of TTE. Performance assessment – Brier score: • The Brier score, denoted by BSC, is defined as the squared difference between an event occurrence and its predicted probabilities at time t. • The lower the BSC, the better is the predictability of TTE at time t. – Hazard ration: • The larger the HR, the larger is the difference in survival probabilities between the groups of patients, and consequently the better is the discrimination between low- and high-risk groups. Results • Breast cancer datasets – Four large microarray BC datasets • VDX – Includes the gene expressions of 286 untreated node-negative BC patients and war used to build GENE76 and to validate GGI. • TBG – Used as an official validation of GENE76 and GGI. • TAM – Homogeneously treated by tamoxifen therapy. • UPP – Treated with heterogeneous therapies. – These datasets are publicly available from the GEO database http://www.ncbi.nlm.nih.gov/geo/ Results • Risk score prediction The most complex models Results Discussion and conclusions • The loss of interpretability deriving from the use of overcomplex methods in survival analysis of BC microarray data might be not sufficiently counterbalanced by an improvement in the quality of prediction.