Supplementary Results and Figures

Supplementary Results

We investigated the eight genes for association with survival using the online tool Kaplan-Meier (KM) plotter, which holds gene expression and survival data for more than 2000 patients who are not part of the METABRIC dataset. We found that the collective expression of the six overexpressed genes (MELK, MCM10, CENPA, EXO1, TTK and KIF2C) was significantly associated with relapse-free survival (RFS) and distant metastasis-free survival (DMFS) in all patients, ER+ patients, and lymph node-negative (LN-) or lymph node-positive (LN+) patients (Supplementary Table 3). The two underexpressed genes (MAPT and MYB) were also significantly associated with RFS and DMFS in these patient groups (Supplementary Table 4).

Supplementary Figures

Supplementary Figure 1: Global gene expression meta-analysis of genes deregulated in TNBC, metastatic events and death at 5 years in Oncomine™. (A) TNBC tumors were compared to non-TNBC tumors in 8 datasets, (B) tumors with metastatic events at 5 years were compared to those with no metastatic events at 5 years in 7 datasets and (C) tumors leading to death at 5 years were compared to those that did not lead to death at 5 years in 7 datasets. The datasets used in the comparisons are stated in the legends, and the key for the heatmap coloring is also included. The heatmap key denotes the top or bottom x% placement of a gene according to its gene rank, which is based on the p-value.

Supplementary Figure 2: The derivation of the 206 aggressiveness gene list. (A and B) Venn diagrams for the top overexpressed genes and bottom underexpressed genes shared between the TNBC and/or metastasis and death at 5 years analyses in Oncomine™. (C and D) The Venn diagrams from A and B were crossed with genes that were deregulated in TNBC in comparison to adjacent normal breast tissue from the METABRIC dataset. The genes marked in bold in panels C and D are the 206 genes that constitute the unfiltered aggressiveness gene list.
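The crossing of gene lists described for Supplementary Figure 2 can be illustrated with plain set operations. The following is a minimal sketch only, not the procedure used in the study: the gene identifiers are hypothetical placeholders, and reading "shared between TNBC and/or metastasis and death" as "present in at least two of the three Oncomine™ analyses" is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical placeholder identifiers; none of these lists come from the study.
tnbc = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
metastasis = {"GENE_A", "GENE_B", "GENE_E"}
death = {"GENE_A", "GENE_C", "GENE_F"}
metabric_tnbc_vs_normal = {"GENE_A", "GENE_B", "GENE_F", "GENE_G"}


def shared_in_at_least_two(*gene_sets):
    """Return genes that appear in two or more of the given sets.

    Assumption: this is one possible reading of the overlap regions
    highlighted in panels A and B.
    """
    counts = Counter(gene for gene_set in gene_sets for gene in gene_set)
    return {gene for gene, n in counts.items() if n >= 2}


# Panels A/B (assumed reading): overlap among the three Oncomine analyses.
shared = shared_in_at_least_two(tnbc, metastasis, death)

# Panels C/D: cross the shared genes with the METABRIC TNBC-vs-normal list.
candidates = shared & metabric_tnbc_vs_normal

print(sorted(shared))
print(sorted(candidates))
```

The same steps would be run separately for the overexpressed and the underexpressed lists, since the figure reports them in separate panels.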
Supplementary Figure 3: Common genes between the 206 aggressiveness gene list and metagene attractors. Venn diagrams show common genes (in bold) between the 206 aggressiveness gene list and the chromosomal instability (CIN), lymphocyte-specific and ER attractors (Cheng et al 2013a, Cheng et al 2013b). The table below lists the shared genes. The 6 overexpressed genes (marked in red) and 2 underexpressed genes (marked in green) that constitute the 8-gene signature in this study are shown. Gene set enrichment analysis of the remaining 140 genes, which were present only in the 206 gene signature, reveals that these genes function in the cell cycle.

Supplementary Figure 4: Correlation of breast cancer subtypes and the aggressiveness gene list. The METABRIC dataset was visualized according to the expression of the 206 genes in the aggressiveness gene list. The aggressiveness score for each tumor was calculated as the sum of normalized z-score expression values of overexpressed genes divided by that of underexpressed genes. (A and B) The expression of the aggressiveness gene list was visualized according to the PAM50 intrinsic subtypes and the integrative clusters classification. Box plots show the aggressiveness score of these subtypes. The shaded lines in the box plots mark the median value of the aggressiveness score. *** p < 0.001, One-Way ANOVA using GraphPad® Prism. Kaplan-Meier curves show the overall survival of patients in the METABRIC dataset stratified according to the quartiles (left plot) or the median (middle plot) of the aggressiveness score in ER+ patients with Grade 3 tumors. Tumors of the five PAM50 intrinsic subtypes that show a high aggressiveness score (higher than the median) did not show a statistically significant difference in overall survival (right plot). The hazard ratio (HR), the 95% confidence interval (CI) and the p-value are reported using the Log-rank Test.
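The aggressiveness score defined in the caption of Supplementary Figure 4 (sum of normalized expression values of the overexpressed genes divided by that of the underexpressed genes) and the split at the median can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the 8-gene signature named in this study, the expression values and tumor names are hypothetical, and the source does not state how a zero or negative denominator is handled, so the sketch simply rejects a zero sum.

```python
from statistics import median

# The 8-gene signature as listed in the text.
OVEREXPRESSED = ["MELK", "MCM10", "CENPA", "EXO1", "TTK", "KIF2C"]
UNDEREXPRESSED = ["MAPT", "MYB"]


def aggressiveness_score(expr, over=OVEREXPRESSED, under=UNDEREXPRESSED):
    """Sum of overexpressed-gene values divided by sum of underexpressed-gene values.

    `expr` maps gene symbol -> normalized expression value for one tumor.
    Assumption: a zero denominator is treated as an error, because the
    source does not say how that case is handled.
    """
    over_sum = sum(expr[gene] for gene in over)
    under_sum = sum(expr[gene] for gene in under)
    if under_sum == 0:
        raise ValueError("sum of underexpressed genes is zero; score undefined")
    return over_sum / under_sum


def make_tumor(over_value, mapt, myb):
    """Build a hypothetical expression profile (illustrative values only)."""
    expr = {gene: over_value for gene in OVEREXPRESSED}
    expr.update({"MAPT": mapt, "MYB": myb})
    return expr


# Hypothetical tumors; the values are invented for this example.
tumors = {
    "tumor_1": make_tumor(1.0, 0.5, 0.5),
    "tumor_2": make_tumor(0.5, 1.0, 0.5),
    "tumor_3": make_tumor(0.25, 1.5, 1.5),
}

scores = {name: aggressiveness_score(expr) for name, expr in tumors.items()}

# Dichotomize at the median: "high" means strictly higher than the median.
cutoff = median(scores.values())
groups = {name: ("high" if s > cutoff else "low") for name, s in scores.items()}

print(scores)
print(cutoff, groups)
```

With these toy values the scores are 6.0, 2.0 and 0.5, the median cutoff is 2.0, and only `tumor_1` falls in the high group. The same function applies to the full 206 gene list by passing the corresponding overexpressed and underexpressed gene lists as `over` and `under`.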
Supplementary Figure 5: Survival of the PAM50 breast cancer subtypes in the METABRIC dataset according to the aggressiveness score. The survival of patients in the METABRIC dataset, annotated by PAM50 subtype, was analyzed by dichotomy across the median aggressiveness score from the 206 gene list (A) and the reduced 8 gene list (B). The p-values are reported using the Log-rank Test in GraphPad® Prism and show that tumors with a high aggressiveness score did not differ in patient survival across the PAM50 subtypes (left graphs), whereas the PAM50 subtypes showed significantly different survival only in the low aggressiveness score setting.

Supplementary Figure 6: TTK staining association with patient survival. The overall survival of patients in a large cohort of breast cancer patients (n=409) was stratified according to TTK staining by IHC (scores 0-3). Kaplan-Meier survival curves are shown for all patients with four TTK staining categories (0-3) and with two categories (0-2 vs. 3), at 10 and 20 years of follow-up. The Log-rank Test and its p-value were used for the survival curves of all patients. There were no statistically significant differences in the survival of patients with Grade 1, Grade 2 or hormone receptor-positive tumors when stratified by TTK expression. Survival curves and statistical analyses were performed using GraphPad® Prism.

Supplementary Figure 7: Criteria used for assigning ‘prognostic subgroups’ in this study.