KF09-LEU-1207R: Kohlmann et al.

Figure legends for Supplementary Figures

Figure S1 Reproducibility between centers and operator proficiency. In total, 80 distinct gene expression profiles were included. The data were generated by 10 laboratory operators using 4 different sample types from commercially available total RNA sources (BRAIN, UHR, MCF-7, and HEPG2). Part of the data were generated in a recently published study called DACH, i.e., contributed by the centers in Basel, Geneva, and Linz.³ (a) Unsupervised hierarchical clustering using the top-10,000 variant genes reveals clear signatures. Importantly, the top dendrogram accurately groups the 4 total RNA sources. Similarity was computed as Euclidean distance, and Ward’s method was then used to cluster the gene expression profiles based on these distances. The normalized expression value for each gene is coded by color (standard deviations from the mean). Red cells indicate high expression and green cells indicate low expression. (b) In the unsupervised principal component analysis based on the top-30,000 probe sets in the dataset, all samples clustered clearly according to sample type. In the upper plot, colors represent the 6 laboratories. In the lower plot, colors identify the distinct total RNA sources. Each of the 80 gene expression datasets is represented by a single sphere.

Figure S2 Comparison to Alcalay et al., Blood 2005. The panel on the right represents a 20-gene signature to identify cases with NPM1 mutations (NPM c+) as described in Figure 1C of the original publication by Alcalay et al. in 2005.⁷ The panel on the left highlights the confirmation of 16 (80%) of 20 probes as common to both datasets, i.e., as differentially expressed between NPM1-mutated and wild-type cases. Gene symbols with asterisks were identified by multiple probe sets.

Figure S3 Comparison to Verhaak et al., Blood 2005.
The panel on the right represents the 18-gene signature, covered by 22 probe sets, to identify cases with NPM1 mutations (NPM1 mutant) as described in Figure 2 of the original publication by Verhaak et al. in 2005.⁸ The panel on the left highlights in yellow the confirmation of 17 (94%) of 18 candidate genes as common to both datasets. Note: Approved HGNC gene symbols are given for SMC4 (previously SMC4L1) and DMXL2 (previously RC3).

Figure S4 Gene expression signature of CEBPA mutations in AML. The hierarchical clustering heatmap displays 234 cases, visualizing a multicenter signature for CEBPA-mutated AML. The top-500 differentially expressed genes were calculated from a pairwise comparison according to CEBPA mutation status. Similarity was computed as Euclidean distance, and Ward’s method was then used to cluster the gene expression profiles based on these distances. The normalized expression value for each gene is coded by color (standard deviations from the mean). Red cells indicate high expression and green cells indicate low expression.

Figure S5 HOX gene expression signature of CEBPA-mutated AML. The hierarchical clustering heatmap displays 234 cases, visualizing a signature for selected HOXA and HOXB cluster candidates. The normalized expression value for each gene is coded by color (standard deviations from the mean). Red cells indicate high expression and green cells indicate low expression.

Figure S6 AML-NK with a silenced CEBPA phenotype. The panel on the left represents a gene expression signature to identify cases with a silenced CEBPA phenotype as described in Figure 5 of the original publication by Wouters et al. in 2007.¹³ The panel on the right highlights an identical signature in one case from our cohort with confirmed CEBPA wild-type status.

Figure S7 Gene expression signature of FLT3-ITD mutations in AML.
The hierarchical clustering heatmap displays 251 cases, visualizing a multicenter signature for FLT3-ITD-mutated AML. The top-500 differentially expressed genes were calculated from a pairwise comparison according to FLT3-ITD mutation status. The normalized expression value for each gene is coded by color (standard deviations from the mean). Red cells indicate high expression and green cells indicate low expression.

Figure S8 Comparison to Bullinger et al., Blood 2008. The panel on the left represents a 20-gene signature to identify cases with FLT3-ITD mutations as described in Figure 1A of the original publication by Bullinger et al. in 2008.¹⁴ The panel on the right highlights the confirmation of 12 (60%) of 20 genes as common to both datasets. Additional genes marked with asterisks (DPPA4, HOXB2, PDE4B) were significantly differentially expressed but were not contained in the top-500 probe set list.
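
Illustrative note on the clustering procedure. Several legends above describe the same steps: select a set of genes, express each gene as standard deviations from its mean, compute Euclidean distances between profiles, and cluster with Ward's method. The sketch below shows one way these steps could be reproduced. It is not the authors' pipeline: the function name `cluster_profiles`, its parameters, and the synthetic data are assumptions made for illustration, and genes are selected here by variance (as in Figure S1a) rather than by differential expression (as in Figures S4 and S7).

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def cluster_profiles(expr, n_top=500, n_groups=2):
    """Cluster samples from a genes x samples expression matrix.

    `expr` is a hypothetical input: rows are genes (or probe sets),
    columns are samples, values are log-scale expression signals.
    """
    # Keep the n_top genes with the largest variance across samples.
    variances = expr.var(axis=1)
    top = np.argsort(variances)[::-1][:n_top]
    sub = expr[top]

    # Standardize each gene: standard deviations from that gene's mean.
    # These are the values a red/green heatmap would color-code.
    sd = sub.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    z = (sub - sub.mean(axis=1, keepdims=True)) / sd

    # Ward's method on Euclidean distances between sample profiles
    # (samples are the columns, so the matrix is transposed).
    tree = linkage(z.T, method="ward", metric="euclidean")

    # Cut the dendrogram into n_groups clusters.
    labels = fcluster(tree, t=n_groups, criterion="maxclust")
    return z, tree, labels


# Synthetic example (assumed data): 200 genes, 10 samples, where the
# last 5 samples over-express the first 50 genes.
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 10))
expr[:50, 5:] += 5.0

z, tree, labels = cluster_profiles(expr, n_top=50, n_groups=2)
print(labels)
```

The returned `tree` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the top dendrogram, and `z` holds the standardized values for the heatmap cells.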