Kustikova et al., Supplementary information

Supplementary materials and methods

Microarray analysis
Murine primary Rosa26rtTA Lin- cells were prestimulated in vitro for 24 hours (cytokine cocktail S3F11: mSCF 50 ng/ml, mIL-3 20 ng/ml, hFlt-3 50 ng/ml, and hIL-11 50 ng/ml), transduced with the tetP-Evi1/EGFP vector, and induced for 24 hours with 1 µg/ml of doxycycline (DOX). Evi1/EGFP-positive and Evi1/EGFP-negative cells were sorted, and total RNA was prepared using the RNeasy Micro Kit (Qiagen GmbH, Hilden, Germany). RNA quality was assessed using the Agilent 2100 Bioanalyzer. RNA was amplified with the Ovation RNA Amplification System V2 (NuGEN Technologies, Bemmel, The Netherlands). Fragmented amplified cDNA was biotin labeled using the NuGEN Ovation Biotin labeling system (NuGEN Technologies) in three technical replicates and hybridized to Affymetrix Mouse Genome 430 2.0 GeneChip arrays (Affymetrix) (45,101 probe sets). Microarray data were submitted to the GEO database (accession number: GSE34729; http://www.ncbi.nlm.nih.gov/geo).

Immunoblot analysis
Murine primary Lin- Rosa26rtTA cells were prestimulated in vitro for 24 hours (S3F11 cytokine conditions), transduced with the tetP-Evi1/EGFP or tetP-EGFP vector for another 24 hours, and induced with DOX (1 µg/ml). After 24 hours, 0.5-1.0 × 10⁶ cells were lysed in Laemmli buffer (2% SDS, 10% glycerol, 5% 2-mercaptoethanol, 0.002% bromophenol blue, 0.0625 M Tris HCl, pH 6.8), denatured for 5 minutes at 99°C, separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (6.5%), transferred to nitrocellulose membranes (Bio-Rad, Munich, Germany), and probed with rabbit anti-Evi1 antibodies (Cell Signaling Technology, Beverly, MA, USA) at a dilution of 1:1000 in Tris-buffered saline with 0.05% Tween and 5% BSA (TBST-5% BSA).
HRP-conjugated goat anti-rabbit secondary antibodies (Santa Cruz Biotechnology, Heidelberg, Germany) were used at a dilution of 1:2000 in TBST buffer and detected with the Super Signal West Pico kit (Thermo Scientific, Rockford, USA).

Real-time PCR
Total RNA was extracted using the RNeasy Micro kit (Qiagen, Hilden, Germany). To validate the transcriptional response of cyclin genes important for cell cycle progression, RNA was extracted from sorted Evi1/EGFP+ and EGFP+ cells (two additional independent experiments, plus RNA from one of the biological replicates used for the microarrays shown in Figure 1b-c). RNA was reverse transcribed into cDNA with Quantiscript Reverse Transcriptase (QuantiTect Reverse Transcription Kit, Qiagen) using random hexamer primers according to the manufacturer's instructions. Real-time PCR analysis was performed on an Applied Biosystems StepOnePlus System (Foster City, CA) in three independent experiments, each in triplicate, using the Quantitect SYBR Green Kit (Qiagen). The cDNA equivalent of 5-100 ng of input RNA was analyzed. Reactions were run in a volume of 25 µl in 96-well plates. Relative quantification of a target gene transcript in comparison to a reference Actin transcript was calculated using the method described by Pfaffl.¹ Primers were designed with the Primer3Plus program (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi) or the tools at https://www.roche-applied-science.com. Real-time PCR efficiencies were determined for all sets of designed primers.¹ In some cases (Evi1co transcript), 10 µl of the resulting real-time PCR reaction was visualized by electrophoresis on a 1% agarose gel in 0.5x TBE.
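The relative quantification described above can be illustrated with a minimal sketch of the efficiency-corrected ratio from reference 1, in which the expression ratio equals E_target raised to ΔCt_target (control minus sample), divided by E_ref raised to ΔCt_ref (control minus sample). The function names, the choice of EGFP+ cells as the control, and all Ct values below are assumptions made for illustration only; they are not data from this study.

```python
def efficiency_from_slope(slope: float) -> float:
    """Amplification efficiency E = 10^(-1/slope) from a standard-curve slope.

    A slope of about -3.32 corresponds to E = 2 (perfect doubling per cycle).
    """
    return 10.0 ** (-1.0 / slope)


def pfaffl_ratio(
    e_target: float,
    e_ref: float,
    ct_target_control: float,
    ct_target_sample: float,
    ct_ref_control: float,
    ct_ref_sample: float,
) -> float:
    """Efficiency-corrected relative expression ratio (sample vs. control)."""
    delta_ct_target = ct_target_control - ct_target_sample
    delta_ct_ref = ct_ref_control - ct_ref_sample
    return (e_target ** delta_ct_target) / (e_ref ** delta_ct_ref)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier in the
# sample (assumed Evi1/EGFP+) than in the control (assumed EGFP+), while the
# reference gene is unchanged.
ratio = pfaffl_ratio(
    e_target=2.0,
    e_ref=2.0,
    ct_target_control=26.0,
    ct_target_sample=24.0,
    ct_ref_control=18.0,
    ct_ref_sample=18.0,
)
print(ratio)  # 4.0
```

With both efficiencies equal to 2, the expression reduces to the familiar 2^(-ΔΔCt) calculation; using measured per-primer efficiencies is what distinguishes the efficiency-corrected model.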
Sequences of primers used:
Primer: Ccne1_F Ccne1_R Ccne2_F Ccne2_R Ccnd2_F Ccnd2_R Ccna2_F Ccna2_R Evi1co_F Evi1co_R Pre_F Pre_R Actb_F Actb_R
Sequence (5’ – 3’): ctg aga gat gag cac ttt ctg c gag ctt ata gac ttc gca cac ct gag ctg tgg agg gtc tgg ggc tgc gtc ttg aca ttc tc cac cga caa ctc tgt gaa gc tcc act tca gct tac cca aca ctt ggc tgc acc aac agt aa caa act cag ttc tcc caa aaa ca cct tca tct gcg aag tgt gc cag tcc ttg cac ttg atc tgg gag gag ttg tgg ccc gtt gt tga cag gtg gtg gca atg cc cct ccc tgg aga aga gct a tcc ata ccc aag aag gaa gg

LSK (Linneg/lo Sca1high Ckithigh) cell isolation
The LSK hematopoietic subpopulation was isolated by staining freshly prepared Rosa26rtTA-nls-Neo2 bone marrow cells with a panel of biotin-conjugated antibodies against murine lineage markers (Gr1, CD11b, Ter119, CD3e, B220), followed by Fc receptor blocking with TruStain fcX™ (anti-mouse CD16/32) and staining with FITC-conjugated streptavidin (eBioscience, San Diego, CA), allophycocyanin-conjugated anti-mouse CD117, and phycoerythrin-conjugated anti-mouse Ly-6A/E. Dead cells were excluded by forward scatter, side scatter, and 4',6-diamidino-2-phenylindole gating. Cell aggregates were excluded by forward-scatter area vs. pulse width gating. Gates were set according to control samples using the Fluorescence Minus One approach. A FACSAria instrument (Becton Dickinson, Heidelberg, Germany) was used for cell sorting. Antibodies were purchased from BioLegend (London, UK) unless otherwise specified.

Cytospin preparation
Between 2 × 10⁴ and 2 × 10⁵ cells were centrifuged in 150 µl PBS for 10 minutes at 800 rpm in a Shandon Cytospin 4 centrifuge (Thermo Electron Corporation, Pittsburgh, USA). Cytospins were stained according to Pappenheim (Giemsa/May-Grünwald): 5 minutes in May-Grünwald followed by 30 minutes in Giemsa staining solution (Sigma-Aldrich, Steinheim, Germany).
Array-based comparative genome hybridization (array-CGH) analysis
Array-CGH² was performed using the Agilent Mouse Genome Microarray Kit 244k (Agilent Technologies, Santa Clara, CA, USA), a high-resolution 60-mer oligonucleotide-based microarray with a median overall probe spacing of about 10.9. Labelling and hybridization of genomic DNA were performed according to the protocol provided by Agilent. Microarray slides were scanned immediately using an Agilent microarray scanner at a resolution of 2 µm. For image analysis, the default CGH settings of the Feature Extraction Software (Agilent Technologies, Waldbronn, Germany) were applied. Output files from Feature Extraction were subsequently imported into Agilent’s CGH data analysis software, DNAWorkbench. The aberration algorithm ADM2 was applied, and the aberration filters were set to: threshold 5.0, at least 4 probes with a mean log2 ratio of -0.5.

Comparison of microarray data set Evi1/EGFP+ versus EGFP+ Lin- Rosa26rtTA with human EVI1high AML
A publicly available dataset (GSE15434) of 233 AML patients with normal karyotype³ was downloaded. Arrays from samples with less than 50% blasts were excluded, leaving 194 samples for the analysis. CEL files were preprocessed using RMA.⁴ For unsupervised clustering, the 2% of annotated probe sets with the highest variance (IQR) were selected. The EVI1high subpopulation was defined by EVI1 expression of more than 4-fold the median of all patients, a criterion fulfilled by 12 patients. For subsequent analyses, all EVI1high patients in cluster 4 of the unsupervised analysis were chosen. LIMMA⁵ and GSEA⁶ were used to detect differentially expressed genes and gene sets between EVI1 positive and negative patients.
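The two selection rules used in the comparison with human AML (keeping the 2% of probe sets with the highest IQR, and calling a patient EVI1high when EVI1 expression exceeds 4-fold the median) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes a probe-set by sample matrix on the log2 scale (as RMA produces), so that a 4-fold difference corresponds to 2 log2 units; whether the original cutoff was applied on the linear or the log scale is not stated. The function names and the toy data are hypothetical.

```python
import numpy as np


def top_variable_rows(expr: np.ndarray, fraction: float = 0.02) -> np.ndarray:
    """Return row indices of the most variable probe sets, ranked by IQR."""
    q75, q25 = np.percentile(expr, [75, 25], axis=1)
    iqr = q75 - q25
    n_keep = max(1, int(round(fraction * expr.shape[0])))
    # Sort by IQR in descending order and keep the top fraction.
    return np.argsort(iqr)[::-1][:n_keep]


def evi1_high_mask(evi1_log2: np.ndarray, fold: float = 4.0) -> np.ndarray:
    """Flag samples whose log2 expression exceeds median + log2(fold)."""
    return evi1_log2 > np.median(evi1_log2) + np.log2(fold)


# Toy data (synthetic): 100 probe sets x 20 samples with low spread,
# plus two probe sets given a deliberately wide spread.
rng = np.random.default_rng(0)
expr = rng.normal(loc=8.0, scale=0.1, size=(100, 20))
expr[7] = np.linspace(4.0, 12.0, 20)
expr[42] = np.linspace(5.0, 11.0, 20)
selected = top_variable_rows(expr, fraction=0.02)

# Toy EVI1 values (log2): ten low samples and one clearly elevated sample.
evi1 = np.array([6.0] * 10 + [9.0])
high = evi1_high_mask(evi1)

print(selected.tolist())   # indices of the two wide-spread probe sets
print(int(high.sum()))     # number of samples flagged as EVI1high
```

On real data, `expr` would hold the RMA-normalized values restricted to annotated probe sets, and the selected rows would feed the clustering step.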
Supplementary results

Analysis of the stability of gene expression profiling
To assess the stability of gene expression, we calculated the coefficient of variation (CV; in %) for each annotated gene (n = 20,696) within the experimental groups and for the whole dataset (Supplementary Figure 4a). The mean CV for the EGFP+ and the Evi1/EGFP+ group was 4.01% and 3.30%, respectively. The mean CV for the complete dataset was 4.6%, indicating a high overall stability,⁷ both within the experimental groups and over the whole dataset. The observation that the CV over the whole dataset was only slightly greater than within the groups indicates that expression of Evi1 did not cause global changes within the transcriptome but affected a defined subset of genes. Principal component analysis (PCA) confirmed the overall stability of gene expression within the input groups and identified the biological effect caused by overexpression of Evi1 as the main source of variation in the experiment (Supplementary Figure 4b-e and Supplementary Table 6).

Array-CGH analysis
By means of array-CGH, no significant genomic alterations were detected except for two microdeletions and two microduplications (Supplementary Figure 6c-d).

Comparison of microarray data set Evi1/EGFP+ versus EGFP+ Lin- Rosa26rtTA with human EVI1high AML
Using a stringent cutoff, we identified 12 patients out of 194 (6%) with high EVI1 expression. In the remainder of the samples, EVI1 expression was uniformly low (Supplementary Figure 9a). EVI1high patients were characterized by the absence of NPM1, CEBPA or FLT3 mutations (Supplementary Figure 9b-c).
Seven of these 12 patients clustered together in the unsupervised analysis (Supplementary Figure 9b) and were characterized by an expression profile that showed a notable upregulation of B-lymphoid genes like BLNK (B cell linker), IgHM (immunoglobulin heavy constant mu), and DNTT (terminal deoxynucleotidyltransferase), indicating a dedifferentiated state and lineage infidelity (Supplementary Figure 9d). Consistent with this, GSEA detected gene sets from B-cell progenitors and lymphoid-primed stem cells to be enriched in EVI1high patients (Supplementary Figure 9e). We also detected enrichment of the Valk AML clusters 10 and 11, which in the AML study of Valk et al.⁸ contained the EVI1high cases and the normal karyotypes, respectively. Interestingly, we found a highly significant overlap between our experimental results and the EVI1high patients regarding the upregulation of genes involved in interferon signaling. Furthermore, the GSEA algorithm detected the upregulation of gene sets derived from leukemic and hematopoietic stem cells (Supplementary Figure 9e).

Supplementary References
1. Pfaffl MW. A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res 2001; 29(9): e45.
2. Manukjan G, Tauscher M, Ripperger T, Schwarzer A, Schlegelberger B, Steinemann D. Induced G1 phase arrest of fast-dividing cells improves the quality of genomic profiles generated by array-CGH. Biotechniques 2012; 53(4): 245-248.
3. Kohlmann A, Bullinger L, Thiede C, Schaich M, Schnittger S, Döhner K, et al. Gene expression profiling in AML with normal karyotype can predict mutations for molecular markers and allows novel insights into perturbed biological pathways. Leukemia 2010; 24: 1216-1220.
4. Gautier L, Cope L, Bolstad BM, Irizarry RA. affy--analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 2004; 20: 307-315.
5. Smyth GK.
Limma: linear models for microarray data. In: Gentleman R (ed). Bioinformatics and Computational Biology Solutions using R and Bioconductor. Springer: New York, 2005, pp 397-420.
6. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005; 102: 15545-15550.
7. de Jonge HJ, Fehrmann RS, de Bont ES, Hofstra RM, Gerbens F, Kamps WA, et al. Evidence based selection of housekeeping genes. PLoS One 2007; 2: e898.
8. Valk PJ, Verhaak RG, Beijen MA, Erpelinck CA, Barjesteh van Waalwijk van Doorn-Khosrovani S, Boer JM, et al. Prognostically useful gene-expression profiles in acute myeloid leukemia. N Engl J Med 2004; 350: 1617-1628.