Supplementary Figure legends

Supplementary Figure 1: Nucleotide and amino acid sequences of the RUNX1/TMEM48 transcript. The nucleotide fusion junction is indicated by the > symbol and the exon 5 to intron 5-6 transition by the / symbol. The * symbol indicates the stop codon (GenBank no. KF305770). The polyadenylation signals are underlined. A comparison of these polyadenylation signals with the consensus mammalian poly(A) signal structure (U-rich-(0-20 nt)-AAUAAA-(15-30 nt)-CA-(0-20 nt)-G/U- or U-rich) is presented at the bottom of the figure.

Supplementary Figure 2: (A) Expression of the RUNX1/TMEM48 fusion gene detected in the bone marrow (BM) sample from the t(1;21)(p32;q22)-positive patient. (B) Quantitative real-time PCR analysis of RUNX1a and RUNX1b expression in the patient BM sample and in a healthy BM sample. Values represent the average expression from three experimental replicates; error bars represent the variability between qRT-PCR measurements.

Supplementary Figure 3: Lentiviral construct, derived from the pRRL vector, used to express RUNX1a, RUNX1b, RUNX1/TMEM48 and RUNX1/ETO together with the tomato fluorescent protein marker and peptide 2A sequences. LTR: long terminal repeats; WPRE: woodchuck hepatitis virus post-transcriptional regulatory element.

Supplementary Figure 4: Growth curves from two replicates of the long-term culture assay. Percentages are relative to the value of transduced cells (100%). The values shown are from one experiment but are representative of three independent replicate experiments.

Supplementary Figure 5: Gene set enrichment analysis (GSEA) performed with a gene set from the study of Maiques-Diaz et al., which comprises 1168 RUNX1/ETO target genes (ref. 34). The graph at the bottom of each panel shows the ranked, non-redundant list of genes. Vertical black lines indicate the positions of genes from the studied gene set in the ordered, non-redundant dataset.
The green curve corresponds to the enrichment score (ES) curve, which is the running sum of the weighted enrichment score computed by the GSEA software. FDR, false discovery rate; NOM, nominal.

Supplementary Figure 6: ChIP analysis of the occupancy of the promoter regions of RUNX1, CTCF, MAPKL, MLLT3 and YES1. Data are shown as immunoprecipitation signal relative to input signal; error bars represent the SD of three independent experiments.

Supplementary Figure 7: Hierarchical tree diagram showing the clustering of differentially expressed genes according to the dynamics of their expression patterns. Dendrograms and heat maps were generated with the Cluster-Tree View program within the GeneSpring GX (Agilent) platform. Red, upregulation; green, downregulation; black, no modulation.

Supplementary Figure 8: Quantitative RT-PCR verification of differential gene expression in the hHPS cell models. Expression of HOXA10, HOXA9, CD34, PAX5, NOTCH1, MMP2 and FOXP3 was determined by quantitative RT-PCR 7 days after transduction. Data are means ± SEM (n = 3) and are presented as the fold difference relative to cells transduced with empty vector. *P < 0.05, **P < 0.01, ***P < 0.001.

Supplementary Table 1: Primer sequences used in the RACE, RUNX1 screening and qRT-PCR assays.

Supplementary Table 2: Linear and exponential regression analyses modelling the relationship between the luciferase assay variables by fitting a linear or an exponential equation to the observed data. A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable; b is the slope of the line and a is the intercept (the value of Y when X = 0). The exponential model has the form Y = exp(a + bX).
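The ES running-sum curve described in the Supplementary Figure 5 legend can be illustrated with a minimal Python sketch. It assumes the standard weighted GSEA statistic: walking down the ranked list, the sum steps up at each gene-set member by |score|^p normalised over all members, and steps down uniformly at each non-member; ES is the maximum deviation from zero. The weight exponent p = 1, the helper name `running_enrichment_score` and the toy genes are hypothetical; this is not the GSEA software's own code and omits permutation-based NOM p-values and FDR.

```python
def running_enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Hypothetical helper: return (ES, running_sum) for a ranked gene list.

    Assumes ranked_genes is already sorted by score (highest first).
    """
    if len(ranked_genes) != len(scores):
        raise ValueError("ranked_genes and scores must have the same length")
    in_set = [gene in gene_set for gene in ranked_genes]
    n_total = len(ranked_genes)
    n_hits = sum(in_set)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    # Gene-set members are weighted by |score|**p (p = 1 assumed here).
    hit_weights = [abs(s) ** p if hit else 0.0 for s, hit in zip(scores, in_set)]
    total_hit_weight = sum(hit_weights)
    if total_hit_weight == 0:
        raise ValueError("gene-set members must have non-zero scores")
    # Non-members all carry the same penalty, so the curve returns to 0 at the end.
    miss_step = 1.0 / (n_total - n_hits)
    running = 0.0
    curve = []
    for weight, hit in zip(hit_weights, in_set):
        running += weight / total_hit_weight if hit else -miss_step
        curve.append(running)
    # ES is the point of the curve furthest from zero (positive or negative).
    es = max(curve, key=abs)
    return es, curve


# Toy usage: the gene set sits at the top of the ranking, so ES is positive.
es, curve = running_enrichment_score(["A", "B", "C", "D"], [4.0, 3.0, 2.0, 1.0], {"A", "B"})
print(round(es, 6), [round(v, 3) for v in curve])
```

A gene set concentrated at the bottom of the ranking gives a negative ES instead, which is why ES is taken as the signed maximum deviation rather than the maximum value.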
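The two models named in the Supplementary Table 2 legend can be sketched in Python as follows. The linear model Y = a + bX is fitted by ordinary least squares using the closed-form slope b = Sxy/Sxx and intercept a = mean(Y) - b·mean(X). For Y = exp(a + bX), this sketch assumes a log-linearised fit (least squares on ln Y = a + bX), which requires Y > 0; the legend does not state which fitting method was used, and a nonlinear least-squares fit on Y itself can give somewhat different estimates. The function names are hypothetical.

```python
import math


def fit_linear(xs, ys):
    """Ordinary least-squares fit of Y = a + bX; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X values must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx             # slope
    a = y_mean - b * x_mean   # intercept: value of Y when X = 0
    return a, b


def fit_exponential(xs, ys):
    """Fit Y = exp(a + bX) via least squares on ln(Y) = a + bX (assumed method)."""
    if any(y <= 0 for y in ys):
        raise ValueError("the exponential model requires Y > 0")
    return fit_linear(xs, [math.log(y) for y in ys])


# Toy usage with made-up values (not data from the luciferase assays).
xs = [0.0, 1.0, 2.0, 3.0]
print(fit_linear(xs, [1.0, 3.0, 5.0, 7.0]))  # (1.0, 2.0)
print(fit_exponential(xs, [math.exp(0.5 + 0.3 * x) for x in xs]))
```

Fitting on the log scale keeps the exponential case a linear problem, at the cost of weighting relative rather than absolute errors.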