SUPPLEMENTARY INFORMATION

MATERIALS AND METHODS

Patient samples
Blood and marrow cells from both donor and recipient were obtained from the South Australian Cancer Research Biobank. Mesenchymal stromal cells (MSC) were cultured from bone marrow (BM) aspirates as a source of germline control DNA. Whole exome sequencing (WES) was performed on three donor samples (MSC, AML diagnosis and relapse) and one recipient sample (AML diagnosis). In addition, paired diagnostic and remission samples from 12 other patients with DNMT3A-mutant AML were available for targeted sequencing. None of these 12 patients had therapy-related AML or an antecedent diagnosis of a hematological neoplasm.

Whole Exome Sequencing (WES) and Targeted Massively Parallel Sequencing
WES was performed using a Roche NimbleGen capture kit and sequenced on the Illumina HiSeq2500. Briefly, 1 µg of genomic DNA was sheared to a mean fragment size of 200 bp using the Covaris S220 before conversion to barcoded DNA libraries using a TruSeq DNA LT Sample Preparation Kit (Illumina, San Diego, CA USA). After purification, libraries were quantified by Agilent Bioanalyzer HS DNA assay and combined equally into pools of 6 prior to solution phase capture using the SeqCap EZ Exome Library v3.0 (Roche NimbleGen, Madison, WI USA). The three donor (mesenchymal stem cells, diagnosis, relapse) and one recipient (diagnosis) samples were sequenced together with other unrelated samples on five Illumina HiSeq2500 flowcells (v3 SBS chemistry, 2x100PE), with 6 samples multiplexed per lane. All but the mesenchymal stem cell sample were included on two flowcells. The number of sequenced fragments for the mesenchymal stem cell sample was 30 million, while for the other three samples there were 121, 86 and 71 million fragments, respectively.
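
As a rough sense of scale, the fragment counts above can be converted to raw sequence yield, since each fragment from a 2x100 paired-end run contributes two 100 bp reads. The sketch below is illustrative only and is not part of the original analysis. The sample labels assume that the three larger counts are listed in the order donor diagnosis, donor relapse, recipient diagnosis, and the result is an upper bound that ignores duplicates, unmapped reads and off-target reads.

```python
# Raw sequence yield implied by the reported fragment counts (2x100 paired-end).
READ_LENGTH_BP = 100
READS_PER_FRAGMENT = 2

# Fragment counts in millions, as reported; the sample labels are an assumed ordering.
fragments_millions = {
    "donor MSC": 30,
    "donor AML diagnosis": 121,
    "donor AML relapse": 86,
    "recipient AML diagnosis": 71,
}


def raw_yield_gb(fragments_m: float) -> float:
    """Return raw sequenced bases in gigabases for a fragment count in millions."""
    return fragments_m * 1e6 * READS_PER_FRAGMENT * READ_LENGTH_BP / 1e9


raw_yield = {sample: raw_yield_gb(n) for sample, n in fragments_millions.items()}

if __name__ == "__main__":
    for sample, gb in raw_yield.items():
        print(f"{sample}: {gb:.1f} Gb")
```

Under these assumptions the MSC sample corresponds to about 6 Gb of raw sequence and the other three samples to about 24.2, 17.2 and 14.2 Gb.
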

Targeted Massively Parallel Sequencing was performed on a custom 29-gene panel (all coding regions) of myeloid genes (Supplementary Table 3) using an Ion Torrent AmpliSeq approach. Briefly, the targeted gene libraries were generated from 10 ng of genomic DNA using the Ion AmpliSeq Library Kit v2.0 and the custom primer pool as per the manufacturer’s protocol (Life Technologies, Guilford, CT USA). After adapter ligation and a 5-cycle PCR amplification incorporating barcodes, the libraries were quantified by Agilent Bioanalyzer HS DNA assay and combined equally into a pool of 12 samples. The library pool was diluted to 6 pM and templated onto Ion Sphere Particles (ISPs) by emulsion PCR using the automated Ion OneTouch2 system with the Ion P1 Template OT2 200 Kit (Life Technologies). ISP sequencing was performed using an Ion P1 chip (Ion P1 Sequencing 200 Kit v3 chemistry) on the Ion Proton.

Sequence analysis
The WES reads were mapped to the human genome (hg19) using bwa sampe (v0.6.2). Sorting and indexing were carried out using samtools (v0.1.12a), followed by duplicate marking using picard (v1.71). Mapping resulted in average coverage over the NimbleGen capture regions of 34.1, 96.5, 94.2 and 76.1 for the donor’s mesenchymal stem cells, diagnosis, relapse and the recipient’s diagnosis sample, respectively.

GATK (v2.5.2-v2.8.1) was used to realign indels and recalibrate quality scores, and its UnifiedGenotyper was used to call variants (multi-sample calling) according to the Broad’s “best practices pipeline” for the GATK v2 series. Variants were annotated using the ACRF Cancer Genome Facility’s custom annotation pipeline based on SnpEff and SnpSift (v3.3) [18].
Annotation information was taken from Ensembl (v73) [19], dbSNP (v137), the 1000 Genomes project (integrated phase 1, v3) [20], the Exome Sequencing Project (6500SI-V2) [21], COSMIC (v67) [22] and GERP scores [23], as well as other public databases.

A rudimentary filter was applied to remove variants that were relatively unlikely to be of interest, by imposing two conditions. First, the variant had to be rare (<0.5%) both in the 1092 individuals of the 1000 Genomes project and in the 4300/2203 European/African-Americans of the Exome Sequencing Project. Second, variants were retained only if they showed evidence of evolutionary conservation, in either mammalian (GERP ≥ 2) or other vertebrate (PhastCons ≥ 0.9) species, or if their predicted functional impact had the potential to be non-trivial (i.e. not synonymous coding and not classified by SnpEff as a “modifier”), or if they were known somatic mutations in the COSMIC database. Finally, we compared the variants passing the above criteria to an in-house collection of 51 exomes of patients with non-hematological malignancies, some of which were sequenced concurrently with the four exomes considered here. If a variant occurred more than once (heterozygous) in this collection, it was discarded. In total, these filters reduced the number of sites to be considered further to 7480.

The Ion Torrent targeted sequencing data were processed with the Torrent Suite™ software v4.0.1 using the AmpliSeq workflow. This suite automates the generation of sequence reads, trimming of adapter sequences and the removal of poor quality reads. Variant calls were made using the Torrent Variant Caller plugin (4.0-5, 72041) with the Somatic Mutation default settings, except for ‘SNP minimum allele frequency’ (0.5%) and ‘Indel min allele frequency’ (1.25%).
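
The filtering conditions applied to the WES variants earlier in this section can be sketched as a simple predicate. This is a minimal illustration, not the ACRF Cancer Genome Facility's pipeline: the `Variant` record and all of its field names are hypothetical, the Exome Sequencing Project frequency is collapsed into a single value for brevity, and a variant absent from a population resource is assumed to count as rare there. Only the thresholds come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the text.
MAX_POPULATION_FREQ = 0.005   # "rare (<0.5%)"
MIN_GERP = 2.0                # mammalian conservation
MIN_PHASTCONS = 0.9           # vertebrate conservation
MAX_INHOUSE_OCCURRENCES = 1   # discarded if seen more than once in-house


@dataclass
class Variant:
    """Hypothetical record; field names are illustrative, not the pipeline's."""
    freq_1000g: Optional[float]   # None means not observed in that resource
    freq_esp: Optional[float]     # assumption: one combined ESP frequency
    gerp: Optional[float]
    phastcons: Optional[float]
    is_synonymous: bool
    snpeff_impact: str            # e.g. "HIGH", "MODERATE", "LOW", "MODIFIER"
    in_cosmic: bool
    inhouse_occurrences: int      # count across the in-house exome collection


def is_rare(v: Variant) -> bool:
    # Assumption: a variant absent from a resource counts as rare there.
    return all(f is None or f < MAX_POPULATION_FREQ
               for f in (v.freq_1000g, v.freq_esp))


def is_conserved(v: Variant) -> bool:
    return ((v.gerp is not None and v.gerp >= MIN_GERP)
            or (v.phastcons is not None and v.phastcons >= MIN_PHASTCONS))


def has_nontrivial_impact(v: Variant) -> bool:
    return not v.is_synonymous and v.snpeff_impact.upper() != "MODIFIER"


def passes_filters(v: Variant) -> bool:
    """Rare, and conserved or impactful or in COSMIC, and not recurrent in-house."""
    of_interest = is_conserved(v) or has_nontrivial_impact(v) or v.in_cosmic
    return (is_rare(v) and of_interest
            and v.inhouse_occurrences <= MAX_INHOUSE_OCCURRENCES)


# Usage: a rare, conserved, non-synonymous variant listed in COSMIC is retained.
example = Variant(freq_1000g=None, freq_esp=0.0001, gerp=4.2, phastcons=1.0,
                  is_synonymous=False, snpeff_impact="MODERATE",
                  in_cosmic=True, inhouse_occurrences=0)
kept = passes_filters(example)
```

Writing the second condition as a disjunction mirrors the wording of the text, in which conservation, predicted impact and COSMIC membership are alternatives, while rarity and the in-house comparison are applied to every variant.
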
Variants were annotated using SnpEff, COSMIC and local in-house databases as detailed above.

Additional cases of DNMT3A-mutant AML
The DNMT3A mutation load in paired diagnostic and remission samples from 12 AML patients with DNMT3A R882H/C mutations was measured using a custom Sequenom MassArray assay (Sequenom, Inc., San Diego, CA USA). Allele loads of concurrent mutations in isocitrate dehydrogenase 1 and 2 (IDH1/2), Kirsten rat sarcoma oncogene homolog (KRAS) and NPM1 were measured using a custom Sequenom assay, Sanger sequencing and a restriction fragment length polymorphism assay, respectively.

Study oversight
The research was approved by the Royal Adelaide Hospital Human Research Ethics Committee and all patients gave written informed consent.

Supplementary Table 1. Gene mutations in the two brothers.

Gene    Genome (hg19)                      mRNA transcript              Protein
DNMT3A  chr2:g.25457242C>T                 NM_022552; c.2645G>A         p.Arg882His
NPM1    chr5:g.170837544_170837547dupTCTG  NM_002520; c.860_863dupTCTG  p.Trp288Cysfs*12
FLT3    chr13:g.28592642C>A                NM_004119; c.2503G>T         p.Asp835Tyr
IDH1    chr2:g.209113112C>T                NM_005896; c.395G>A          p.Arg132His
NOTCH4  chr6:g.32178533C>T                 NM_004557; c.2861G>A         p.Cys954Tyr
WT1     chr11:g.32413566G>A                NM_024424; c.1180C>T         p.Arg394Trp
SMC1A   chrX:g.53423420T>C                 NM_006306; c.2680A>G         p.Ile894Val

Supplementary Table 2. Mutation allele loads of 12 DNMT3A-mutant AML patients who achieved complete remission after induction chemotherapy.

Patient  Age at diagnosis (years)  Sex  Duration of CR1 (days)  Karyotype
1        65                        M    268                     46,XY,ins(17;2)(p13;p21p23)[20]/46,XY[1]
2        39                        F    NA                      46,XX-7,+8[23]/46,XX[7]
3        69                        F    71                      46,XX
4        37                        M    360                     46,XY
5        64                        F    >1,460                  46,XX
6        63                        F    767                     46,XX
7        53                        M    1188                    46,XY,+1~22 dmin[cp29].ish dmin(ETO-,c-MYC,MLL-,RUNX1-)/46,XY[6]
8        50                        F    491                     46,XX,del(7)(q?31.2)[10]/46,XX,-7,+mar[2]/91~92,XXXX,-7,-7,+marx1~2[6]/46,XX[2]
9        60                        F    230                     46,XX
10       65                        M    100                     46,XY
11       64                        M    181                     47,XY,+4,del(4)(q12q31)[7]/46,XY[13]
12       60                        F    159                     46,XX

Remaining columns: Sample type; DNMT3A (R882H, R882C); IDH1 (R132C, R132H); IDH2 (R140Q, R172K); NPM1 (W288Cfs*12); KRAS (G12D). Values for these columns, in extraction order (row and column assignment not recoverable):
Dx 95
CR1 Dx 54 45
49 58 47 48 29 56 53
CR1 Dx CR1 Dx CR1 Rel CR2 Dx CR1 CR1 Dx CR1 Rel CR2 Dx 55 15 27 41 54 45 12 17 0 0
CR1 CR2 Dx
32 0 48 0 48 0 NA 0 43 0 46 0 50 0 60 13 41 CR1 CR2 P1 P2 Dx CR1 Dx CR1 Dx 58 0 53 CR1 Dx CR1 0 30 0 6 26 0 0 0 72 5 5 0 0 0 70 0 0 0 0 0 55 0 49 0

Supplementary Table 3. AmpliSeq 29 Gene Panel list. The entire coding region of each gene was encompassed by massively parallel sequencing for mutation detection.

Genes: ASXL1, BAP1, BRAF, CBL, CEBPA, DNMT3A, EGFR, EZH2, GATA2, IDH1, IDH2, JAK1, JAK2, KIT, KRAS, MET, MPL, MYD88, NOTCH1, NPM1, NRAS, PTPN11, RUNX1, SF3B1, SRP72, SRSF2, TET2, U2AF1, XPO1

REFERENCES
18. Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 2012; 6: 80-92.
19. Flicek P, Amode MR, Barrell D, Beal K, Billis K, Brent S, et al. Ensembl 2014. Nucleic Acids Res 2014; 42: D749-755.
20. Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, et al. An integrated map of genetic variation from 1,092 human genomes. Nature 2012; 491: 56-65.
21. Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, Gravel S, et al.
Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 2012; 337: 64-69.
22. Forbes SA, Bhamra G, Bamford S, Dawson E, Kok C, Clements J, et al. The Catalogue of Somatic Mutations in Cancer (COSMIC). Curr Protoc Hum Genet 2008; Chapter 10: Unit 10.11.
23. Cooper GM, Stone EA, Asimenos G, Green ED, Batzoglou S, Sidow A. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res 2005; 15: 901-913.