Supporting Information

Quality assessment of sequence capture and library preparation
Estimation of sequence capture enrichment
Information on capture array design targeting MLL
Information on capture array design targeting RUNX1
Information on capture array design targeting PDGFRB
Linker sequence and MID barcodes
Detection of an insertion
Detection of a deletion
Detection of a point mutation
Detection of molecular mutations in 95 gene targets
Summary statistics of fusion genes and dominant clusters
Validation of unknown fusion events detected by NGS
References

Quality assessment of sequence capture and library preparation

As demonstrated in Supplementary Figure 1, the processing steps from initial genomic DNA fragmentation through polishing, adapter ligation, and clean-up were assessed by Agilent Bioanalyzer chip profiles.

[Bioanalyzer trace: after nebulization]

Supplementary Fig. 1a Nebulization of genomic DNA. 20 µg of genomic DNA were fragmented in a nebulizer apparatus using 45 psi of nitrogen gas for 60 seconds.
The Agilent Bioanalyzer 2100 DNA Chip 7500 (Agilent Technologies, Böblingen, Germany) depicts a representative molecular population of the generated DNA fragments. The size distribution in base pairs (bp) is given on the x-axis.

[Bioanalyzer traces: nebulized library after polishing; library post-ligation with gSel linkers; library post-AMPure beads]

Supplementary Fig. 1b Polishing, adapter ligation and clean-up steps. A Bioanalyzer assessment is performed to verify the efficacy of the linker reaction. The Agilent Bioanalyzer 2100 DNA Chip 7500 depicts a representative sample including three curves: (i) nebulized library after polishing (red line); (ii) post-ligation library with gSel linkers and a dominant linker dimer peak at approx. 40 bp (blue line); and (iii) purified and ligated DNA fragments with the majority of gSel linker dimers removed (green line).

[Bioanalyzer trace: size-selected DNA for library prep]

Supplementary Fig. 1c Size-selection process of the ligated DNA library. A Bioanalyzer assessment is performed to monitor the profile of the ligated library after size selection using the double solid phase reversible immobilization (SPRI) method, a paramagnetic bead-based technology (Beckman Coulter, Krefeld, Germany). The size-selection step ideally enriches for DNA fragments between 500 and 800 bp, with less than 10% of fragments below 350 bp and less than 10% above 1000 bp (Agilent Bioanalyzer 2100 DNA Chip 7500).

[Bioanalyzer trace: final library stock]

Supplementary Fig. 1d Final library stock population. An Agilent Bioanalyzer RNA 6000 Pico LabChip profile of a final library stock is shown.

Estimation of sequence capture enrichment

Four distinct control regions were amplified using the primers given in Supplementary Table 1. If the qPCR analysis using NSC assays indicates a successful capture of the control loci, it is likely that the experimental loci of interest targeted on the microarray were also successfully captured and enriched for sequencing.
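The enrichment estimate described in this section can be sketched as follows. This is a minimal sketch, assuming that the fold enrichment per assay is computed as E^∆CP, where E is the per-cycle amplification efficiency of the assay and ∆CP is the CP of the non-captured template minus the CP of the captured template. The efficiencies are not stated in the text; the values below are an assumption, chosen because they reproduce the ∆CP and enrichment pairs reported for case #24 in Supplementary Figure 2. All function and variable names are illustrative.

```python
from statistics import median

# ASSUMPTION: per-assay amplification efficiencies are not given in the text.
# These values were back-calculated from the ∆CP / enrichment pairs shown
# for case #24 in Supplementary Figure 2.
ASSUMED_EFFICIENCY = {
    "NSC-0237": 1.84,
    "NSC-0247": 1.80,
    "NSC-0268": 1.78,
    "NSC-0272": 1.93,
}


def fold_enrichment(assay, delta_cp):
    """Fold enrichment for one control assay: efficiency ** ∆CP."""
    return ASSUMED_EFFICIENCY[assay] ** delta_cp


def sample_enrichment(delta_cps):
    """Median fold enrichment over all control assays of one sample."""
    return median(fold_enrichment(a, d) for a, d in delta_cps.items())


# ∆CP values reported for case #24 (Supplementary Figure 2).
case_24 = {
    "NSC-0237": 10.01,
    "NSC-0247": 8.99,
    "NSC-0268": 9.78,
    "NSC-0272": 10.21,
}

if __name__ == "__main__":
    for assay, delta_cp in sorted(case_24.items()):
        print(f"{assay}: {fold_enrichment(assay, delta_cp):.1f}-fold")
    print(f"median: {sample_enrichment(case_24):.1f}-fold")
```

With four assays, the median is the mean of the two middle values; for the four enrichment values reported for case #24 (197.2, 281.3, 447.5 and 823.3), this gives an estimated enrichment factor of about 364-fold.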
Supplementary Table 1 Sequence capture enrichment amplicons for qPCR

Amplicon   Direction   Primer sequence (5' - 3')
NSC-0237   forward     CGC ATT CCT CAT CCC AGT ATG
           reverse     AAA GGA CTT GGT GCA GAG TTC AG
NSC-0247   forward     CCC ACC GCC TTC GAC AT
           reverse     CCT GCT TAC TGT GGG CTC TTG
NSC-0268   forward     CTC GCT TAA CCA GAC TCA TCT ACT GT
           reverse     ACT TGG CTC AGC TGT ATG AAG GT
NSC-0272   forward     CAG CCC CAG CTC AGG TAC AG
           reverse     ATG ATG CGA GTG CTG ATG ATG

[qPCR amplification curves of captured and non-captured templates, case #24]

Assay      ∆CP     Enrichment
NSC-0247    8.99   197.2-fold
NSC-0237   10.01   447.5-fold
NSC-0268    9.78   281.3-fold
NSC-0272   10.21   823.3-fold

Supplementary Figure 2 Calculation of sequence capture enrichment using qPCR. As shown for a representative case, the crossing point (CP) values, each assessed in triplicate, from four qPCR assays of captured LM-PCR templates were significantly lower than the CP values from non-captured templates. The differences in CP values are used to calculate the enrichment efficiency. The median result of the four distinct assays gives the estimated final enrichment factor per patient sample.

Information on capture array design targeting MLL

As shown in Supplementary Figure 3, a customized capture array targeted a contiguous region of chromosome 11q23 where the MLL gene is located (start: 117,812,370; end: 117,901,177; hg18 assembly).

[Genome browser view: capture probes, MLL]

Supplementary Figure 3 Custom NimbleGen 385K microarray to capture MLL sequences. In total, 88,732 bases were targeted by capture probes. Hybridization probes covered 92.1% (81,694 bases) of the contiguous genomic target region. The remaining 7.9% of bases were not covered by probes due to repetitive sequences, as indicated in the browser line “tiled_region”.
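The coverage percentages quoted for the capture array designs follow directly from the base counts. The short sketch below recomputes them from the numbers given in the figure legends of this section and of the RUNX1 and PDGFRB sections that follow; the helper name and the dictionary layout are illustrative only.

```python
def probe_coverage_percent(covered_bases, targeted_bases):
    """Percentage of the targeted region that is covered by capture probes."""
    return 100.0 * covered_bases / targeted_bases


# (covered bases, targeted bases) as stated in Supplementary Figures 3 to 5.
designs = {
    "MLL": (81_694, 88_732),
    "RUNX1": (255_202, 261_544),
    "PDGFRB": (41_545, 42_023),
}

if __name__ == "__main__":
    for gene, (covered, targeted) in designs.items():
        pct = probe_coverage_percent(covered, targeted)
        print(f"{gene}: {pct:.1f}% covered, {100.0 - pct:.1f}% not covered")
```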
Information on capture array design targeting RUNX1

As shown in Supplementary Figure 4, a customized capture array targeted a contiguous region of chromosome 21q22.3 where the RUNX1 gene is located (start: 36,160,052; end: 36,421,677; hg19 assembly).

[Genome browser view: capture probes, RUNX1]

Supplementary Figure 4 Custom NimbleGen 385K microarray to capture RUNX1 sequences. In total, 261,544 bases were targeted by capture probes. Hybridization probes covered 97.6% (255,202 bases) of the contiguous genomic target region. The remaining 2.4% of bases were not covered by probes due to repetitive sequences, as indicated in the browser line “tiled_region”.

Information on capture array design targeting PDGFRB

As shown in Supplementary Figure 5, a customized capture array targeted a contiguous region of chromosome 5q33.1 where the PDGFRB gene is located (start: 149,493,355; end: 149,535,460; hg19 assembly).

[Genome browser view: capture probes, PDGFRB]

Supplementary Figure 5 Custom NimbleGen 385K microarray to capture PDGFRB sequences. In total, 42,023 bases were targeted by capture probes. Hybridization probes covered 98.9% (41,545 bases) of the contiguous genomic target region. The remaining 1.1% of bases were not covered by probes due to repetitive sequences, as indicated in the browser line “tiled_region”.

Linker sequence and MID barcodes

Linker molecules were ligated to the DNA fragments in the patient-specific libraries to provide a priming site for post-enrichment amplification of the eluted pool of captured DNA molecules. Multiplex identifier (MID) sequences, i.e. molecular barcodes 10 bases in length, were introduced into each patient’s genomic sample by ligation and allowed a multiplexed design of the shotgun sequencing assay.
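To illustrate how such barcodes allow pooled reads to be assigned back to patients, the following minimal sketch matches the first 10 bases of a read against the MID sequences listed in Supplementary Table 2. It assumes that the MID sits at the very start of the read and that only exact matches are accepted; neither detail is specified in this document, and the function name is hypothetical.

```python
# MID sequences as listed in Supplementary Table 2.
MIDS = {
    "MID-1": "ACGAGTGCGT",
    "MID-2": "ACGCTCGACA",
    "MID-3": "AGACGCACTC",
}
MID_LENGTH = 10


def assign_mid(read):
    """Return (MID name, read without barcode), or (None, read) if no MID matches.

    ASSUMPTION: the barcode occupies the first 10 bases of the read and must
    match exactly; no mismatches or indels are tolerated in this sketch.
    """
    prefix = read[:MID_LENGTH].upper()
    for name, barcode in MIDS.items():
        if prefix == barcode:
            return name, read[MID_LENGTH:]
    return None, read


if __name__ == "__main__":
    reads = ["ACGCTCGACATTGACCGTA", "AGACGCACTCGGATCC", "TTTTTTTTTTGGATCC"]
    for read in reads:
        print(assign_mid(read))
```

Reads whose prefix matches none of the barcodes are returned unassigned rather than being forced into the closest sample, which keeps the sketch conservative.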
Supplementary Table 2 Linker sequences and molecular barcodes

Linkers
gSel3      5'-CTC GAG AAT TCT GGA TCC TC-3'
gSel4-Pi   5'-Phos/GAG GAT CCA GAA TTC TCG AGT T-3'

10-base Multiplex Identifier Sequences (MIDs)
MID-1   ACGAGTGCGT
MID-2   ACGCTCGACA
MID-3   AGACGCACTC

Detection of an insertion

As shown in Supplementary Figure 6a (case N06), a small insertion was detected in the FLT3 gene. The 63 bp insertion was located in the juxtamembrane domain, which is known to be frequently mutated in AML cases with a normal karyotype¹.

[Alignment view: FLT3 insertion]

Supplementary Figure 6a Molecular insertion detected in the FLT3 gene. At position 27,506,303, a 63 bp insertion was detected.

Detection of a deletion

As shown in Supplementary Figure 6b (case N01), a 3 bp deletion affecting codon D419 was detected in the KIT gene².

[Alignment view: KIT deletion]

Supplementary Figure 6b Molecular deletion detected in the KIT gene. At position 55,284,794 on chromosome 4, a 3 bp deletion was detected.

Detection of a point mutation

As shown in Supplementary Figure 6c (case N04), a point mutation was detected in the KRAS gene³.

[Alignment view: KRAS point mutation C→A]

Supplementary Figure 6c Point mutation detected in the KRAS gene. At position 25,289,552 on chromosome 12, a C→A substitution was detected, resulting in an amino acid change from glycine (G) to cysteine (C) at codon 12.

Detection of molecular mutations in 95 gene targets

In addition to the proof-of-concept analysis of well-established, typical AML mutations that were already known for some of the cases analyzed with the 1.9 Mb capture array, a more global analysis investigated molecular mutations in all 95 gene targets. Supplementary Table 3 gives, for each case, the number of intronic and exonic variants, including their translational status.

Supplementary Table 3 Summary of variants detected in 95 genes (1.9 Mb array)

                                  N01    N03    N04    N05    N06   median no.
Variants                         1984   1745   1500   1701    944   1701
  Ensembl variation database     1730   1524   1285   1508    871
  Unknown variants                254    221    215    193     73
Intronic variants                1672   1492   1223   1460    815   1460
  Ensembl variation database     1478   1303   1054   1293    749
  Unknown variants                194    189    169    167     66
Exonic variants                   310    247    273    239    129    247
  Ensembl variation database      251    217    229    214    122
  Unknown variants                 59     30     44     25      7
    Deletions                      15      5     12      4      0
    Insertions                     10      5      9      5      3
    Substitutions                  34     20     23     16      4
    Not translated                 37     18     28     13      3
    Translated                     22     12     16     12      4
      Synonymous                   11      6      8      6      3
      Nonsynonymous                11      6      8      6      1      6

Summary statistics of fusion genes and dominant clusters

As shown in Supplementary Spreadsheet 3, the frequency table depicts the cluster size distribution for each fusion and the total number of clusters obtained for a given sample and capture array design. In addition, the dominant cluster is annotated with details on the chromosomal breakpoints and the number of reads per strand orientation. Importantly, many chimeric reads were artificial, since they were introduced during sample preparation, but these reads were usually either removed during filtering (see Supplementary Spreadsheet 2) or formed singleton clusters of size n=1. In contrast, in the majority of cases a dominant cluster was detected by the objective statistics pipeline. Clusters of two or more recurrent chimeric reads and, in particular, reads covering both the forward and reverse strands of a breakpoint region were frequently of interest.

In detail, after applying the filtering steps, case N01, characterized by inv(16)(p13q22), harbored a total of 481 unique chimeric reads. Of these, 465 reads formed clusters of size n=1, 4 reads formed two clusters of size n=2, and a set of 12 reads formed the dominant cluster. All chimeric reads in this dominant cluster mapped to the MYH11 and CBFB genes. As demonstrated in Supplementary Figure 7, two constellations of fusion events were detectable, i.e.
4 distinct reads corresponded to the MYH11-CBFB fusion (Supplementary Figure 7a) and 8 additional chimeric reads corresponded to the CBFB-MYH11 fusion (Supplementary Figure 7b).

[Figure panels a and b: chimeric read alignments]

Supplementary Fig. 7 Chimeric read distribution for case N01. a) In total, four chimeric reads were detectable covering the CBFB-MYH11 fusion. The chimeric reads are distributed according to the strand information and 5' → 3' orientation. b) In total, eight chimeric reads were detectable covering the MYH11-CBFB fusion. The chimeric reads are distributed according to the strand information and 5' → 3' orientation.

With respect to the other cases, the following observations were made and are summarized in Supplementary Table 4 (Supplementary Spreadsheet 3):

Supplementary Table 4 Chimeric read interpretation and fusion genes

Case   Interpretation

N03    The distinct dominant cluster contained 7 chimeric reads and confirmed the MLL-MLLT3 fusion known from routine testing.

N04    The distinct dominant cluster contained 8 chimeric reads and confirmed the RUNX1-RUNX1T1 fusion known from routine testing.

N05    The largest cluster (3 reads) contained the MLL-ELL fusion. Subsequent SNP microarray analysis identified the reciprocal SFRS14-MLL fusion event, which was covered by 2 chimeric reads in the second largest cluster.

N14    The distinct dominant cluster contained 9 chimeric reads and identified the MLL-MLLT10 fusion, which was subsequently validated by a corresponding PCR assay.

N16    The distinct dominant cluster contained 8 chimeric reads and identified the MLL-MLLT6 fusion, which was subsequently validated by a corresponding PCR assay.

N17    In this interesting case, the largest dominant cluster contained 34 reads, all matching sequences on chromosome 11. The second largest cluster contained 12 chimeric sequences, all matching chromosomes 11 and 10, leading to a 5' MLL-MLLT10 fusion. This complex rearrangement further led to the reciprocal fusion of the 3' MLL gene to non-coding sequences on chromosome 11, thus explaining the dominant cluster. The MLL-MLLT10 fusion was subsequently validated by a corresponding PCR assay.

N20    The distinct dominant cluster contained 11 chimeric reads and identified the MLL-MLLT1 fusion, which was subsequently validated by a corresponding PCR assay.

N21    In this case, only clusters of size n=1 were observed. In total, 21 chimeric reads were detected. Among those 21 reads, 2 chimeric reads mapped to chromosome 11q23 and chromosome 4 (which would fit with data from the karyotype), with one of these 2 reads identifying an MLL-AFF1 fusion. This fusion was subsequently validated by PCR.

N38    In this case, routine testing had detected an MLL-MLLT10 fusion. Yet, the experiment failed to generate a distinct dominant cluster. Instead, the cluster of size n=2 mapped to chromosomes 7 and 11q23. The MLL-PFTK1 fusion, however, was not confirmed by subsequent assays, indicating a failure of the capturing assay to identify the fusion.

N39    In this case, the dominant cluster contained two chimeric reads mapping to both chromosomes 6 and 11q23. However, the MLL-MLLT4 fusion was not identified by those 2 sequences. Instead, one chimeric read from the second largest cluster contained the sequence explaining the MLL-MLLT4 fusion, which was already known from routine testing.

N40    The distinct dominant cluster contained 5 chimeric reads and confirmed the MLL-AFF1 fusion known from routine testing.

N41    The distinct dominant cluster contained 5 chimeric reads and confirmed the MLL-ELL fusion known from routine testing.

N42    The dominant cluster contained 2 chimeric reads for both possible constellations and confirmed the MLL-MLLT1 fusion known from routine testing.

N27    The distinct dominant cluster contained 3 chimeric reads and identified a fusion between RUNX1 and sequences on chromosome 17. This fusion was subsequently validated by PCR.

N28    The dominant cluster contained 2 chimeric reads and identified a RUNX1-KCNMA1 fusion. This fusion was subsequently validated by PCR.

N29    The distinct dominant cluster contained 4 chimeric reads and identified a fusion between RUNX1 and sequences on chromosome 5. This fusion was subsequently validated by PCR.

N30    The distinct dominant cluster contained 5 chimeric reads and identified a fusion between RUNX1 and sequences on chromosome 10. This fusion was subsequently validated by PCR.

N33    This case harbored a t(12;21)(p13;q22) and was known to be positive for the ETV6-RUNX1 fusion. One corresponding chimeric read was contained in a cluster of size n=1. Of note, the larger cluster of size n=2 did not correspond to chimeric reads identifying the ETV6-RUNX1 rearrangement. Overall, this specimen was sequenced several times, but the capturing assay yielded only 4 chimeric reads in total, reflecting the poor experimental performance of this sample, possibly due to severe degradation of the input genomic DNA.

N36    The distinct dominant cluster contained 9 chimeric reads and identified the PDGFRB-DTD1 fusion, discovered and validated in parallel by Erben P et al., Blood (ASH Annual Meeting Abstracts) 2008; 112: Abstract 3719.

N37    The distinct dominant cluster contained 5 chimeric reads and identified the PDGFRB-DTD1 fusion, discovered, validated, and published in parallel by Walz C et al., Genes Chromosomes Cancer 2009; 48(2): 179-183.

Validation of unknown fusion events detected by NGS

As shown in Supplementary Table 5, previously unknown fusion genes detected in 9 patients by capturing MLL (n=6 fusions) or RUNX1 (n=5 fusions) were subsequently validated and confirmed by PCR assays using the following primer pairs.
Supplementary Table 5 PCR primer information for validation of fusion genes

Patient   Fusion         Forward primer (5' - 3')    Reverse primer (5' - 3')
N05       MLL-SFRS14     GTCACCCCCGAGAGGACA          CCTTCCACAAACGTGACAGA
          MLL-ELL        AACCACTCCTAGTGAGCCCAAG      AAGGAGGCTGCCAGTGCT
N14       MLL-MLLT10     AACCACTCCTAGTGAGCCCAAG      AGAGCGCTCCTACTTGTTGC
N16       MLL-MLLT6      AACCACTCCTAGTGAGCCCAAG      AGCTGCTCCATGTTAGTCGTC
N17       MLL-MLLT10     CCAGTGAAAAGAAAGACAGCA       TTCCTGGCTTCCAGACTCTC
N20       MLL-MLLT1      GAATGCAGGCACTTTGAACA        CCTCGCCTGACGAAGAGTC
N28       RUNX1-KCNMA1   AGCATGGTGGAGGTGCTG          GGTCATGGCAGCTTCACTTC
N27       RUNX1-chr.17   TTCAACTCCTGGACCAAACC        ATTCCATGCTCCCAATTTGA
N29       chr.5-RUNX1    CTCACAACAAGCTCCCATCA        TGAATCTGGTAGCCCATCCT
N30       chr.10-RUNX1   GCTTCAGGGCTCTCCTCAG         ACTATTCCAGCGGGGTAGC
          RUNX1-chr.10   GCAGCCCTTTGATTTCACTC        TGTTCCCTCCAAGGAGACTG

References

1 Schnittger S, Schoch C, Dugas M, Kern W, Staib P, Wuchter C, Loffler H, Sauerland CM, Serve H, Buchner T, Haferlach T, Hiddemann W. Analysis of FLT3 length mutations in 1003 patients with acute myeloid leukemia: correlation to cytogenetics, FAB subtype, and prognosis in the AMLCG study and usefulness as a marker for the detection of minimal residual disease. Blood 2002; 100: 59-66.

2 Kohl TM, Schnittger S, Ellwart JW, Hiddemann W, Spiekermann K. KIT exon 8 mutations associated with core-binding factor (CBF)-acute myeloid leukemia (AML) cause hyperactivation of the receptor in response to stem cell factor. Blood 2005; 105: 3319-3321.

3 Bowen DT, Frew ME, Hills R, Gale RE, Wheatley K, Groves MJ, Langabeer SE, Kottaridis PD, Moorman AV, Burnett AK, Linch DC. RAS mutation in acute myeloid leukemia is associated with distinct cytogenetic subgroups but does not influence outcome in patients younger than 60 years. Blood 2005; 106: 2113-2119.