Supplemental material

“Complex molecular rearrangements in childhood acute myelogeneous leukemia with translocation t(10;11)(p12;q23) revealed by paired-end mapping” by Sujal Ghosh, Christoph Bartenhagen, Vera Okpanyi, Michael Gombert, Vera Binder, Andrea Teigler-Schlegel, Jutta Bradtke, Silja Röttgers, Martin Dugas and Arndt Borkhardt

Material and methods

Cytogenetics
Classical cytogenetics (GTG-banding) was performed following standard procedures and chromosomes were karyotyped according to ISCN 2009 (1).

Fluorescence in situ hybridization
For fluorescence in situ hybridization (FISH), a commercial dual color break apart probe specific for the MLL gene located at chromosome 11q23 was used (Vysis LSI MLL Dual Color, Break Apart Rearrangement Probe, Abbott, Illinois, USA). Preparation was performed according to the manufacturer’s instructions. In all cases, chromosomes were counterstained with DAPI, and digital imaging, documentation, and analysis of the FISH signals were performed on a Zeiss Axioplan 2 fluorescence microscope equipped with appropriate filters and an Isis image analysis system (Metasystems, Altlussheim, Germany). G-band-like images were obtained by using the software to convert and enhance the gray scale of the DAPI images to black and white. For each sample, 100 nuclei were analyzed, and a distance of three or more signal diameters was counted as splitting.

Paired-end Sequencing
DNA was isolated from peripheral blood lymphocytes with the AllPrep DNA/RNA/Protein Mini Kit (Qiagen, Hilden, Germany). Quantity and quality were determined on a NanoDrop® ND-1000 spectrophotometer (Thermo Scientific, Waltham, MA, USA). 2-3 µg DNA was sheared on a Covaris S2 (Covaris Inc., Woburn, MA, USA). Illumina fragment libraries (Paired-End Sample Preparation, Illumina Inc., San Diego, CA, USA) with a median insert size of 450 bp were prepared using the SPRIworks I method (Beckman-Coulter, Krefeld, Germany) according to both manufacturers’ instructions. Samples of patients 1-3 were sequenced on a GAIIx platform; after an in-house upgrade, samples of patients 4-6 were sequenced on a HiSeq 2000 (both Illumina Inc.).

Bioinformatical analysis
Alignment
The paired-end alignment against the human reference genome (hg19/GRCh37) was done with the Burrows-Wheeler Alignment Tool (BWA) (version 0.5.8c) (2) using the default settings, except for the allowed number of mismatches, which was set to 2 (GAIIx runs with 36 bp reads) or 3 (HiSeq2000 runs with 50 bp reads). The alignment consists of two consecutive steps:
1. The global alignment against the reference genome for every end of a read pair individually.
2. The assembly of both alignments of every pair with respect to their insert size. This may include a local alignment of previously unmapped reads if the mate could be mapped properly.
All BWA alignments were given in Sequence Alignment/Map (SAM) format (3), which was converted into its binary equivalent, the BAM format, for all following analyses.

Alignment postprocessing
Duplicated reads were excluded from all subsequent analyses. The function MarkDuplicates from the Picard utilities (version 1.46) (http://www.picard.sourceforge.net) was used to remove reads having identical 5' mapping coordinates (both ends of a paired-end read) and orientation. Sorting of the alignment, either by mapping coordinate or by read name, was done with Picard as well.

Copy number analysis with sequencing data
The program FREEC (version 3.92) (4) was used to estimate copy number variations.
It takes the paired initial/remission or relapse/remission samples and normalizes the read counts across fixed windows of size 10 kb by performing a least-squares polynomial fitting of read counts in the diseased sample and the corresponding control/remission. The following segmentation step used LASSO regression (5) to merge the windows into larger, contiguous regions showing copy number gains or losses. For more details on the algorithm, see the program's publication (4). Except for the window size, all parameters were set to the default settings. Reads having a mapping quality below 10 were excluded before copy number analysis.

Structural variation (SV) detection with paired-end sequencing data
Detection of translocations, deletions and inversions was carried out with GASV (6) based on the mapping coordinates (different chromosomes), anomalous insert sizes (greater than the mean plus three times the standard deviation) or read orientations (inversions) of the read pairs. First, such aberrant read pairs were filtered and compared to the control/remission sample. Only uniquely mapped reads having a mapping quality of at least 35 were used in the initial and relapse samples. For the control/remission, the quality threshold was set to 5. To correct false positive SVs due to mismappings, aberrant read pairs having a proper alternative alignment with BLAT (7) were excluded. Finally, the reads left from the initial or relapse sample after filtering and subtraction of the remission were joined into clusters, each representing the same SV breakpoint. Only clusters with a minimum size of two (GAIIx runs) or three reads (HiSeq2000 runs) that did not overlap SVs listed in the Database of Genomic Variants (DGV) (8) were considered for further analysis. For more details on the algorithm for SV detection and clustering, see the GASV publication (6).
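The read-pair criteria named above (different chromosomes, anomalous insert size, unexpected orientation) can be illustrated with a minimal Python sketch. This is not GASV's implementation: the `ReadPair` record, the function names and all coordinates are hypothetical, and the sketch ignores mapping quality, clustering and the comparison with the remission sample.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical record for the two mapped ends of one fragment."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str  # "+" or "-"


def insert_size_cutoff(insert_sizes):
    """Mean plus three sample standard deviations of observed insert sizes."""
    return mean(insert_sizes) + 3 * stdev(insert_sizes)


def classify(pair, cutoff):
    """Assign a read pair to one of the SV classes described in the text."""
    if pair.chrom1 != pair.chrom2:
        # Mates map to different chromosomes.
        return "translocation"
    if pair.strand1 == pair.strand2:
        # Mates lie on the same strand instead of pointing towards each other.
        return "inversion"
    if abs(pair.pos2 - pair.pos1) > cutoff:
        # Mates are further apart than the library's insert sizes allow.
        return "deletion"
    return "concordant"


# Illustrative insert sizes around the 450 bp library median.
sizes = [430, 440, 450, 460, 470]
cutoff = insert_size_cutoff(sizes)

# Made-up coordinates, chosen only to trigger each branch.
pairs = [
    ReadPair("chr10", 21_900_000, "+", "chr11", 118_350_000, "-"),
    ReadPair("chr11", 85_000_000, "-", "chr11", 118_350_000, "-"),
    ReadPair("chr11", 58_000_000, "+", "chr11", 118_000_000, "-"),
    ReadPair("chr11", 1_000, "+", "chr11", 1_450, "-"),
]
for p in pairs:
    print(classify(p, cutoff))
```

The order of the checks is a design choice of this sketch: chromosome mismatch is tested first because insert size and relative orientation are only meaningful for mates on the same chromosome.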
This approach, relying on aberrant read pairs, yields approximate regions spanning a few hundred base pairs that contain the breakpoint (the resolution depends on the insert size distribution). The region coordinates and associated read orientations were used for primer design for validation by conventional PCR. To assist SV selection for subsequent validation, genomic annotations (affected genes, pathways and coding regions derived from the KEGG (9) and Ensembl (10) databases), breakpoint overlaps between patients, and the sequence depth of regions neighbouring the breakpoint regions (5 kb up- and downstream) were computed within R (11).

Manual inspection of reads
Figure S1. The Integrative Genomics Viewer (IGV) (Broad Institute, Cambridge, MA, USA) was used for next generation genomic data visualization (12).

Visualization
The circular plots were created with Circos (13). The outer ring shows the copy-number ratios of 100 kb windows as an orange scatter plot, while larger segments with copy-number gains or losses were highlighted in green and red, respectively. All translocations detected according to the filter criteria mentioned in the previous section were drawn as black links between the affected chromosomes. The copy-number profiles show ratios of 10 kb windows and were created within R (11).

Validation
Selected translocations detected by paired-end read analysis were validated by conventional PCR. Detailed information on the primers can be obtained from the corresponding author. Capillary sequencing of the products was performed on an Applied Biosystems 3130 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA).

Results
Sequencing performance
Owing to the upgrade of the sequencing platform, output was considerably higher with the newer instrument; nevertheless, the defining t(10;11) translocation could be identified with both platforms.
The GAIIx produced 96,000,000 to 287,000,000 total reads per sample (2-6 lanes of one flow cell), the HiSeq2000 201,000,000 to 670,000,000 total reads per sample (1-2 lanes of one flow cell). In each sample, > 90% of reads could be aligned; furthermore, fragment coverage (the percentage of the genome covered by paired-end fragments) was > 90% in each sample. For further coverage details see Table S2. In each sample we observed numerous structural variants (translocations, inversions and deletions). However, in-house experience shows that library preparation is prone to false-positive variants: the same variants occurred in all in-house sequencing projects, although these dealt with biologically distinct malignancies. We established an in-house “blacklist” to exclude these variants (it can be obtained from the corresponding author).

Practically, paired-end analysis identifies the base sequences of the two ends (reads) of a previously PCR-amplified DNA fragment. The main principle is that the two reads of a pair are expected to be orientated towards each other and to enclose a certain fragment length. Alignment compares each read pair with the reference genome. If one read aligns to a given DNA sequence in the reference genome and its mate aligns to a DNA sequence that is not orientated towards it at the expected distance (fragment length), a structural variant is detected.

Results for patients 1, 2, 3 and 5
Patient 1
Figure S2. Previous FISH and cytogenetic analysis revealed a derivative chromosome 10 with der(10)t(10;11)(p12;q23)inv(11q13q23) and a chromosome 11 with der(11)t(10;11)(p12;q13). Sequencing results in this patient showed paired-end reads between MLLT10 (m1a) and MLL (m1).
Hence, we assume that these reads represent the MLL/MLLT10 fusion gene. The breakpoint on 10p12.31 was located in the intronic region between exons 8 and 9 of MLLT10. The second breakpoint, on 11q23.3, lay between exons 10 and 11 of the MLL gene. We identified paired-end reads on chromosome 11 (m4, m4a) that both lay in the same orientation (minus strand, 11q14.1 and 11q23.3) instead of pointing towards each other. This indicates an inversion of the fragment in between. In conclusion, we deduce that after the inversion inv(11)(q14.1q23.3), the inverted part was translocated to the derivative chromosome 10. Thus, the third breakpoint lay on 11q14.1. However, the structure of the derivative chromosome 11 remains unclear. Reads were found around a suspected breakpoint on 16q23.3, which were paired with reads near the MLLT10 (m2, m2a) and the 11q13.1 (m3, m3a) breakpoints. These data suggest a previously undetected involvement of chromosome 16 in the 10;11 rearrangement, as shown in Figure S2C. However, the latter reads were not found in the relapse sample; they could have been missed due to low coverage. There were no significant large areas of gains or losses in copy number (Figure S2F).

Patient 2
Figure S3. Cytogenetics revealed only two aberrant metaphases, with an inconspicuous chromosome 10 and a large metacentric chromosome 11. In both metaphases chromosome 12 was missing. At least one marker chromosome was found. Interphase FISH showed MLL splitting in 68% of cells. Aberrant metaphases were found neither in the MLL-FISH analysis nor in the M-FISH analysis.
Paired-end sequencing revealed the MLL/MLLT10 translocation, with reads (m1a) at the 3’ end of the intronic region 4/5 of MLLT10 and mates (m1) at the 5’ end of the intronic region 8/9 of the MLL gene.
A second read pair with low coverage was found: one read directly at the 5’ end (m2a) of the MLLT10 breakpoint and its mate (m2) approximately 2 Mb upstream of the 5’ end of the reported MLL breakpoint. However, both reads (m1 and m2) were orientated in the same direction. As seen in patient 1 and patient 5, these data suggest that an inversion occurred on chromosome 11q23.3. Detection of copy number variations showed areas of gains in the long arms of chromosomes 1, 13 and 21 and a loss in chromosome 12p (Figure S3E). The latter were not found by cytogenetics. These might be included in the marker chromosomes.

Patient 3
Figure S4. Cytogenetics and FISH detected an insertion of long arm material of chromosome 11 into 10p, but no additional aberrations (46,XY,ins(10;11)(p11;q23q12)). Reads were located at the 3’ end (minus strand) (m1) and the 5’ end (plus strand) (m3) of the intronic region 8/9 in the MLLT10 gene (10p12.3). Their counterparts were detected in 11q12.1 (m3a) and 11q23.3 (m1a), respectively, indicating the MLL/MLLT10 fusion gene (Figure S4C). Furthermore, a “deleted” region within chromosome 11 could be verified (m2, m2a); these reads were correctly orientated towards each other but encompassed a fragment of 60 Mb instead of the sequenced fragment size of only 450 bp. These reads were located on opposite sides of the breakpoints (11q12.1 and 11q23.3). The pairs m1/m1a and m3/m3a indicate that the genomic fragment that is “deleted” on 11q was subsequently inserted into the MLLT10 gene. There were no significant changes in copy number (Figure S4E). Validation PCR and capillary sequencing verified the MLL/MLLT10 breakpoint (Figure S4D).

Patient 5
Figure S5. Cytogenetics and FISH could verify a 46,XX,t(10;11)(p12;q23)inv(11)(q14q23) karyotype.
Next-generation sequencing data revealed, similar to patient 1, an inversion between 11q14.2 and 11q23.3, as both mates (m2, m2a) of a read pair spanning a 28 Mb region were located in the same orientation on the minus strand instead of pointing towards each other.
Furthermore, the MLL/MLLT10 gene fusion was detected by m1/m1a (orientated towards the breakpoints in the intronic region 9-10 of the MLLT10 gene and the intronic region 8-9 of the MLL gene). Hence, we can deduce from this pattern that after an inversion of the indicated region had occurred on chromosome 11, the whole fragment and the telomeric region were translocated to the MLLT10 region on chromosome 10. The pair m3/m3a indicated the reciprocal translocation between the derivative chromosome 11 (q14.2) and the terminal region p12.31 of chromosome 10. Validation PCR and capillary sequencing verified the MLL/MLLT10 breakpoint, confirming the fusion gene consisting of MLL exons 1-8 and MLLT10 exons 9-24 (Fig. S5D). There were no significant changes in copy number.

References
1. Shaffer LG, Slovak ML, Campbell LJ. ISCN 2009: An International System for Human Cytogenetic Nomenclature. S. Karger, 2009.

2. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England) 2009 Jul 15; 25(14): 1754-1760.

3. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England) 2009 Aug 15; 25(16): 2078-2079.

4. Boeva V, Zinovyev A, Bleakley K, Vert JP, Janoueix-Lerosey I, Delattre O, et al. Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics (Oxford, England) 2011 Jan 15; 27(2): 268-269.

5. Tibshirani R. Regression shrinkage and selection via the lasso. J Royal Statist Soc B 1996; 58(1): 267-288.

6. Sindi S, Helman E, Bashir A, Raphael BJ. A geometric approach for classification and comparison of structural variants. Bioinformatics (Oxford, England) 2009 Jun 15; 25(12): i222-230.

7. Kent WJ. BLAT--the BLAST-like alignment tool. Genome research 2002 Apr; 12(4): 656-664.

8. Zhang J, Feuk L, Duggan GE, Khaja R, Scherer SW. Development of bioinformatics resources for display and analysis of copy number and other structural variants in the human genome. Cytogenetic and genome research 2006; 115(3-4): 205-214.

9. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic acids research 2012 Jan; 40(Database issue): D109-114.

10. Flicek P, Amode MR, Barrell D, Beal K, Brent S, Chen Y, et al. Ensembl 2011. Nucleic acids research 2011 Jan; 39(Database issue): D800-806.

11. R Core Team. R: A Language and Environment for Statistical Computing. Available from: http://www.R-project.org

12. Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in bioinformatics 2012 Apr 19.

13. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, et al. Circos: an information aesthetic for comparative genomics. Genome research 2009 Sep; 19(9): 1639-1645.

Tables

| patient no. / sex | FAB | MLL | karyotyping | age at diagnosis | age at relapse |
| --- | --- | --- | --- | --- | --- |
| 1 / female | M4 | 34% | 46,XX,der(10)t(10;11)(p12;q23)inv(11q13q23),der(11)t(10;11)(p12;q13) | 13 mo. | 17 mo. |
| 2 / female | M5 | 68% | 43~45,XX,der(10)?t(10;11)(p11;q23),der(11)?t(10;11)(p11;q23),2mar,inc[cp2]/46,XX[7] | 18 yrs. | |
| 3 / male | M4 | 71.5% | 46,XY,ins(10;11)(p11;q23q12) | 10½ yrs. | |
| 4 / female | | 76.50% | 45,XX,-13,der(17)t(13;17)(q31;p13).ish ins(10;11)(p12;q23q23)(5’MLL+;3’MLL+) | 5¼ yrs. | 5¾ yrs. |
| 5 / female | M5 | 51% | 46,XX,t(10;11)(p12;q23)inv(11)(q1?4q23) | 9 mo. | 24 mo. |
| 6 / male | M5 | 88.5% | 46,XY,der(10)t(10;11)(p12;q23)inv(11)(q13q23),der(11)t(10;11)(p12;q13),-17,+mar[9]/45,idem,Y[2]/46,XY[3] | 8 yrs. | |

Table S1. Patient characteristics

| Patient | Platform | lanes | SC | FC |
| --- | --- | --- | --- | --- |
| 1 | GAIIx | 3 | 85%, 3x | 91%, 11.6x |
| 1 | GAIIx | 6 | 84%, 2.8x | 90%, 9.6x |
| 1 | GAIIx | 2 | 67%, 1.5x | 89%, 5.5x |
| 2 | GAIIx | 3 | 75%, 1.8x | 90%, 7.8x |
| 2 | GAIIx | 2 | 57%, 1x | 88%, 4x |
| 3 | GAIIx | 3 | 75%, 1.8x | 92%, 7.8x |
| 3 | GAIIx | 2 | 63%, 1.2x | 90%, 5.1x |
| 4 | HiSeq2000 | 1 | 91%, 5.6x | 92%, 18.2x |
| 4 | HiSeq2000 | 2* | 91%, 9.52x | 92%, 30x |
| 4 | HiSeq2000 | 2* | 91%, 8.6x | 92%, 28x |
| 5 | HiSeq2000 | 1 | 91%, 9.6x | 92%, 26.9x |
| 5 | HiSeq2000 | 1 | 91%, 8.5x | 92%, 26x |
| 5 | HiSeq2000 | 1 | 91%, 9.2x | 92%, 27x |
| 6 | HiSeq2000 | 1 | 87%, 3x | 92%, 9.1x |
| 6 | HiSeq2000 | 1 | 92%, 6.8x | 92%, 21.5x |

Table S2. Sequencing results, one row per sequenced sample (the sample columns of the table are Initial, Remission and Relapse, each with lanes, SC and FC). SC = sequence coverage, FC = fragment coverage, *TruSeq v3 used for library preparation, TruSeq v4 for the other samples of patients 4 to 6.

Figure legends

Figure S1. IGV displays paired-end reads aligned to the reference genome. The light gray bars indicate reads in which both mates (~450 bp apart) can be aligned perfectly to the reference genome (depicted in the top bar). Colored reads indicate structural variants; their mates appear in the same color. Mates located on different chromosomes indicate a translocation. If reads span a region on the same chromosome that is larger or smaller than the 450 bp fragment length, a deletion or insertion, respectively, is revealed.
Paired-end reads are supposed to be orientated towards each other. If the alignment shows both reads in the same orientation, an inversion is indicated.

Figure S2. patient 1 - A) FISH analysis: Green color of the MLL probe identifies the proximal 5’ part of the MLL gene, red color identifies the distal 3’ part of the gene. The MLL probe shows an MLL split signal (both green and red) on 10p. B) Karyotyping results. C) Rearrangement profile: Molecular pattern of rearrangement, revealed by paired-end sequencing. The illustration consists of a normal reference genome in the upper region with the detected paired-end reads (e.g. m1 and m1a) aligned to the genome. As each read (e.g. m1) is supposed to be orientated towards its mate (e.g. m1a), the type of rearrangement can be deduced; e.g. m4a and m4 on the same strand suggest an inversion, and m1 and m1a on different chromosomes suggest a translocation. D) Schematic overview of the translocation harboring the MLL/MLLT10 fusion gene. E) CIRCOS plot of the initial sample: Genomic landscape of interchromosomal translocations, which are scattered across the whole genome and allocated along the outer ring (chromosome ideograms). The inner ring represents copy-number status in terms of gains and losses. F) CNV plot: Deep blue bars indicate the copy number determined from sequencing data in relation to the reference genome. Bars are elevated for gains and lowered for losses.

Figure S3. patient 2 - A) Karyotyping results. B) Rearrangement profile: Molecular pattern of rearrangement. C) Schematic overview of the translocation harboring the MLL/MLLT10 fusion gene. D) CIRCOS plot of the initial sample. E) CNV plot: We detected gains in chromosomes 1, 13 and 21 with 3 copies instead of 2 in the depicted regions. In chromosome 12 we see a loss.

Figure S4. patient 3 - A) FISH analysis: The MLL probe shows an MLL split signal on different chromosomes, on 10p and 11q respectively, indicating an insertion. B) Karyotyping results. C) Molecular pattern of rearrangement. D) Schematic overview of the translocation harboring the MLL/MLLT10 fusion gene. The fusion sequence was further validated by capillary sequencing, revealing the breakpoint at 1-bp level. E) CIRCOS plot of the initial sample. F) CNV plot.

Figure S5. patient 5 - A) FISH analysis: The MLL probe shows an MLL split signal (both green and red) on 10p. B) Karyotyping results. C) Molecular pattern of rearrangement. D) Schematic overview of the translocation harboring the MLL/MLLT10 fusion gene. The fusion sequence was further validated by capillary sequencing, revealing the breakpoint at 1-bp level. E) CIRCOS plot of the initial sample. F) CNV plot.

Figure S6. Location of MLL, MLLT10, RNF169 and RNF214 on chromosomes 10 and 11 and their corresponding distances.