Supplementary Data (doc 111K)

Supplemental material
“Complex molecular rearrangements in childhood acute myelogeneous leukemia with translocation t(10;11)(p12;q23) revealed by paired-end mapping” by Sujal Ghosh, Christoph Bartenhagen, Vera Okpanyi, Michael Gombert, Vera Binder, Andrea Teigler-Schlegel, Jutta Bradtke, Silja Röttgers, Martin Dugas and Arndt Borkhardt

Material and methods

Cytogenetics
Classical cytogenetics (GTG-banding) was performed following standard procedures, and chromosomes were karyotyped according to ISCN 2009 (1).

Fluorescence in situ hybridization
For fluorescence in situ hybridization (FISH), a commercial dual-color break-apart probe specific for the MLL gene located at chromosome 11q23 was used (Vysis LSI MLL Dual Color, Break Apart Rearrangement Probe, Abbott, Illinois, USA). Preparation was performed according to the manufacturer’s instructions. In all cases, chromosomes were counterstained with DAPI, and digital imaging, documentation, and analysis of the FISH signals were performed on a Zeiss Axioplan 2 fluorescence microscope equipped with appropriate filters and an Isis image analysis system (Metasystems, Altlussheim, Germany). G-band-like images were obtained by using the software to convert and enhance the gray scale of the DAPI images to black and white. For each sample, 100 nuclei were analyzed, and a distance of three or more signal diameters was counted as splitting.

Paired-end Sequencing
DNA was isolated from peripheral blood lymphocytes with the AllPrep DNA/RNA/Protein Mini Kit (Qiagen, Hilden, Germany). Quantity and quality were determined on the NanoDrop® ND-1000 spectrophotometer (Thermo Scientific, Waltham, MA, USA). For each sample, 2 to 3 µg of DNA was sheared on a Covaris S2 (Covaris Inc., Woburn, MA, USA). Illumina fragment libraries (Paired-End Sample Preparation, Illumina Inc., San Diego, CA, USA) with a median insert size of 450 bp were prepared using the SPRIworks I method (Beckman-Coulter, Krefeld, Germany) according to both manufacturers’ instructions. Samples of patients 1-3 were sequenced on a GAIIx platform; after an in-house upgrade, samples of patients 4-6 were sequenced on a HiSeq 2000 (both Illumina Inc.).

Bioinformatical analysis

Alignment
The paired-end alignment against the human reference genome (hg19/GRCh37) was done with the Burrows-Wheeler Alignment Tool (BWA) (version 0.5.8c) (2) using the default settings. The allowed number of mismatches was set to 2 (GAIIx runs with 36 bp reads) or 3 (HiSeq2000 runs with 50 bp reads). The alignment consists of two consecutive steps: first, the global alignment against the reference genome for every end of a read pair individually; second, the assembly of both alignments of every pair with respect to their insert size. The second step may include a local alignment of previously unmapped reads if the mate could be mapped properly. All BWA alignments were produced in Sequence Alignment/Map (SAM) format (3) and converted into its binary equivalent, the BAM format, for all following analyses.

Alignment postprocessing
Duplicated reads were excluded from all subsequent analyses. The function MarkDuplicates from the Picard utilities (version 1.46) (http://www.picard.sourceforge.net) was used to remove reads having identical 5' mapping coordinates (both ends of a paired-end read) and orientation. Sorting of the alignment, either by mapping coordinate or by read name, was done with Picard as well.

Copy number analysis with sequencing data
The program FREEC (version 3.92) (4) was used to estimate copy number variations. It takes the paired initial/remission or relapse/remission samples and normalizes the read counts across fixed windows of size 10 kb by performing a least-squares polynomial fitting of read counts in the diseased sample and the corresponding control/remission. The following segmentation step used LASSO regression (5) to merge the windows into larger, contiguous regions showing copy number gains or losses. For more details on the algorithm, see the program's publication (4). Except for the window size, all parameters were left at their default settings. Reads having a mapping quality below 10 were excluded before copy number analysis.
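
As a rough illustration of the window-based counting that precedes normalization, the following Python sketch counts reads per fixed 10 kb window after discarding reads with a mapping quality below 10, and forms a depth-scaled ratio between a diseased and a remission sample. It is a deliberately simplified stand-in: FREEC's polynomial fitting and the LASSO segmentation are not reproduced, and the function names and the input format (tuples of position and mapping quality for one chromosome) are assumptions made for this example.

```python
from collections import Counter

WINDOW_SIZE = 10_000  # 10 kb windows, as stated in the text
MIN_MAPQ = 10         # reads below this mapping quality are dropped


def window_counts(reads, window_size=WINDOW_SIZE, min_mapq=MIN_MAPQ):
    """Count reads per window; `reads` is an iterable of (position, mapq)."""
    counts = Counter()
    for position, mapq in reads:
        if mapq >= min_mapq:
            counts[position // window_size] += 1
    return counts


def count_ratios(diseased_reads, remission_reads):
    """Depth-scaled diseased/remission ratio per window (hypothetical helper).

    Windows without remission reads are skipped to avoid division by zero.
    This simple scaling is NOT the polynomial fit used by FREEC.
    """
    diseased = window_counts(diseased_reads)
    remission = window_counts(remission_reads)
    diseased_total = sum(diseased.values())
    remission_total = sum(remission.values())
    if diseased_total == 0 or remission_total == 0:
        return {}
    ratios = {}
    for window, remission_count in remission.items():
        scaled_diseased = diseased.get(window, 0) / diseased_total
        scaled_remission = remission_count / remission_total
        ratios[window] = scaled_diseased / scaled_remission
    return ratios
```

A ratio near 1 would correspond to an unchanged copy number in that window, while values clearly above or below 1 would hint at a gain or a loss.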

Structural variation (SV) detection with paired-end sequencing data
Detection of translocations, deletions and inversions was carried out with GASV (6) based on the mapping coordinates (different chromosomes), anomalous insert sizes (greater than the mean plus three times the standard deviation) or read orientations (inversions) of the read pairs. First, such aberrant read pairs were filtered and compared to the control/remission sample. Only uniquely mapped reads having a mapping quality of at least 35 were used in the initial and relapse samples. For the control/remission, the quality threshold was set to 5. To correct false positive SVs due to mismappings, aberrant read pairs having a proper alternative alignment with BLAT (7) were excluded. Finally, the reads left from the initial or relapse sample after filtering and subtraction of the remission were joined into clusters, each representing the same SV breakpoint. Only clusters with a minimum size of two (GAIIx runs) or three reads (HiSeq2000 runs) that did not overlap SVs listed in the Database of Genomic Variants (DGV) (8) were considered for further analysis. For more details on the algorithm for SV detection and clustering, see the GASV publication (6). This approach, relying on aberrant read pairs, yields approximate regions spanning a few hundred base pairs containing the breakpoint (resolution depends on the insert size distribution). The region coordinates and associated read orientations were used for primer design for validation by conventional PCR. Genomic annotations, such as affected genes, pathways and coding regions derived from the KEGG (9) and Ensembl databases (10), as well as breakpoint overlaps between patients and the sequence depth of neighbouring regions (5 kb up- and downstream) of the breakpoint regions, were computed within R (11) to assist SV selection for subsequent validation.
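
The numeric filters named in this section (insert-size cutoff, mapping-quality thresholds and minimum cluster size) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not GASV's geometric algorithm: the cutoff is estimated from a list of insert sizes of properly mapped pairs, the use of the sample standard deviation is an assumption, and a cluster is represented simply as a list of supporting read pairs. All function and variable names are made up for this example.

```python
from statistics import mean, stdev

# Mapping-quality thresholds quoted in the text, per sample type
MIN_MAPQ = {"initial": 35, "relapse": 35, "remission": 5}
# Minimum number of supporting reads per cluster, per platform
MIN_CLUSTER_SIZE = {"GAIIx": 2, "HiSeq2000": 3}


def insert_size_cutoff(proper_insert_sizes):
    """Mean plus three standard deviations of the observed insert sizes.

    The sample standard deviation is used here; the text does not say
    which estimator was applied.
    """
    return mean(proper_insert_sizes) + 3 * stdev(proper_insert_sizes)


def is_anomalous(insert_size, cutoff):
    """Flag a read pair whose insert size exceeds the cutoff."""
    return insert_size > cutoff


def passes_mapq(mapq, sample_type):
    """Apply the sample-specific mapping-quality threshold."""
    return mapq >= MIN_MAPQ[sample_type]


def keep_clusters(clusters, platform):
    """Keep clusters that reach the platform-specific minimum size."""
    minimum = MIN_CLUSTER_SIZE[platform]
    return [cluster for cluster in clusters if len(cluster) >= minimum]
```

The lower quality threshold for the remission sample makes the subtraction step more permissive: a variant is more readily treated as present in the control and therefore removed from the diseased sample.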

Manual inspection of reads
See Figure S1. The Integrative Genomics Viewer (IGV) (Broad Institute, Cambridge, MA, USA) was used for visualization of next-generation sequencing data (12).

Visualization
The circular plots were created with Circos (13). The outer ring shows the copy-number ratios of 100 kb windows as an orange scatter plot, while larger segments with copy-number gains or losses are highlighted in green and red, respectively. All translocations that passed the filter criteria described in the SV detection section above are drawn as black links between the affected chromosomes. The copy-number profiles show ratios of 10 kb windows and were created within R (11).

Validation
Selected translocations detected by paired-end read analysis were validated by conventional PCR. Detailed information on the primers can be obtained from the corresponding author. Capillary sequencing of the products was performed on an Applied Biosystems 3130 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA).

Results

Sequencing performance
The upgrade of the sequencing platform significantly increased the output; nevertheless, the defining t(10;11) translocation could be identified with both platforms. The GAIIx produced 96,000,000 to 287,000,000 total reads per sample (2-6 lanes of one flow cell), the HiSeq2000 201,000,000 to 670,000,000 total reads per sample (1-2 lanes of one flow cell). In each sample > 90% of reads could be aligned; furthermore, fragment coverage (the percentage of the genome covered by paired-end fragments) was > 90% in each sample. For further coverage details see Table S2. In each sample we observed numerous structural variants (translocations, inversions and deletions). However, in-house studies with other sequencing projects show that library preparation is prone to false-positive variants, which occurred in all in-house sequencing projects dealing with biologically distinct malignancies. We established an in-house “blacklist” to exclude these variants (available from the corresponding author).

In practice, paired-end analysis determines the base sequences of the two ends (reads) of a previously PCR-amplified DNA fragment. The underlying principle is that the two reads of a pair are expected to be orientated towards each other and to enclose a fragment of a certain length. Alignment compares each read pair with the reference genome. If one read aligns to a certain DNA sequence in the reference genome and the corresponding read (mate) aligns to a sequence that is not orientated towards its mate at the expected distance (fragment length), a structural variant is detected.
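
The reasoning above can be written as a small decision rule. The Python sketch below is an illustration only: the `Read` record, the strand encoding, the 1,000 bp default distance limit and the category labels are assumptions of this example and are not taken from the analysis pipeline. It assumes the usual paired-end layout in which the upstream read maps to the plus strand and its mate to the minus strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    chrom: str
    pos: int      # leftmost mapped position on the reference
    strand: str   # "+" or "-"


def classify_pair(read, mate, max_distance=1000):
    """Return a coarse label for one aligned read pair.

    max_distance is a placeholder for the largest distance between mates
    that is still considered normal for the library.
    """
    if read.chrom != mate.chrom:
        return "translocation"
    if read.strand == mate.strand:
        # Both mates on the same strand instead of facing each other
        return "inversion"
    left, right = sorted((read, mate), key=lambda r: r.pos)
    if left.strand == "-" and right.strand == "+":
        # Mates point away from each other; not covered in the text
        return "other"
    if right.pos - left.pos > max_distance:
        # Correct orientation, but mates lie too far apart
        return "deletion"
    return "concordant"
```

In a real analysis a single aberrant pair would not be taken as evidence on its own; as described in the methods, only clusters of several consistent pairs are considered.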

Results patients 1, 2, 3 and 5

Patient 1
See Figure S2. Previous FISH and cytogenetic analysis revealed a derivative chromosome 10 with der(10)t(10;11)(p12;q23)inv(11q13q23) and a chromosome 11 with der(11)t(10;11)(p12;q13). Sequencing results in this patient showed paired-end reads between MLLT10 (m1a) and MLL (m1). Hence, we assume that these reads represent the MLL/MLLT10 fusion gene. The breakpoint on 10p12.31 was located in the intronic region between exons 8 and 9 of MLLT10. The second breakpoint on 11q23.3 lay between exons 10 and 11 of the MLL gene. We identified paired-end reads on chromosome 11 (m4, m4a) that both lay in the same orientation (minus strand, 11q14.1 and 11q23.3) instead of pointing towards each other. This indicates an inversion of the fragment in between. In conclusion, we deduce that after the inversion inv(11)(q14.1q23.3), the inverted part was translocated to the derivative chromosome 10. Thus, the third breakpoint lay on 11q14.1. However, the structure of the derivative chromosome 11 remains unclear. Reads were found around a suspected breakpoint on 16q23.3, which were paired with reads near the MLLT10 (m2, m2a) and the 11q13.1 (m3, m3a) breakpoints. These data suggest a previously undetected involvement of chromosome 16 in the 10;11 rearrangement, as shown in Figure S2C. However, the latter reads were not found in the relapse sample; they could have been missed due to low coverage. There were no significant large areas of gains or losses in copy number (Figure S2F).

Patient 2
See Figure S3. Cytogenetics revealed only two aberrant metaphases with an inconspicuous chromosome 10 and a large metacentric chromosome 11. In both metaphases chromosome 12 was missing. At least one marker chromosome was found. Interphase FISH showed MLL splitting in 68% of cells. No aberrant metaphases were found in either the MLL-FISH or the M-FISH analysis.
Paired-end sequencing revealed the MLL/MLLT10 translocation, with reads (m1a) at the 3’ end of the intronic region 4/5 of MLLT10 and mates (m1) at the 5’ end of the intronic region 8/9 of the MLL gene. A second mate pair with low coverage was found: one read directly at the 5’ end (m2a) of the MLLT10 breakpoint and its mate (m2) approximately 2 Mb upstream of the 5’ end of the reported MLL breakpoint. However, both reads (m1 and m2) were orientated in the same direction. As seen in patient 1 and patient 5, these data suggest that an inversion occurred on chromosome 11q23.3. Detection of copy number variations showed areas of gain in the long arms of chromosomes 1, 13 and 21 and a loss in chromosome 12p (Figure S3E). The latter were not found by cytogenetics. These might be included in the marker chromosomes.

Patient 3
See Figure S4. Cytogenetics and FISH detected an insertion of long arm material of chromosome 11 into 10p, but no additional aberrations (46,XY,ins(10;11)(p11;q23q12)). Reads were located at the 3’ end (minus strand) (m1) and the 5’ end (plus strand) (m3) of the intronic region 8/9 in the MLLT10 gene (10p12.3). Counterparts were detected in 11q12.1 (m3a) and 11q23.3 (m1a), respectively, indicating the MLL/MLLT10 fusion gene (Figure S4C). Furthermore, a “deleted” region within chromosome 11 could be verified (m2, m2a); these reads were correctly orientated towards each other but encompassed a fragment of 60 Mb instead of the sequenced fragment size of only 450 bp. These reads were located on opposite sides of the breakpoints (11q12.1 and 11q23.3). The pairs m1/m1a and m3/m3a indicate that the genomic fragment that is “deleted” on 11q was subsequently inserted into the MLLT10 gene. There were no significant changes in copy number (Figure S4E). Validation PCR and capillary sequencing verified the MLL/MLLT10 breakpoint (Figure S4D).

Patient 5
See Figure S5. Cytogenetics and FISH verified a 46,XX,t(10;11)(p12;q23)inv(11)(q14q23) karyotype. Similar to patient 1, the next-generation sequencing data revealed an inversion between 11q14.2 and 11q23.3, as both mates (m2, m2a) of a read pair spanning a 28 Mb region were located in the same orientation on the minus strand instead of pointing towards each other.
Furthermore, the MLL/MLLT10 gene fusion was detected by m1/m1a (orientated towards the breakpoints in the intronic region 9-10 of the MLLT10 gene and the intronic region 8-9 of the MLL gene). Hence, we deduce from this pattern that after an inversion of the indicated region had occurred on chromosome 11, the whole fragment and the telomeric region were translocated to the MLLT10 region on chromosome 10. The pair m3/m3a indicated the reciprocal translocation on the derivative chromosome 11 (q14.2) and the terminal region p12.31 of chromosome 10. Validation PCR and capillary sequencing verified the MLL/MLLT10 breakpoint, confirming the fusion gene consisting of MLL exons 1-8 and MLLT10 exons 9-24 (Fig. S5D). There were no significant changes in copy number.

References
1. Shaffer LG, Slovak ML, Campbell LJ. ISCN 2009: An International System for Human Cytogenetic Nomenclature. S. Karger, 2009.
2. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England) 2009 Jul 15; 25(14): 1754-1760.
3. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England) 2009 Aug 15; 25(16): 2078-2079.
4. Boeva V, Zinovyev A, Bleakley K, Vert JP, Janoueix-Lerosey I, Delattre O, et al. Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics (Oxford, England) 2011 Jan 15; 27(2): 268-269.
5. Tibshirani R. Regression shrinkage and selection via the lasso. J Royal Statist Soc B 1996; 58(1): 267-288.
6. Sindi S, Helman E, Bashir A, Raphael BJ. A geometric approach for classification and comparison of structural variants. Bioinformatics (Oxford, England) 2009 Jun 15; 25(12): i222-230.
7. Kent WJ. BLAT--the BLAST-like alignment tool. Genome research 2002 Apr; 12(4): 656-664.
8. Zhang J, Feuk L, Duggan GE, Khaja R, Scherer SW. Development of bioinformatics resources for display and analysis of copy number and other structural variants in the human genome. Cytogenetic and genome research 2006; 115(3-4): 205-214.
9. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic acids research 2012 Jan; 40(Database issue): D109-114.
10. Flicek P, Amode MR, Barrell D, Beal K, Brent S, Chen Y, et al. Ensembl 2011. Nucleic acids research 2011 Jan; 39(Database issue): D800-806.
11. R Core Team. R: A Language and Environment for Statistical Computing. Available from: http://www.R-project.org
12. Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in bioinformatics 2012 Apr 19.
13. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, et al. Circos: an information aesthetic for comparative genomics. Genome research 2009 Sep; 19(9): 1639-1645.

Tables

patient no. | sex | FAB | MLL split (FISH) | karyotyping | age at diagnosis | age at relapse
1 | female | M4 | 34% | 46,XX,der(10)t(10;11)(p12;q23)inv(11q13q23),der(11)t(10;11)(p12;q13) | 13 mo. | 17 mo.
2 | female | M5 | 68% | 43~45,XX,der(10)?t(10;11)(p11;q23),der(11)?t(10;11)(p11;q23),2mar,inc[cp2]/46,XX[7] | 18 yrs. | -
3 | male | M4 | 71.5% | 46,XY,ins(10;11)(p11;q23q12) | 10½ yrs. | -
4 | female | - | 76.5% | 45,XX,-13,der(17)t(13;17)(q31;p13).ish ins(10;11)(p12;q23q23)(5’MLL+;3’MLL+) | 5¼ yrs. | 5¾ yrs.
5 | female | M5 | 51% | 46,XX,t(10;11)(p12;q23)inv(11)(q1?4q23) | 9 mo. | 24 mo.
6 | male | M5 | 88.5% | 46,XY,der(10)t(10;11)(p12;q23)inv(11)(q13q23),der(11)t(10;11)(p12;q13),-17,+mar[9]/45, idem,Y[2]/46,XY[3] | 8 yrs. | -
Table S1. Patient characteristics

Patient | Platform | Initial lanes | Initial SC | Initial FC | Remission lanes | Remission SC | Remission FC | Relapse lanes | Relapse SC | Relapse FC
1 | GAIIx | 6 | 84% / 2.8x | 90% / 9.6x | 2 | 67% / 1.5x | 89% / 5.5x | 3 | 85% / 3x | 91% / 11.6x
2 | GAIIx | 3 | 75% / 1.8x | 90% / 7.8x | 2 | 57% / 1x | 88% / 4x | - | - | -
3 | GAIIx | 3 | 75% / 1.8x | 92% / 7.8x | 2 | 63% / 1.2x | 90% / 5.1x | - | - | -
4 | HiSeq2000 | 1 | 91% / 5.6x | 92% / 18.2x | 2* | 91% / 9.52x | 92% / 30x | 2* | 91% / 8.6x | 92% / 28x
5 | HiSeq2000 | 1 | 91% / 9.6x | 92% / 26.9x | 1 | 91% / 8.5x | 92% / 26x | 1 | 91% / 9.2x | 92% / 27x
6 | HiSeq2000 | 1 | 87% / 3x | 92% / 9.1x | 1 | 92% / 6.8x | 92% / 21.5x | - | - | -
Table S2. Sequencing results: SC = sequence coverage, FC = fragment coverage, *TruSeq v3 used for library preparation, TruSeq v4 for the other samples of patients 4 to 6.

Figure legends

Figure S1. IGV displays paired-end reads aligned to the reference genome. The light gray bars indicate reads for which both mates (~450 bp distance from each other) can be aligned perfectly to the reference genome (depicted in the top bar). Colored reads indicate structural variants; their mates appear in the same color. If corresponding mates map to different chromosomes, a translocation is indicated. If reads span a region on the same chromosome larger or smaller than the 450 bp fragment length, a deletion or insertion, respectively, is revealed. Paired-end reads are supposed to be orientated towards each other. If the alignment shows reads in the same orientation, an inversion is indicated.

Figure S2. Patient 1. A) FISH analysis: Green color of the MLL probe identifies the proximal 5’ part of the MLL gene, red color identifies the distal 3’ part of the gene. The MLL probe shows an MLL split signal (both green and red) on 10p. B) Karyotyping results. C) Rearrangement profile: Molecular pattern of rearrangement, revealed by paired-end sequencing. The illustration consists of a normal reference genome in the upper region with the detected paired-end reads (e.g. m1 and m1a) aligned to the genome. As each read (e.g. m1) is supposed to be orientated towards its mate (e.g. m1a), the type of rearrangement can be deduced; e.g. m4a and m4 on the same strand suggest an inversion, m1 and m1a on different chromosomes suggest a translocation. D) Schematic overview of the translocation harboring the MLL/MLLT10 fusion gene. E) CIRCOS plot of the initial sample: Genomic landscape of interchromosomal translocations, which were scattered across the whole genome and are arranged along the outer ring (chromosome ideograms). The inner ring represents copy-number status in terms of gains and losses. F) CNV plot: Deep blue bars indicate the copy number determined from sequencing data in relation to the reference genome. In case of gains and losses, bars are elevated or lowered, respectively.

Figure S3. Patient 2. A) Karyotyping results. B) Rearrangement profile: Molecular pattern of rearrangement. C) Schematic overview of the translocation harboring the MLL/MLLT10 fusion gene. D) CIRCOS plot of the initial sample. E) CNV plot: We detected gains in chromosomes 1, 13 and 21 with 3 copies instead of 2 in the depicted regions. In chromosome 12 we see a loss.

Figure S4. Patient 3. A) FISH analysis: The MLL probe shows an MLL split signal on different chromosomes, on 10p and 11q respectively, indicating an insertion. B) Karyotyping results. C) Molecular pattern of rearrangement. D) Schematic overview of the translocation harboring the MLL/MLLT10 fusion gene. The fusion sequence was further validated by capillary sequencing, revealing the breakpoint at 1-bp level. E) CIRCOS plot of the initial sample. F) CNV plot.

Figure S5. Patient 5. A) FISH analysis: The MLL probe shows an MLL split signal (both green and red) on 10p. B) Karyotyping results. C) Molecular pattern of rearrangement. D) Schematic overview of the translocation harboring the MLL/MLLT10 fusion gene. The fusion sequence was further validated by capillary sequencing, revealing the breakpoint at 1-bp level. E) CIRCOS plot of the initial sample. F) CNV plot.

Figure S6. Location of MLL, MLLT10, RNF169 and RNF214 on chromosomes 10 and 11 and their corresponding distances.