Supporting Information
Quality assessment of sequence capture and library preparation
Estimation of sequence capture enrichment
Information on capture array design targeting MLL
Information on capture array design targeting RUNX1
Information on capture array design targeting PDGFRB
Linker sequence and MID barcodes
Detection of an insertion
Detection of a deletion
Detection of a point mutation
Detection of molecular mutations in 95 gene targets
Summary statistics of fusion genes and dominant clusters
Validation of unknown fusion events detected by NGS
References
Quality assessment of sequence capture and library preparation
As demonstrated in Supplementary Figure 1, the processing steps from initial genomic DNA
fragmentation through polishing, adapter ligation, and clean-up were assessed using
Agilent Bioanalyzer chip profiles.
[Bioanalyzer trace: after nebulization]
Supplementary Fig. 1a Nebulization of genomic DNA. 20 µg of genomic DNA were
fragmented in a nebulizer apparatus using 45 psi of nitrogen gas for 60 seconds. The
Agilent Bioanalyzer 2100 DNA Chip 7500 (Agilent Bioanalyzer 2100, Böblingen,
Germany) depicts a representative molecular population of the generated DNA
fragments. The size distribution in base pairs (bp) is given on the x-axis.
[Bioanalyzer traces: Library Post-AMPure Beads; Nebulized Library after Polishing; Library Post-Ligation with gSel Linkers]
Supplementary Fig. 1b Polishing, adaptor ligation and clean-up steps. A Bioanalyzer
assessment is performed to verify the efficacy of the linker reaction. The Agilent
Bioanalyzer 2100 DNA Chip 7500 depicts a representative sample including three
curves: (i) nebulized library after polishing (red line); (ii) post-ligation library with gSel
linkers and a dominant linker dimer peak at approx. 40 bp (blue line); and (iii) purified and
ligated DNA fragments with the majority of gSel linker dimers removed (green line).
[Bioanalyzer trace: size-selected DNA for library prep]
Supplementary Fig. 1c Size-selection process of the ligated DNA library. A
Bioanalyzer assessment is performed to monitor the profile of the ligated library after
size-selection using the double solid phase reversible immobilization (SPRI) paramagnetic
bead-based method (Beckman Coulter, Krefeld, Germany). The size-selection step
ideally enriches for DNA fragments between 500 and 800 bp, with less
than 10% <350 bp and less than 10% >1000 bp (Agilent Bioanalyzer 2100 DNA Chip
7500).
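The acceptance criteria stated in the caption above can be expressed as a simple check. The following is a minimal sketch, not the authors' procedure: it assumes fragment sizes are available as a plain list of lengths in bp (a Bioanalyzer reports signal intensity rather than per-fragment counts), and the function name `size_selection_qc` is hypothetical.

```python
def size_selection_qc(fragment_sizes_bp):
    """Summarize a fragment-size list against the caption's thresholds.

    ASSUMPTION: each entry is one fragment length in bp; real Bioanalyzer
    output would first have to be converted into such a list.
    """
    n = len(fragment_sizes_bp)
    if n == 0:
        raise ValueError("no fragments supplied")
    frac_short = sum(size < 350 for size in fragment_sizes_bp) / n
    frac_long = sum(size > 1000 for size in fragment_sizes_bp) / n
    frac_target = sum(500 <= size <= 800 for size in fragment_sizes_bp) / n
    return {
        "frac_below_350": frac_short,
        "frac_above_1000": frac_long,
        "frac_500_to_800": frac_target,
        # Only the two 10% limits are quantified in the caption; the
        # 500-800 bp fraction is reported but not thresholded here.
        "passes": frac_short < 0.10 and frac_long < 0.10,
    }


# Hypothetical library: 1 short, 18 in-range and 1 long fragment.
example = size_selection_qc([300] + [600] * 18 + [1200])
print(example)
```

The 500 to 800 bp fraction is returned without a pass/fail threshold because the caption only says that this range is "ideally" enriched.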
[Bioanalyzer trace: final library stock]
Supplementary Fig. 1d Final library stock population. Agilent Bioanalyzer RNA Pico
6000 LabChip represents a profile of a final library stock.
Estimation of sequence capture enrichment
Four distinct control regions were amplified using the primers given in
Supplementary Table 1. If the qPCR analysis using these NSC assays indicates successful
capture of the control loci, it is likely that the experimental loci of interest targeted on the
microarray were also successfully captured and enriched for sequencing.
Supplementary Table S1 Sequence capture enrichment amplicons for qPCR
Amplicon    Primer sequences
NSC-0237    5' - CGC ATT CCT CAT CCC AGT ATG - 3' (forward)
            5' - AAA GGA CTT GGT GCA GAG TTC AG - 3' (reverse)
NSC-0247    5' - CCC ACC GCC TTC GAC AT - 3' (forward)
            5' - CCT GCT TAC TGT GGG CTC TTG - 3' (reverse)
NSC-0268    5' - CTC GCT TAA CCA GAC TCA TCT ACT GT - 3' (forward)
            5' - ACT TGG CTC AGC TGT ATG AAG GT - 3' (reverse)
NSC-0272    5' - CAG CCC CAG CTC AGG TAC AG - 3' (forward)
            5' - ATG ATG CGA GTG CTG ATG ATG - 3' (reverse)
[Figure panels: qPCR curves for case #24, captured vs. non captured templates]
Assay       ∆CP      Enrichment
NSC-0247    8.99     197.2-fold
NSC-0237    10.01    447.5-fold
NSC-0268    9.78     281.3-fold
NSC-0272    10.21    823.3-fold
Supplementary Figure 2 Calculation of sequence capture enrichment using qPCR. As
shown for one representative case, the crossing point (CP) values, each assessed in triplicate,
from four qPCR assays of captured LM-PCR templates were significantly lower than CP
values from non-captured templates. The differences in CP values are used to calculate
the enrichment efficiency. The median result of the four distinct assays gives the
estimated final enrichment factor per patient sample.
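The calculation described in the caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: it assumes the usual qPCR relationship, fold enrichment = E^∆CP, where E is the per-cycle amplification efficiency of the assay. The efficiencies below are not given in the text; they are back-calculated from the ∆CP and enrichment values shown for case #24, and all function names are hypothetical.

```python
from statistics import median

# Values shown in Supplementary Figure 2 for case #24:
# assay -> (delta CP, reported fold enrichment)
FIGURE_2 = {
    "NSC-0247": (8.99, 197.2),
    "NSC-0237": (10.01, 447.5),
    "NSC-0268": (9.78, 281.3),
    "NSC-0272": (10.21, 823.3),
}

# ASSUMPTION: per-assay amplification efficiencies. These are not stated in
# the text; they were derived here as enrichment ** (1 / delta_cp).
ASSUMED_EFFICIENCY = {
    "NSC-0247": 1.80,
    "NSC-0237": 1.84,
    "NSC-0268": 1.78,
    "NSC-0272": 1.93,
}


def fold_enrichment(delta_cp, efficiency):
    """Fold enrichment for one assay, assuming enrichment = E ** delta_cp."""
    return efficiency ** delta_cp


def implied_efficiency(delta_cp, enrichment):
    """Efficiency that reproduces a reported enrichment for a given delta CP."""
    return enrichment ** (1.0 / delta_cp)


def sample_enrichment(per_assay_enrichments):
    """Final per-sample estimate: the median over the individual assays."""
    return median(per_assay_enrichments)


for assay, (delta_cp, reported) in FIGURE_2.items():
    computed = fold_enrichment(delta_cp, ASSUMED_EFFICIENCY[assay])
    print(f"{assay}: computed {computed:.1f}-fold, reported {reported}-fold")

print("median:", sample_enrichment([v[1] for v in FIGURE_2.values()]))
```

For the four values shown for case #24, the median of the reported enrichments works out to 364.4-fold.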
Information on capture array design targeting MLL
As shown in Supplementary Figure 3, a custom capture array targeted a
contiguous region of chromosome 11q23 where the MLL gene is located (start:
117,812,370; end: 117,901,177; hg18 assembly).
[Browser view with tracks: capture probes, MLL]
Supplementary Figure 3 Custom NimbleGen 385K microarray to capture MLL
sequences. In total, 88,732 bases were targeted by capture probes. Hybridization
probes covered 92.1% (81,694 bases) of the contiguous genomic target region. The
remaining 7.9% of bases were not covered by probes due to repetitive sequences, as
indicated in the browser line “tiled_region”.
Information on capture array design targeting RUNX1
As shown in Supplementary Figure 4, a custom capture array targeted a
contiguous region of chromosome 21q22.3 where the RUNX1 gene is located (start:
36,160,052; end: 36,421,677; hg19 assembly).
[Browser view with tracks: capture probes, RUNX1]
Supplementary Figure 4 Custom NimbleGen 385K microarray to capture RUNX1
sequences. In total, 261,544 bases were targeted by capture probes. Hybridization
probes covered 97.6% (255,202 bases) of the contiguous genomic target region. The
remaining 2.4% of bases were not covered by probes due to repetitive sequences, as
indicated in the browser line “tiled_region”.
Information on capture array design targeting PDGFRB
As shown in Supplementary Figure 5, a custom capture array targeted a
contiguous region of chromosome 5q33.1 where the PDGFRB gene is located (start:
149,493,355; end: 149,535,460; hg19 assembly).
[Browser view with tracks: capture probes, PDGFRB]
Supplementary Figure 5 Custom NimbleGen 385K microarray to capture PDGFRB
sequences. In total, 42,023 bases were targeted by capture probes. Hybridization
probes covered 98.9% (41,545 bases) of the contiguous genomic target region. The
remaining 1.1% of bases were not covered by probes due to repetitive sequences, as
indicated in the browser line “tiled_region”.
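The coverage percentages quoted for the three array designs follow directly from the targeted and probe-covered base counts. The short sketch below recomputes them; the dictionary layout and function names are hypothetical and only the base counts come from the captions above.

```python
# Targeted and probe-covered base counts as given in the figure captions.
DESIGNS = {
    "MLL": {"targeted": 88_732, "covered": 81_694},
    "RUNX1": {"targeted": 261_544, "covered": 255_202},
    "PDGFRB": {"targeted": 42_023, "covered": 41_545},
}


def coverage_percent(covered, targeted):
    """Percentage of targeted bases covered by probes, to one decimal."""
    return round(100.0 * covered / targeted, 1)


def uncovered_percent(covered, targeted):
    """Percentage of targeted bases left uncovered (repetitive sequence)."""
    return round(100.0 * (targeted - covered) / targeted, 1)


for gene, d in DESIGNS.items():
    print(
        gene,
        coverage_percent(d["covered"], d["targeted"]),
        uncovered_percent(d["covered"], d["targeted"]),
    )
```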
Linker sequence and MID barcodes
Linker molecules were ligated to the DNA fragments in the patient-specific libraries to
provide a priming site for post-enrichment amplification of the eluted pool of captured
DNA molecules. Multiplex identifier (MID) sequences, i.e., molecular barcodes 10
bases in length, were introduced into each patient’s genomic sample by ligation,
allowing a multiplexed design of the shotgun sequencing assay.
Supplementary Table 2 Linker sequences and molecular barcodes
Linkers
gSel3       5'-CTC GAG AAT TCT GGA TCC TC-3'
gSel4-Pi    5'-Phos/GAG GAT CCA GAA TTC TCG AGT T-3'
10-base Multiplex Identifier Sequences (MIDs)
MID-1       ACGAGTGCGT
MID-2       ACGCTCGACA
MID-3       AGACGCACTC
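How such barcodes permit multiplexing can be illustrated with a small demultiplexing sketch. It is not the pipeline used in the study: it assumes reads have already been trimmed so that the 10-base MID is the first sequence in the read, it matches barcodes exactly, and the function names are hypothetical. The same block also checks that the two linker oligonucleotides listed above are complementary, with gSel4-Pi carrying two extra T bases at its 3' end.

```python
MIDS = {
    "MID-1": "ACGAGTGCGT",
    "MID-2": "ACGCTCGACA",
    "MID-3": "AGACGCACTC",
}
GSEL3 = "CTCGAGAATTCTGGATCCTC"
GSEL4_PI = "GAGGATCCAGAATTCTCGAGTT"  # 5'-phosphorylated in the table

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def assign_mid(read):
    """Return the MID name whose barcode the read starts with, else None.

    ASSUMPTION: the barcode sits at the very start of the read and must
    match exactly; no sequencing-error tolerance is modelled.
    """
    for name, barcode in MIDS.items():
        if read.startswith(barcode):
            return name
    return None


def strip_mid(read):
    """Remove a recognized leading MID; return (mid_name, remaining_read)."""
    name = assign_mid(read)
    if name is None:
        return None, read
    return name, read[len(MIDS[name]):]


print(strip_mid("ACGCTCGACA" + "GATTACA"))          # hypothetical read
print(reverse_complement(GSEL3) + "TT" == GSEL4_PI)  # linker complementarity
```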
Detection of an insertion
As shown in Supplementary Figure 6a (case N06), a small insertion was detected in the
FLT3 gene. The 63 bp insertion was located in the juxtamembrane domain, which is known
to be frequently mutated in AML cases with a normal karyotype [1].
[Figure labels: FLT3, insertion]
Supplementary Figure 6a Molecular insertion detected in the FLT3 gene. At
position 27,506,303, a 63 bp insertion was detected.
Detection of a deletion
As shown in Supplementary Figure 6b (case N01), a 3 bp deletion was detected in the
KIT gene at codon D419 [2].
[Figure labels: KIT, deletion]
Supplementary Figure 6b Molecular deletion detected in the KIT gene. At
position 55,284,794 on chromosome 4, a 3 bp deletion was detected.
Detection of a point mutation
As shown in Supplementary Figure 6c (case N04), a point mutation was detected in the
KRAS gene [3].
[Figure labels: KRAS, point mutation C→A]
Supplementary Figure 6c Point mutation detected in the KRAS gene. At position
25,289,552 on chromosome 12, a substitution of C→A was detected, resulting in an
amino acid change from glycine (G) to cysteine (C) at codon 12.
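A C to A substitution on the forward genomic strand producing a glycine to cysteine change can look puzzling at first, so the sketch below walks through the strand logic. It rests on two assumptions that are not stated in the text: that KRAS is transcribed from the reverse strand, and that the reference codon 12 is GGT. The codon lookup is deliberately limited to the two codons involved, and the helper names are hypothetical.

```python
# Minimal codon lookup: only the two codons needed for this illustration.
CODON_TO_AA = {"GGT": "G", "TGT": "C"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def substitute(seq, index, ref, alt):
    """Replace one base, checking that the expected reference base is there."""
    if seq[index] != ref:
        raise ValueError(f"expected {ref} at index {index}, found {seq[index]}")
    return seq[:index] + alt + seq[index + 1:]


# ASSUMPTION: reference codon 12 on the coding strand is GGT, and the gene
# lies on the reverse strand, so the forward strand reads its reverse complement.
coding_ref = "GGT"
forward_ref = reverse_complement(coding_ref)         # forward-strand triplet

# Forward-strand index 2 pairs with the first base of the coding codon.
forward_alt = substitute(forward_ref, 2, "C", "A")   # the C to A substitution
coding_alt = reverse_complement(forward_alt)

print(forward_ref, "->", forward_alt)
print(coding_ref, CODON_TO_AA[coding_ref], "->", coding_alt, CODON_TO_AA[coding_alt])
```

Under these assumptions, the forward-strand C to A change corresponds to a G to T change in the first position of the coding-strand codon.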
Detection of molecular mutations in 95 gene targets
In addition to the proof-of-concept analysis of well-established, typical AML mutations
that were already known for some of the cases analyzed with the 1.9 Mb capture array, a more
global analysis investigated molecular mutations in all 95 gene targets.
Supplementary Table 3 gives, for each case, the number of intronic and exonic variants,
further broken down by translational status.
Supplementary Table S3 Summary of variants detected in 95 genes (1.9 Mb array)
                                  N01     N03     N04     N05     N06     median no.
Variants                          1984    1745    1500    1701    944     1701
  Ensembl variation database      1730    1524    1285    1508    871
  Unknown variants                254     221     215     193     73
Intronic variants                 1672    1492    1223    1460    815     1460
  Ensembl variation database      1478    1303    1054    1293    749
  Unknown variants                194     189     169     167     66
Exonic variants                   310     247     273     239     129     247
  Ensembl variation database      251     217     229     214     122
  Unknown variants                59      30      44      25      7
    Deletions                     15      5       12      4       0
    Insertions                    10      5       9       5       3
    Substitutions                 34      20      23      16      4
    Not translated                37      18      28      13      3
    Translated                    22      12      16      12      4
      Synonymous                  11      6       8       6       3
      Nonsynonymous               11      6       8       6       1       6
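The internal arithmetic of Supplementary Table S3 can be checked with a few lines of code: for each variant class, the number of variants listed in the Ensembl variation database plus the number of unknown variants should equal the total, and the "median no." column is the median across the five cases. The sketch below uses the values from the table; the data layout and function names are hypothetical.

```python
from statistics import median

CASES = ["N01", "N03", "N04", "N05", "N06"]

# class -> (total, listed in Ensembl variation database, unknown), per case
TABLE_S3 = {
    "Variants": (
        [1984, 1745, 1500, 1701, 944],
        [1730, 1524, 1285, 1508, 871],
        [254, 221, 215, 193, 73],
    ),
    "Intronic variants": (
        [1672, 1492, 1223, 1460, 815],
        [1478, 1303, 1054, 1293, 749],
        [194, 189, 169, 167, 66],
    ),
    "Exonic variants": (
        [310, 247, 273, 239, 129],
        [251, 217, 229, 214, 122],
        [59, 30, 44, 25, 7],
    ),
}


def is_consistent(total, known, unknown):
    """True if known + unknown equals the total for every case."""
    return all(t == k + u for t, k, u in zip(total, known, unknown))


def row_median(values):
    """Median across the cases, as in the 'median no.' column."""
    return median(values)


for name, (total, known, unknown) in TABLE_S3.items():
    print(name, is_consistent(total, known, unknown), row_median(total))
```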
Summary statistics of fusion genes and dominant clusters
As shown in Supplementary Spreadsheet 3, the frequency table depicts the cluster size
distribution for each fusion and the total number of clusters obtained for a given sample
and capture array design. Next, the dominant cluster is further annotated and details are
given according to the chromosomal breakpoints and reads per strand orientation.
Importantly, many chimeric reads were artifacts introduced
during sample preparation, but these reads were usually either removed during filtering
(see Supplementary Spreadsheet 2) or formed singleton clusters of size n=1. In the
majority of cases, by contrast, the objective statistics pipeline detected a dominant
cluster. Clusters of two or more recurrent chimeric reads
and, in particular, reads covering both forward and reverse strands of a breakpoint
region were frequently of interest.
In detail, after the filtering steps were applied, case N01 (with inv(16)(p13q22))
harbored a total of 481 unique chimeric reads. Of these, 465 reads
formed clusters of size n=1, 4 reads formed two clusters of size n=2, and a set
of 12 reads formed the dominant cluster. All chimeric reads in this
dominant cluster mapped to the MYH11 and CBFB genes. As demonstrated in
Supplementary Figure 7, two constellations of fusion events were detectable: 4 distinct
reads corresponded to the MYH11-CBFB fusion (Supplementary Figure 7a) and 8
additional chimeric reads corresponded to the CBFB-MYH11 fusion
(Supplementary Figure 7b).
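The cluster-size frequency table and the choice of a dominant cluster can be illustrated with a short sketch. It is not the authors' statistics pipeline: it assumes each chimeric read has already been assigned a cluster label (how reads are grouped is not specified here), and it reads the size n=2 entry for case N01 as two clusters of two reads, which is what the total of 481 reads implies. All labels and names are hypothetical.

```python
from collections import Counter


def summarize_clusters(read_cluster_labels):
    """Return (size -> number of clusters, dominant label, dominant size).

    ASSUMPTION: one cluster label per chimeric read; the input must not be
    empty. Ties for the largest cluster are resolved by first occurrence.
    """
    cluster_sizes = Counter(read_cluster_labels)        # label -> reads
    size_frequency = Counter(cluster_sizes.values())    # size -> clusters
    dominant_label, dominant_size = max(
        cluster_sizes.items(), key=lambda item: item[1]
    )
    return size_frequency, dominant_label, dominant_size


# Case N01 as described above: 465 singletons, two clusters of size 2,
# and one dominant cluster of 12 reads mapping to MYH11 and CBFB.
labels = [f"singleton_{i}" for i in range(465)]
labels += ["pair_A"] * 2 + ["pair_B"] * 2
labels += ["MYH11/CBFB"] * 12

size_frequency, dominant_label, dominant_size = summarize_clusters(labels)
print(len(labels), dict(size_frequency), dominant_label, dominant_size)
```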
Supplementary Fig. 7 Chimeric read distribution for case N01. a) In total, four
chimeric reads were detectable covering the CBFB-MYH11 fusion. The chimeric reads
are distributed according to the strand information and 5' → 3' orientation. b) In total,
eight chimeric reads were detectable covering the MYH11-CBFB fusion. The chimeric
reads are distributed according to the strand information and 5' → 3' orientation.
With respect to the other cases, the following observations were made and are
summarized in Supplementary Table 4 (Supplemental Spreadsheet 3):
Supplementary Table S4 Chimeric read interpretation and fusion genes
Case   Interpretation
N03    The distinct dominant cluster contained 7 chimeric reads and confirmed
       the MLL-MLLT3 fusion known from routine operations.
N04    The distinct dominant cluster contained 8 chimeric reads and confirmed
       the RUNX1-RUNX1T1 fusion known from routine operations.
N05    The largest cluster (3 reads) contained the MLL-ELL fusion.
       Subsequent SNP microarray analysis identified the reciprocal
       SFRS14-MLL fusion event, that was covered by 2 chimeric reads in the
       second largest cluster.
N14    The distinct dominant cluster contained 9 chimeric reads and identified
       the MLL-MLLT10 fusion, subsequently also validated by a
       corresponding PCR assay.
N16    The distinct dominant cluster contained 8 chimeric reads and identified
       the MLL-MLLT6 fusion, subsequently also validated by a corresponding
       PCR assay.
N17    In this interesting case, the largest dominant cluster contained 34 reads,
       all matching to sequences on chromosome 11. The second largest
       cluster contained 12 chimeric sequences, all matching to chromosomes
       11 and 10, leading to a 5' MLL-MLLT10 fusion. This complex
       rearrangement further led to the reciprocal fusion of the 3' MLL
       gene to non-coding sequences on chromosome 11, thus explaining the
       dominant cluster. The MLL-MLLT10 fusion subsequently also was
       validated by a corresponding PCR assay.
N20    The distinct dominant cluster contained 11 chimeric reads and identified
       the MLL-MLLT1 fusion, subsequently also validated by a corresponding
       PCR assay.
N21    In this case only a clustering with size n=1 was observed. In total, 21
       chimeric reads were detected. Amongst those 21 reads, 2 chimeric
       reads were located to chromosomes 11q23 and chromosomes 4 (which
       would fit with data from the karyotype), with one of these 2 reads
       identifying a MLL-AFF1 fusion. This fusion was subsequently also
       validated by PCR.
N38    In this case, routine testing had detected a MLL-MLLT10 fusion. Yet,
       the experiment failed to generate a distinct dominant cluster. Instead,
       the cluster with size n=2 was mapping to chromosomes 7 and 11q23.
       The MLL-PFTK1 fusion, however, was not confirmed by subsequent
       assays, indicating a failure of the capturing assay to identify the fusion.
N39    In this case, the dominant cluster contained two chimeric reads mapping
       to both chromosomes 6 and 11q23. However, the MLL-MLLT4 fusion
       was not identified by those 2 sequences. Instead, one chimeric read
       from the second largest cluster then contained the molecular chimeric
       reads explaining the MLL-MLLT4 fusion, which was already known from
       routine testing.
N40    The distinct dominant cluster contained 5 chimeric reads and confirmed
       the MLL-AFF1 fusion, known from routine operations.
N41    The distinct dominant cluster contained 5 chimeric reads and confirmed
       the MLL-ELL fusion, known from routine operations.
N42    The dominant cluster contained 2 chimeric reads for both possible
       constellations and confirmed the MLL-MLLT1 fusion, known from
       routine operations.
N27    The distinct dominant cluster contained 3 chimeric reads and identified a
       fusion between RUNX1 and sequences on chromosome 17. This fusion
       was subsequently validated by PCR.
N28    The dominant cluster contained 2 chimeric reads and identified a
       RUNX1-KCNMA1 fusion. This fusion was subsequently validated by
       PCR.
N29    The distinct dominant cluster contained 4 chimeric reads and identified a
       fusion between RUNX1 and sequences on chromosome 5. This fusion
       was subsequently validated by PCR.
N30    The distinct dominant cluster contained 5 chimeric reads and identified a
       fusion between RUNX1 and sequences on chromosome 10. This fusion
       was subsequently validated by PCR.
N33    This case harbored a t(12;21)(p13;q22) and was known to be positive
       for the ETV6-RUNX1 fusion. One corresponding chimeric read was
       contained in the cluster size n=1. Of note, the larger cluster with size
       n=2 did not correspond to chimeric reads identifying the ETV6-RUNX1
       rearrangement. Overall, this specimen was sequenced several times,
       but the capturing assay only yielded 4 chimeric reads in total, explaining
       the poor experimental performance of the samples, possibly due to
       severe degradation of the input genomic DNA.
N36    The distinct dominant cluster contained 9 chimeric reads and identified
       the PDGFRB-DTD1 fusion, discovered and validated in parallel by
       Erben P et al., Blood (ASH Annual Meeting Abstracts) 2008 112:
       Abstract 3719.
N37    The distinct dominant cluster contained 5 chimeric reads and identified
       the PDGFRB-DTD1 fusion, discovered, validated, and published in
       parallel by Walz C et al., Genes Chromosomes Cancer. 2009
       Feb;48(2):179-83.
Validation of unknown fusion events detected by NGS
As shown in Supplementary Table 5, previously unknown fusion genes detected in 9 patients
by capturing MLL (n=6 fusions) or RUNX1 (n=5 fusions) were subsequently validated and
confirmed by PCR assays using the following primer pairs.
Supplementary Table S5 PCR Primer information for validation of fusion genes
Patient   Fusion           Forward primer (5’ – 3’)    Reverse primer (5’ – 3’)
N05       MLL-SFRS14       GTCACCCCCGAGAGGACA          CCTTCCACAAACGTGACAGA
          MLL-ELL          AACCACTCCTAGTGAGCCCAAG      AAGGAGGCTGCCAGTGCT
N14       MLL-MLLT10       AACCACTCCTAGTGAGCCCAAG      AGAGCGCTCCTACTTGTTGC
N16       MLL-MLLT6        AACCACTCCTAGTGAGCCCAAG      AGCTGCTCCATGTTAGTCGTC
N17       MLL-MLLT10
N20       MLL-MLLT1
N28       RUNX1-KCNMA1
N27       RUNX1-chr.17
N29       chr.5-RUNX1
N30       chr.10-RUNX1
          RUNX1-chr.10     GCAGCCCTTTGATTTCACTC        TGTTCCCTCCAAGGAGACTG
Primer sequences for the six fusions from N17 (MLL-MLLT10) through N30 (chr.10-RUNX1), in source order:
CCAGTGAAAAGAAAGACAGCA
TTCCTGGCTTCCAGACTCTC
GAATGCAGGCACTTTGAACA
CCTCGCCTGACGAAGAGTC
AGCATGGTGGAGGTGCTG
GGTCATGGCAGCTTCACTTC
TTCAACTCCTGGACCAAACC
ATTCCATGCTCCCAATTTGA
CTCACAACAAGCTCCCATCA
TGAATCTGGTAGCCCATCCT
GCTTCAGGGCTCTCCTCAG
ACTATTCCAGCGGGGTAGC
References
1. Schnittger S, Schoch C, Dugas M, Kern W, Staib P, Wuchter C, Loffler H,
Sauerland CM, Serve H, Buchner T, Haferlach T, Hiddemann W. Analysis of FLT3
length mutations in 1003 patients with acute myeloid leukemia: correlation to
cytogenetics, FAB subtype, and prognosis in the AMLCG study and usefulness as a
marker for the detection of minimal residual disease. Blood 2002; 100: 59-66.
2. Kohl TM, Schnittger S, Ellwart JW, Hiddemann W, Spiekermann K. KIT exon 8
mutations associated with core-binding factor (CBF)-acute myeloid leukemia (AML)
cause hyperactivation of the receptor in response to stem cell factor. Blood 2005; 105:
3319-3321.
3. Bowen DT, Frew ME, Hills R, Gale RE, Wheatley K, Groves MJ, Langabeer SE,
Kottaridis PD, Moorman AV, Burnett AK, Linch DC. RAS mutation in acute myeloid
leukemia is associated with distinct cytogenetic subgroups but does not influence
outcome in patients younger than 60 years. Blood 2005; 106: 2113-2119.