Supplementary Materials and Methods and Supplementary Tables
Table of Contents:
A. Supplementary Materials and Methods
B. Supplementary Tables (four Excel files, with legends)
C. References
A. Supplementary Materials and Methods
Whole-Exome sequencing and identification of somatic mutations
Human samples were collected at the University of Rome “Tor Vergata”, the University of
Bologna and the University of Torino (Italy), and genomic DNA was isolated using standard
protocols. In order to identify tumor-specific mutations, we compared each leukemic
sample to the corresponding normal DNA, isolated at the time of clinical and molecular
remission of the disease. For the murine samples, genomic DNA was isolated from the
leukemic spleen of PML-RARA KI mice or from the skin of the same animal, used as
matched normal DNA, using the Genomic DNA isolation kit (QIAGEN) according to the
manufacturer’s instructions. Exome capture was performed using the SureSelectXT
Human All Exon v.1 kit for hAMLs, the Human All Exon v.2 kit for hAPLs and the
SureSelectXT Mouse All Exon kit (Agilent Technologies) for mouse samples, following the
manufacturer’s specifications.
Whole-exome sequencing was performed with the Illumina GAIIx with 76 bp paired-end
reads for 8 samples (hAML#Mi1, hAML#Mi2, hAML#Mi3, hAPL#Mi1, hAPL#Mi2,
hAPL#Mi3, hAPL#Mi4, hAPL#Mi5) and on the Illumina HiSeq 2000 platform with 101 bp
paired-end reads for the remaining 18 samples (including mouse samples). Alignment to
the reference genomes (hg19 for human and mm9 for mouse) was performed using the
Burrows-Wheeler Aligner (BWA) [1]. After pre-processing of the next-generation
sequencing data (local realignment, duplicate marking and base quality recalibration)
using GATK [2], we obtained a haploid mean coverage of 60X for the Illumina GAIIx and
160X for the Illumina HiSeq 2000. We identified single nucleotide variants (SNVs) and
small insertions/deletions (indels) in our samples using MuTect [3] and Somatic Indels
Detector
(present in GATK), respectively. We then applied the following additional filters to
the resulting variants: i) a minimum read depth of 10 in both normal and tumor samples;
ii) a minimum of 7 reads supporting the alternative allele; iii) the variant allele
present in at least 25% of all reads covering the position. The identified variants
were functionally annotated using ANNOVAR [4]. We excluded from further analysis
variants in non-coding
regions, synonymous variants and variants present in highly repetitive regions. Validation
of the identified mutations was performed by PCR amplification, by designing specific
primer pairs for each mutation (sequences available upon request), followed by Sanger
sequencing. All PCR products were evaluated on a 2% agarose gel, sequenced in both
directions using Big Dye Terminator reactions and loaded on an ABI PRISM 3730xl
DNA analyzer. Sequences were analyzed using the Sequencing Analysis 5.2 software.
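The three read-level filters listed above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline actually used: the `Call` record and its field names are hypothetical, and the sketch assumes that the alternative-read count and the 25% fraction are both measured in the tumor sample, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical per-variant record; field names are illustrative only."""
    normal_depth: int  # reads covering the position in the matched normal
    tumor_depth: int   # reads covering the position in the tumor
    alt_reads: int     # tumor reads supporting the alternative allele (assumed)


def passes_filters(call, min_depth=10, min_alt_reads=7, min_alt_fraction=0.25):
    """Apply filters i) to iii) from the text to a single candidate variant."""
    # i) minimum read depth in both normal and tumor samples
    if call.normal_depth < min_depth or call.tumor_depth < min_depth:
        return False
    # ii) minimum number of reads supporting the alternative allele
    if call.alt_reads < min_alt_reads:
        return False
    # iii) at least 25% of all reads covering the position carry the variant
    return call.alt_reads / call.tumor_depth >= min_alt_fraction
```

For example, a call with 40 tumor reads of which 12 support the variant passes, whereas one with 8 supporting reads out of 40 fails on the 25% criterion.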
Published variants
In order to generate a dataset of mutations from all the available AML and APL samples,
we combined the results of our sequencing analyses with previously published data. From
the published variants, we excluded those with a missing Entrez Gene ID, those located
in the mitochondrial genome, in pseudogenes or in intronic regions, as well as all the
mutations classified as “silent” or “rna”. For the genes found mutated in mouse
genomes, we included the human orthologs in the analysis.
Significantly-mutated gene analysis
The massive sequencing of 247 AML genomes led to the identification of a total of 1559
genes targeted by mutations. However, only a few mutations are thought to drive
clonal expansion (drivers), while the majority appear to be passengers, with no role
in the development of cancer. Here we propose a statistical strategy to identify
putative driver genes: genes are defined as significantly-mutated if they are
recurrently mutated in AMLs (i.e. mutated in at least two patients) and have a
mutation frequency higher than expected by chance. In detail, for each gene we measured
the mutation rate per base, calculated as the ratio between the number of mutations in
the gene and the length of its coding sequence. The length of each gene was calculated
as the maximum length among all non-overlapping transcript isoforms. We compared this
value to the average mutation rate per base in AML exomes (the ratio of the total
number of mutations, i.e. the average number of mutations per sample multiplied by the
number of samples, to the size of the exome), using a one-tailed Poisson test. We used
the Benjamini–Hochberg procedure to control the False Discovery Rate (FDR).
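The strategy described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code used for the analysis: the function names and the input layout are hypothetical, the Poisson test is taken to be the upper-tail probability of observing at least the number of mutations found in a gene given the background rate times the coding length, and the Benjamini–Hochberg correction is assumed to be applied across all genes supplied.

```python
import math


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), summing the upper tail directly."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # P(X = k), computed in log space to avoid overflow for large k.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while True:
        total += term
        i += 1
        term *= lam / i  # P(X = i) obtained from P(X = i - 1)
        # Stop once past the mode and the remaining terms are negligible.
        if i > lam and term <= 1e-16 * total:
            break
    return min(total, 1.0)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def significantly_mutated(genes, total_mutations, exome_size, q_threshold=0.005):
    """genes maps a gene name to (n_mutations, n_patients, cds_length).

    Returns {gene: q-value} for genes mutated in at least two patients whose
    q-value is below the threshold.
    """
    background_rate = total_mutations / exome_size  # mutations per base
    names = list(genes)
    pvalues = []
    for name in names:
        n_mut, _, cds_length = genes[name]
        expected = background_rate * cds_length  # expected mutations by chance
        pvalues.append(poisson_upper_tail(n_mut, expected))
    qvalues = benjamini_hochberg(pvalues)
    return {
        name: q
        for name, q in zip(names, qvalues)
        if genes[name][1] >= 2 and q < q_threshold
    }
```

With made-up numbers, a short gene carrying many mutations is retained, while a very long gene with a handful of mutations is not, because its expected count by chance is already high:

```python
example = {
    "GENE_A": (30, 25, 2_000),       # hypothetical short, frequently mutated gene
    "LONG_GENE": (5, 5, 100_000),    # hypothetical very long gene
    "SINGLETON": (1, 1, 500),        # mutated in one patient only
}
hits = significantly_mutated(example, total_mutations=3_000, exome_size=30_000_000)
```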
Analyzing all the AML samples together, we found 191 genes recurrently targeted by
mutations, but only 31 of them were significantly mutated according to our statistical
strategy (q-value<0.005, Supplementary Table 4). The validity of our statistical approach
is illustrated, for example, by Titin (TTN). TTN has the longest coding sequence in the
genome and is found recurrently mutated in AMLs. However, it is not identified as
significantly-mutated by our statistical analysis as the analysis accounts for the long
target size of this gene.
Since initiating mutations of AMLs are diverse, considering only the genes that are
significantly mutated across all cases together might miss some recurrently mutated genes that
play an important role in the pathogenesis of each cytogenetic subgroup. In order to
identify specific cooperative mutations, we searched for genes with significantly higher
mutation rate than expected by chance in each subgroup, according to the same statistical
approach described above (for a complete list refer to Supplementary Table 3).
Specific mutated genes are significantly associated with particular subgroups
While some frequently mutated genes, such as WT1, KRAS, NRAS, IDH2, and SMC3,
are common to many AML subgroups, other genes are significantly associated with
particular karyotypes (Table 2). To determine whether a gene is significantly associated
with a specific category, we used a one-tailed Fisher’s exact test with
Benjamini–Hochberg adjustment.
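As an illustration of the association test, the sketch below computes a one-tailed Fisher’s exact p-value for enrichment of a mutated gene in one subgroup. It is a minimal example under assumptions: the 2x2 table layout and the function name are hypothetical, the tail is taken in the direction of enrichment, and the resulting p-values would still need Benjamini–Hochberg adjustment across the genes tested. It requires Python 3.8 or later for `math.comb`.

```python
from math import comb


def fisher_one_tailed(mut_in, mut_out, wt_in, wt_out):
    """One-tailed Fisher's exact test for enrichment.

    mut_in  : samples in the subgroup carrying a mutation in the gene
    mut_out : samples outside the subgroup carrying a mutation in the gene
    wt_in   : samples in the subgroup without a mutation in the gene
    wt_out  : samples outside the subgroup without a mutation in the gene

    Returns the probability of observing at least mut_in mutated samples in
    the subgroup under the hypergeometric null (fixed margins).
    """
    n_mutated = mut_in + mut_out
    n_subgroup = mut_in + wt_in
    n_total = mut_in + mut_out + wt_in + wt_out
    denominator = comb(n_total, n_subgroup)
    upper = min(n_mutated, n_subgroup)
    numerator = sum(
        comb(n_mutated, x) * comb(n_total - n_mutated, n_subgroup - x)
        for x in range(mut_in, upper + 1)
    )
    return numerator / denominator
```

For instance, `fisher_one_tailed(3, 0, 0, 3)` describes six samples in which all three mutated samples fall inside a subgroup of three; only one of the 20 possible arrangements is this extreme, giving p = 0.05.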
B. Supplementary Tables
Supplementary Table 1. Characteristics of the samples analyzed and genes affected
by somatic mutations identified by whole-exome sequencing.
Abbreviations: APL, acute promyelocytic leukemia; AML, acute myeloid leukemia;
h, human; m, murine; NK, normal karyotype; wt, wild-type; mut, mutant; bcr, breakpoint
cluster region of the PML/RARA oncogene; FAB, French-American-British
classification; Dx, diagnosis.
Supplementary Table 2. List of somatic mutations identified by our analysis.
1 Function: exonic or splicing.
2 Exonic Function: whether it is a synonymous SNV, stopgain, non-/frameshift
substitution.
3 ESP5400_ALL: Allele frequency in 5400 NHLBI-ESP exomes.
4 1000g2012feb_ALL: 1000genome allele frequencies (February 2012 release).
5 dbSNP135: dbSNP reference number.
6 AVSIFT prediction scores.
7 COSMIC site: presence in the COSMIC database.
8 Freq: ratio between the number of alternative reads covering the mutation and its total
coverage.
9 Validated: mutation validated by Sanger sequencing.
Supplementary Table 3. Significantly-mutated genes associated with specific AML
subgroups.
1 Number of mutations identified for a specific gene in the indicated AML subgroup.
2 Statistical significance of the association of a specific mutated gene with the
indicated AML subgroup.
3 Mutated genes exclusively associated with a specific AML subgroup.
Supplementary Table 4. Significantly-mutated genes associated with all AML samples.
1 Total number of mutations identified for each mutated gene.
2 Significantly-mutated genes (q-value≤0.005)
C. References
1. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler
transform. Bioinformatics 26, 589–595 (2010).
2. McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A.,
et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing
next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
3. Cibulskis, K., Lawrence, M. S., Carter, S. L., Sivachenko, A., Jaffe, D.,
Sougnez, C., et al. Sensitive detection of somatic point mutations in impure
and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
4. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic
variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).