Appendix:
Genetic Diversity of Newly Diagnosed Follicular Lymphoma
Asmann, YW et al
METHODS
Patients and tissue samples:
Patients with FL were diagnosed and classified according to World Health Organization criteria. The fresh
frozen tumors and paired peripheral blood (referred to as “normal”) samples were obtained from the Mayo
Lymphoma Molecular Epidemiology Resources (MER). The characteristics of the 8 FL patients are listed in
Table 1 in the main text. Exome sequencing of these 8 FL tumor-normal pairs was performed at the Mayo
Clinic Advanced Genomics Technology Center. The coding regions of the genome were captured using the
SureSelect Target Enrichment System version 2.0 (Agilent; Santa Clara, CA), which targets ~36 Mb of coding
exons, and 100-bp paired-end sequencing was carried out on an Illumina HiSeq 2000 sequencer. The
whole-genome mate-pair libraries of both normal and tumor samples, with 5-kb insert sizes, were sequenced
paired-end at 50-bp read length on the Illumina HiSeq 2000. For RNA sequencing of the 8 FL tumors, total RNA was
extracted using Exiqon’s miRCURY RNA Isolation Kit, mRNA was purified, and the sequencing libraries
were constructed according to the Illumina TruSeq protocol. The sizes of the library fragments selected for
sequencing were between 150 and 250 bp, and 50-bp paired-end sequencing was performed.
Exome Sequencing and Variant Calling:
The qualities of the raw sequence reads were checked by FastQC (http://seqanswers.com/wiki/FastQC), and the
paired-end reads were aligned to Human Reference Genome Build 37 using BWA [1], then locally realigned and
recalibrated using GATK [2]. Using the TREAT analytic workflow [3], the single nucleotide variants (SNVs)
and small insertions and deletions (INDELs) were called using SNVMix [4] and GATK, respectively. The
identified variants were annotated using both Seattle-Seq
(http://snp.gs.washington.edu/SeattleSeqAnnotation134/) and SIFT [5]. For the current study, we only included
the following variants in our analyses: (i) frame-shift and splice-site INDELs; (ii) nonsense and splice-site
SNVs; and (iii) non-synonymous or missense SNVs. The tumor-specific or somatic variants (INDELs and
SNVs) were identified as follows. First, we required a minimum sequencing depth of 8 reads at each
variant site in both tumor and normal samples. Somatic INDELs had to be supported by at least 3 reads and be present
only in the tumor and not in the paired normal sample. Somatic SNVs were defined by a chi-square test p value ≤ 0.01 at
the variant site, computed from the read depths of tumor and normal and the numbers of reads supporting alternative alleles
in the tumor and paired normal samples. In addition, we filtered out polymorphic positions with variant allele
frequencies (VAF) > 0.01 in the 5500 subjects of the Exome Project (http://exome.gs.washington.edu/) or with
minor allele frequency (MAF) > 0.05 in dbSNP version 134.
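The somatic SNV criteria above can be illustrated with a short sketch. This is not the TREAT implementation: it assumes the chi-square test is a Pearson test (no continuity correction) on the 2x2 table of alternative and non-alternative read counts in tumor versus normal, and the function and parameter names are hypothetical.

```python
import math


def chi_square_p_2x2(a, b, c, d):
    """Pearson chi-square p value (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 1.0  # degenerate table: no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def is_somatic_snv(tumor_depth, tumor_alt, normal_depth, normal_alt,
                   esp_vaf=0.0, dbsnp_maf=0.0,
                   min_depth=8, max_p=0.01,
                   max_esp_vaf=0.01, max_dbsnp_maf=0.05):
    """Apply the depth, chi-square, and population-frequency filters from the text."""
    # Minimum depth of 8 reads in both tumor and normal.
    if tumor_depth < min_depth or normal_depth < min_depth:
        return False
    # Remove known polymorphic positions.
    if esp_vaf > max_esp_vaf or dbsnp_maf > max_dbsnp_maf:
        return False
    # Compare alternative vs. other reads between tumor and normal (assumed table layout).
    p = chi_square_p_2x2(tumor_alt, tumor_depth - tumor_alt,
                         normal_alt, normal_depth - normal_alt)
    return p <= max_p
```

For example, `is_somatic_snv(40, 15, 40, 0)` passes all three filters, whereas the same counts with `normal_alt=14` fail the chi-square threshold because the allele fractions in tumor and normal are nearly equal.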
RT-PCR and Sanger Sequencing Validation of Somatic CRIPAK Mutations:
The frozen tumors of 20 FL and 31 DLBCL were obtained from the Mayo Specialized Program of Research
Excellence (SPORE) Molecular Epidemiology Resource. Total RNA was extracted from the tissues using
the QIAGEN RNeasy Mini Kit and reverse transcribed into cDNA. The exon (the only exon, of 1341
nt) plus 250 bp of the 5’-UTR and 50 bp of the 3’-UTR regions of the CRIPAK gene were amplified using
polymerase chain reaction (PCR) (forward primer: 5’ GGGCATCTCGTTCCTCAGAT 3’; and reverse primer:
5’ AGCACCAGGCTAACAAATCAGTCC 3’). The PCR-amplified cDNA was sequenced using Sanger
technology.
Regulatory Network Analysis of Frequently Mutated Genes in NHL:
Regulatory network analysis of multiple genes was performed using the shortest path algorithm from MetaCore
(GeneGo Inc.) and the gene-gene relationships annotated in the MetaCore Knowledge Database (6.11 build
41105). The network statistics and the hub genes of the network were calculated; hub genes are defined as
the network nodes with a number of connections (or edges) larger than 25% of the total number of nodes in the network.
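The hub definition above can be written down compactly. The sketch below is illustrative only: it treats the network as an undirected edge list (an assumption, since MetaCore relationships may be directed), counts each connected neighbor once, and uses hypothetical names throughout.

```python
from collections import defaultdict


def hub_nodes(edges, nodes=None, fraction=0.25):
    """Return {node: degree} for nodes whose degree exceeds fraction * number of nodes."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Total node count: supplied explicitly, or inferred from the edge list.
    all_nodes = set(nodes) if nodes is not None else set(neighbors)
    threshold = fraction * len(all_nodes)
    return {n: len(neighbors[n]) for n in all_nodes if len(neighbors[n]) > threshold}
```

With 31 nodes the threshold is 0.25 x 31 = 7.75, so a node needs at least 8 connections to qualify as a hub under this definition.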
Protein-Protein Interaction Distance:
Protein-protein interaction (PPI) relationships were retrieved from the Human Protein Reference Database (HPRD) [6].
The pair-wise shortest distance in the retrieved PPI network was defined as the average of the shortest distances between
two proteins in a given protein set G with N proteins, and was computed as:

$$ s = \frac{2}{N(N-1)} \sum_{g_1, g_2 \in G} d_{PPI}(g_1, g_2) $$
To construct a null model assessing how the computed statistic s differs from random cases, we
sampled 10,000 random protein sets $\{G_n^{Rand}\}_{n=1,\dots,10^4}$ of N proteins, computed the corresponding values
$\{s_n^{Rand}\}_{n=1,\dots,10^4}$, and compared the s of the actual G with the distribution of the s values from random samplings.
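A minimal sketch of the statistic s and its random-sampling null model follows. It is an illustration under stated assumptions, not the code used in the study: the PPI network is given as an undirected adjacency dictionary, every pair of proteins considered is assumed to be connected (for example by restricting to one connected component), the sum runs over unordered pairs, and all names are hypothetical.

```python
import random
from collections import deque


def bfs_distances(graph, source):
    """Shortest path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def mean_pairwise_distance(graph, proteins):
    """s = 2 / (N (N - 1)) * sum of shortest distances over unordered pairs."""
    proteins = list(proteins)
    n = len(proteins)
    if n < 2:
        raise ValueError("need at least two proteins")
    total = 0
    for i, g1 in enumerate(proteins[:-1]):
        dist = bfs_distances(graph, g1)
        for g2 in proteins[i + 1:]:
            if g2 not in dist:
                raise ValueError(f"{g1} and {g2} are not connected")
            total += dist[g2]
    return 2.0 * total / (n * (n - 1))


def empirical_p_value(graph, proteins, n_samples=10000, seed=0):
    """Fraction of random protein sets whose s is at most the observed s."""
    rng = random.Random(seed)
    proteins = list(proteins)
    observed = mean_pairwise_distance(graph, proteins)
    universe = sorted(graph)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(universe, len(proteins))
        if mean_pairwise_distance(graph, sample) <= observed:
            hits += 1
    return observed, hits / n_samples
```

When no random set is as compact as the observed one, the returned fraction is 0, which is best read as an upper bound of 1 / n_samples rather than as an exact zero.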
RNA-Seq Data Analysis and Fusion Transcript Detection:
The mRNA expression of the genes was calculated using HTSeq (http://seqanswers.com/wiki/HTSeq) after
BWA alignment of the paired-end reads to both the reference genome (Build 37) and exon junctions. The fusion
transcripts and isoforms were identified using the SnowShoes-FTD algorithm [7]. We required that the two
fusion partner genes be on different chromosomes or at least 50,000 bp apart if on the same chromosome, and
that a fusion transcript be supported by at least 3 pairs of encompassing reads and 2 unique fusion junction
spanning reads. In addition, we allowed up to ten isoforms between two fusion partners.
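The fusion filtering criteria above can be expressed as a simple predicate. The sketch below is not SnowShoes-FTD itself; the record layout, the use of a single representative coordinate per partner gene, and the reading of "up to ten isoforms" as an upper bound per partner pair are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FusionCandidate:
    chrom_a: str
    pos_a: int               # representative coordinate of partner gene A (assumed)
    chrom_b: str
    pos_b: int               # representative coordinate of partner gene B (assumed)
    encompassing_pairs: int  # read pairs with one mate in each partner gene
    junction_reads: int      # unique reads spanning the fusion junction
    isoforms: int = 1        # fusion isoforms observed for this partner pair


def passes_fusion_filters(c, min_distance=50_000, min_pairs=3,
                          min_junction=2, max_isoforms=10):
    """Apply the distance, read-support, and isoform-count criteria from the text."""
    far_apart = (c.chrom_a != c.chrom_b
                 or abs(c.pos_a - c.pos_b) >= min_distance)
    return (far_apart
            and c.encompassing_pairs >= min_pairs
            and c.junction_reads >= min_junction
            and c.isoforms <= max_isoforms)
```

For instance, a candidate joining genes on chr11 and chr14 with 5 encompassing pairs and 3 junction reads passes, while two genes 20,000 bp apart on the same chromosome fail regardless of read support.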
Detection of Somatic Copy Number Variants in FL Tumor:
The exon level copy number variants (CNV) were detected in paired tumor-normal exome sequencing data
using the in-house developed algorithm, PatternCNV, which is based on the observation that in exome
sequencing data the distribution of the mapped reads, although not uniform among different exons within a
sample, is consistent for each exon across different samples when there are no CNV events in the region [8]. In
addition, the genome-level CNVs were identified from the paired tumor-normal mate-pair DNA sequencing
data using an extended version of PatternCNV.
Detection of Somatic Copy Neutral Structural Variants:
The large structural variants (SVs), including copy neutral SVs such as translocations and
inversions, were detected in paired tumor-normal whole-genome mate-pair DNA sequencing data using an
in-house developed algorithm, SnowShoes-SV (Asmann, et al manuscript submitted), which is based on the discordant mapping of the read-pairs. SnowShoes-SV is an exhaustive algorithm for SV detection, and the
false positives were filtered out using both paired normal samples and a pool of Mayo Biobank control subjects,
as well as the alignment features of the potential SV regions.
RESULTS
Sequencing Statistics: The exomes of 8 tumor-normal pairs of FL samples were sequenced at depths of
107-164 million 100-bp paired-end reads per sample with ~45% of the reads on target, which led to ≥10-fold
coverage in 90% of the targeted regions in all samples. The tumor and paired normal mate-pair libraries were
sequenced at 127-206 million 50-bp paired-end reads with 58-62% of reads mapped to the genome. The eight tumor
RNA samples were sequenced at depths of 136-186 million 50-bp paired-end reads per sample with 58-64%
of reads mapped to known genes and exon junctions.
The Diversity of Mutational Landscape in FL: The 8 FL patients were clinically diverse (Table 1, main
text): four grade 1-2 tumors were classified as indolent, and two grade 3A tumors plus two
grade 1-2 tumors that subsequently transformed were classified as aggressive. Two of the patients did not receive
initial treatment after diagnosis (observation only), three patients remain event-free after 46, 79, and 100
months, and two patients had subsequent transformation of their tumors. Interestingly, the two grade III FL
tumors, from patients #7 and #8, harbored the most genomic abnormalities, with: (i) the highest numbers of genes with
point mutations (SNVs and short INDELs, Figure 1b middle panel, and Supplement File S1); (ii) the highest
numbers of genes impacted by copy number aberrations (Figure 1b upper panel, and Supplement File S2); and
(iii) substantially higher numbers of large structural variants compared to the other six tumors (Figure 1b lower
panel, and Supplement Files S3, S7). The genomic diversity of these tumors appeared to parallel the clinical
diversity of the patients.
As shown in Figure 1c and Supplement Files S3 and S7, we identified several recurrent mutations, including
the well-characterized t(14;18) translocation in 1 of the 2 grade 3A tumors and 5 of the 6 grade 1-2 tumors. A
chr1q amplification was observed in 4 out of 8 samples (Figure 1a and 1c; and Supplement Files S3, S7). In
addition, recurrent point mutations were found in previously reported lymphoma genes. The histone
methyltransferase gene MLL2 was mutated in 3 out of 8 patients, and the histone acetylation gene
CREBBP was mutated in 2 out of 8 patients. The histone cluster genes and HLA genes were also mutated in 2
and 3 out of 8 cases, respectively. In addition, we identified recurrent point mutations in a histone
methyltransferase gene (CRIPAK, cysteine-rich PAK1 inhibitor), and copy number deletions of a tumor
suppressor gene (DMBT1, deleted in malignant brain tumors 1) in 2 cases (Supplement File S2). The mutational
landscape of individual tumors will be discussed in detail below.
Patient Description and Observed Genomic Alterations:
Patient #1: this female patient was diagnosed at age 56 with a grade 1 stage III follicular lymphoma. She
received no treatment before or after surgery/biopsy and has been treatment-free for 100 months. The
tumor from this patient had the t(14;18) translocation and had a frame-shifting short insertion mutation in
MLL2, a missense mutation in BCL2, a missense mutation in the histone H2A family gene HIST1H2AM, and
deletion of the tumor suppressor gene DMBT1. This tumor had a chr7 trisomy.
Patient #2: this male patient was diagnosed with grade 2 stage III FL at the young age of 39. He was enrolled
in the RESORT trial as initial treatment and later received maintenance R therapy. So far, the patient has been
event-free for 79 months. This tumor carried the t(14;18) translocation. A nonsense mutation was observed in the
TNFRSF14 gene, as well as two frame-shifting INDELs in the HLA-B gene. This patient also had a frame-shifting small deletion in the CRIPAK gene.
Patient #3: this male patient was diagnosed with a grade 2 and stage III FL at age 56. The patient was
placed under observation without treatment initially and went on to receive rituximab monotherapy 8 months
after diagnosis. The patient subsequently entered a vaccine trial and later was managed with rituximab
monotherapy. This tumor had the t(14;18) translocation. However, we did not observe mutations in known
lymphoma genes.
Patient #4: this male patient was diagnosed at age 55 with a grade 2, stage II FL and received R-CHOP as
initial treatment due to tumor-related small bowel obstruction. The patient had an asymptomatic FL II relapse 68
months from diagnosis and subsequently enrolled in an ibrutinib trial. The notable mutations in this tumor
were the t(14;18) translocation, the DMBT1 deletion, and MYC amplification. This tumor had a chr8 trisomy.
Patient #5: this female patient was diagnosed at age 52 with a grade 1 stage III tumor. She received the
initial R-CVP treatment which was ineffective. A re-biopsy after cycle one showed DLBCL and the patient was
subsequently treated with R-CHOP, then R-ICE and an autologous stem-cell transplant. This tumor had the
t(14;18) translocation as well as the chr1q amplification. We also identified a frame-shifting small deletion in
MLL2, a missense mutation in CREBBP, a nonsense mutation in the histone H2B family gene HIST1H2BD, and
one frame-shifting and two missense mutations in the CRIPAK gene. In addition, a missense mutation was
observed in the TBL1XR1 (Transducin Beta-Like 1 X-Linked Receptor 1) gene, which has been reported to be
mutated in primary central nervous system lymphoma [9] and to be involved in the TBL1XR1/TP63 fusion in
FL, DLBCL, and T-cell lymphoma [10] [11]. This sample had a chr12 trisomy as well as the chr1q
amplification.
Patient #6: this male patient was diagnosed with grade 2 stage III FL at age 66 and was initially treated with
CVP. The patient relapsed after 16 months and subsequently had 6 additional regimens, including R-CHOP and autologous stem cell transplant. In addition, the tumor transformed at relapse to FL 3A and later
FL 3B. This is the only non-grade-III tumor without the t(14;18) translocation. However, the tumor does have
the chr1q amplification in addition to chr18 and chr21 trisomy. We did not detect point mutations in genes
known to be mutated in lymphoma. Interestingly, multiple fusion transcripts were identified in this tumor
(Supplement File S5, and Supplement File S6 Figure B): C17orf68/NXN, LOC100132273/CCDC117, and
TFG/GPR128. NXN (nucleoredoxin) is an interesting fusion partner gene since it is a redox-dependent
negative regulator of the Wnt signaling pathway [12]. The TFG/GPR128 fusion is a known germline fusion
previously detected in both lymphoma and healthy subjects [13], and our tumor-normal paired DNA mate-pair
sequencing data also support it as a germline DNA fusion. The TFG/GPR128 fusion is also the only
recurrent fusion after screening the public RNA-Seq data of 12 FL and 92 DLBCL tumors (dbGAP study
accession number: phs000235.v3.p1) [14, 15].
Patient #7: this male patient was diagnosed with grade 3A stage III/ FL at age 41 and was treated on the
lenalidomide/R-CHOP clinical trial. This patient remains event-free at 46 months. This is one of the two grade
III tumors profiled and had the second highest number of point mutations, CNAs, and structural variants. It had
the t(14;18) translocation, the chr1q amplification, and chr13/chr15/chr17 deletions. Frame-shifting
INDELs were observed in the CREBBP and HLA-DRB1 genes. This tumor also had a NOTCH2 gene deletion. We
detected two fusion transcripts from the transcriptome sequencing data (Supplement File S5, and Supplement
File S6 Figure B): AK7/CBL and VCPIP1/MYBL1. The partner genes involved in these two fusions are
intriguing. The CBL oncogene (Casitas B-lineage lymphoma proto-oncogene) is an E3 ubiquitin-protein ligase
which has been shown to induce mouse pre-B and pro-B cell lymphomas [16], and the MYBL1 gene (v-myb
myeloblastosis viral oncogene homolog (avian)-like 1) is a strong transcription activator and might have a role
in the proliferation and/or differentiation of B-lymphoid cells [17].
Patient #8: this male patient was diagnosed with grade 3A and stage III FL at age 54 and received R-CHOP initially. The FL relapsed after 35 months and the patient was subsequently treated with R monotherapy.
This tumor did not harbor the common t(14;18) translocation but instead had a large number of other structural
variants. The chr1q of this tumor was amplified, and the chr1p had both amplifications and deletions. The chr1
abnormality also resulted in a high number of large SVs with and without inversions. Furthermore, chr17 had
the p arm deletion and q arm amplification. In addition to the large number of structural variants, this stage III
tumor also had the highest number of both point mutations and CNAs among the 8 FL tumors profiled. The
observed point mutations include a frame-shifting small deletion in MLL2, a missense mutation in TP53, frame-shifting INDELs in both HLA-B and HLA-DRB1, and a nonsense mutation in the FAS (Fas cell surface death
receptor) gene, which was previously reported in lymphoma [18] [19]. It is worth noting that one copy of
TP53 in this tumor was deleted and the remaining copy had the missense mutation. We also detected an
expressed fusion gene (Supplement File S5, and Supplement File S6 Figure B), MAP4/GNL3, in the RNA-Seq
data from this tumor. The fusion gene partner, GNL3 (guanine nucleotide binding protein-like 3), is known to
interact with TP53 and MDM2 [20] and may play an important role in tumorigenesis.
Identification of CRIPAK mutations in FL and DLBCL:
The mutation of CRIPAK, the cysteine-rich PAK1 inhibitor, has not been reported previously. Although CRIPAK was
mutated in only 2 out of 8 FL tumors studied, its mutation supports the important role of histone modification genes in
lymphoma tumorigenesis. CRIPAK is a negative regulator of PAK1, and the only reported connection of
CRIPAK with tumors is that the loss of CRIPAK in breast cancer has been suggested to contribute to hormonal
independence [21]. A bioinformatics analysis showed that the coding regions of CRIPAK are highly enriched
with the protein functional domain post-SET according to ENSEMBL
(http://useast.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000179979;r=4:1385340-1389780;t=ENST00000324803). The post-SET domain is usually found in a number of histone lysine
methyltransferases (HMTase), C-terminal to the SET domain, which is a conserved 130 amino acid sequence
known to methylate histones on lysine [22]. We performed RT-PCR and cDNA Sanger sequencing and
observed CRIPAK mutations in 11 out of 20 (55%) FL and 12 out of 31 (38.7%) DLBCL tumors. Additional
validation was done by analyzing the publicly available RNA sequencing data of 12 FL and 71 DLBCL tumors,
where CRIPAK mutations were identified in 3 FL (25%) and 17 DLBCL (23.9%) cases. Note that the
sequencing depth of CRIPAK in the RNA-Seq data varied and we only examined the regions with sufficient
read coverage.
The relationships between CRIPAK and other mutated genes in FL and DLBCL: Since genes with
related biological functions often interact with each other in the context of pathways and gene sets, we
hypothesized that CRIPAK is functionally close to the previously identified genes recurrently mutated in FL and
DLBCL. We collected a list of 29 genes from published literature (Supplement File S4, worksheet “Seed
Genes”), and performed regulatory network analyses of these genes using the shortest path algorithm available
in MetaCore (GeneGo Inc.). Note that we had to exclude one of the 29 genes, TMSL3, because it is a pseudogene and was not recognized by MetaCore. As shown in Figure 2b, there are 31 nodes included in the network,
and 24 out of 28 (86%) genes known to be mutated in FL and/or DLBCL are connected with each other around
8 hub genes: MYC (25 edges), TP53 (18 edges), EP300 (17 edges), BCL6 (14 edges), ESR1 (12 edges), EZH2
(11 edges), BCL2 (9 edges), and STAT6 (9 edges). In the context of this manuscript, we refer to these 31 node
genes as the “FL/DLBCL mutation network genes”. CRIPAK was placed in the network only one node away
from one of the hub genes ESR1 (estrogen receptor 1). Other genes that are connected to the ESR1 hub are
MLL2, EP300, CREBBP, SGK1, BCL2, MYC, EZH2, TP53, and PRDM1.
We tested the significance of the connectivity between these 31 network node genes using the PPI database.
The PPI network consists of 9,300 proteins with 35,000 documented interactions. We performed 10,000
simulations, each time selecting 31 random proteins and calculating the average pair-wise shortest distance
(the s value). As shown in Supplement File S6 Figure A, the average pair-wise distance of the 31 “FL/DLBCL
mutation network genes” is smaller than every s value obtained from our random sampling (0 of 10,000
simulations were as small, i.e., empirical p value < 0.0001). Therefore the proximity of the “FL/DLBCL mutation network genes” was
statistically significant.
REFERENCES
1. Li, H. and R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 2009. 25(14): p. 1754-60.
2. McKenna, A., et al., The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome research, 2010. 20(9): p. 1297-303.
3. Asmann, Y.W., et al., TREAT: a bioinformatics tool for variant annotations and visualizations in targeted and exome sequencing data. Bioinformatics, 2012. 28(2): p. 277-8.
4. Goya, R., et al., SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors. Bioinformatics, 2010. 26(6): p. 730-6.
5. Kumar, P., S. Henikoff, and P.C. Ng, Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc, 2009. 4(7): p. 1073-81.
6. Keshava Prasad, T.S., et al., Human Protein Reference Database--2009 update. Nucleic acids research, 2009. 37(Database issue): p. D767-72.
7. Asmann, Y.W., et al., A novel bioinformatics pipeline for identification and characterization of fusion transcripts in breast cancer and normal cell lines. Nucleic acids research, 2011. 39(15): p. e100.
8. Wang, C., et al., PatternCNV: a versatile tool for detecting copy number changes from exome sequencing data. Bioinformatics, 2014.
9. Gonzalez-Aguilar, A., et al., Recurrent mutations of MYD88 and TBL1XR1 in primary central nervous system lymphomas. Clinical cancer research : an official journal of the American Association for Cancer Research, 2012. 18(19): p. 5203-11.
10. Scott, D.W., et al., TBL1XR1/TP63: a novel recurrent gene fusion in B-cell non-Hodgkin lymphoma. Blood, 2012. 119(21): p. 4949-52.
11. Vasmatzis, G., et al., Genome-wide analysis reveals recurrent structural abnormalities of TP63 and other p53-related genes in peripheral T-cell lymphomas. Blood, 2012. 120(11): p. 2280-9.
12. Funato, Y. and H. Miki, Redox regulation of Wnt signalling via nucleoredoxin. Free radical research, 2010. 44(4): p. 379-88.
13. Chase, A., et al., TFG, a target of chromosome translocations in lymphoma and soft tissue tumors, fuses to GPR128 in healthy individuals. Haematologica, 2010. 95(1): p. 20-6.
14. Morin, R.D., et al., Somatic mutations altering EZH2 (Tyr641) in follicular and diffuse large B-cell lymphomas of germinal-center origin. Nature genetics, 2010. 42(2): p. 181-5.
15. Morin, R.D., et al., Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma. Nature, 2011. 476(7360): p. 298-303.
16. Langdon, W.Y., et al., v-cbl, an oncogene from a dual-recombinant murine retrovirus that induces early B-lineage lymphomas. Proceedings of the National Academy of Sciences of the United States of America, 1989. 86(4): p. 1168-72.
17. Trauth, K., et al., Mouse A-myb encodes a trans-activator and is expressed in mitotically active cells of the developing central nervous system, adult testis and B lymphocytes. The EMBO journal, 1994. 13(24): p. 5994-6005.
18. Gronbaek, K., et al., Somatic Fas mutations in non-Hodgkin's lymphoma: association with extranodal disease and autoimmunity. Blood, 1998. 92(9): p. 3018-24.
19. Takakuwa, T., et al., Frequent mutations of Fas gene in nasal NK/T cell lymphoma. Oncogene, 2002. 21(30): p. 4702-5.
20. Dai, M.S., X.X. Sun, and H. Lu, Aberrant expression of nucleostemin activates p53 and induces cell cycle arrest via inhibition of MDM2. Molecular and cellular biology, 2008. 28(13): p. 4365-76.
21. Talukder, A.H., Q. Meng, and R. Kumar, CRIPak, a novel endogenous Pak1 inhibitor. Oncogene, 2006. 25(9): p. 1311-9.
22. Dillon, S.C., et al., The SET-domain protein superfamily: protein lysine methyltransferases. Genome biology, 2005. 6(8): p. 227.