Exome sequencing analysis of the mutational spectrum in
carcinogen and genetic models of Kras-driven lung cancer
Peter Westcott, Kyle Halliwill, Minh To, David Quigley, Reyno
Del Rosario, Erik Fredlund, David Adams (1), and Allan Balmain
UCSF Helen Diller Family Comprehensive Cancer Center, 1450 3rd Street, San
Francisco.
(1) Wellcome Trust Sanger Centre, Cambridge, England.
Why sequence tumors from mice?
Control!
• Timing of initiation -> collection
• Initiating gene(s), carcinogen(s)
• Can distinguish mutations involved in initiation from progression
Specific goals of this study
Part of the MMHCC TCGA Pilot Project
Characterize the utility of sequencing mouse tumors:
• What is the effect of the causative carcinogen on the mutation spectrum?
• Clean genetic induction (GEM) vs. carcinogen induction?
• What mutations arise after Kras initiation?
Exome sequencing
[Diagram: study design]
• Urethane: 44 lung tumors from 17 mice
• MNU: 26 lung tumors from 7 mice
• KrasLA2 (GEM), spontaneous lung tumors: 13 lung tumors from 4 mice
• Diagram labels: Kras+/(FVB/Ola); KrasLA2 (FVB/Ola); Kras+/-; Kras+/+
• Control tail DNA: 2 Kras+/+ tails
Exome sequencing
• Illumina paired-end sequencing (Wellcome Trust Sanger Centre)
• Reads aligned to the mouse genome, variants called against multiple controls, and extensive QC performed (Kyle Halliwill)
• Result: a confident list of somatic variants
Carcinogen models of Kras-driven lung cancer
Urethane (ethyl carbamate)
• Adenosine and cytidine DNA adducts lead to mispairing during replication
[Diagram: A, replication, T, mispairing]
• Kras Q61L (CAA->CTA), Q61R (CAA->CGA)
• ~90% of lung tumors harbor Kras mutations
Carcinogen models of Kras-driven lung cancer
MNU (methyl-nitroso urea)
• Guanosine DNA adducts lead to G->A transitions
[Diagram: G, replication, mispairing, G->A]
• Kras G12D (GGT->GAT)
• ~90% of lung tumors harbor Kras mutations
Genome-wide spectrum of these carcinogen mutations not known
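The codon changes listed for urethane (Kras Q61L, Q61R) and MNU (Kras G12D) can be checked against the base substitutions each carcinogen is expected to cause. The sketch below is only an illustration: the codon table is deliberately limited to the five codons named on the slides, and `describe_change` is a hypothetical helper, not part of the study's pipeline.

```python
# Minimal codon table: only the codons named on the slides.
CODON_TO_AA = {
    "CAA": "Q",  # glutamine (Kras codon 61, wild type)
    "CTA": "L",  # leucine
    "CGA": "R",  # arginine
    "GGT": "G",  # glycine (Kras codon 12, wild type)
    "GAT": "D",  # aspartate
}


def describe_change(ref_codon, alt_codon, residue):
    """Return (protein change, base substitution) for a single-base codon change."""
    diffs = [(r, a) for r, a in zip(ref_codon, alt_codon) if r != a]
    if len(diffs) != 1:
        raise ValueError("expected exactly one changed base")
    ref_base, alt_base = diffs[0]
    protein = f"{CODON_TO_AA[ref_codon]}{residue}{CODON_TO_AA[alt_codon]}"
    return protein, f"{ref_base}->{alt_base}"


for ref, alt, pos in [("CAA", "CTA", 61), ("CAA", "CGA", 61), ("GGT", "GAT", 12)]:
    print(describe_change(ref, alt, pos))
```

Both urethane-associated changes alter the adenine in the middle of codon 61, and the MNU-associated change is a G->A transition in codon 12, consistent with the adducts described above.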
Mutation spectrum
[Figure: mutation spectra for Urethane, MNU, and LA2 tumors; light shade = Kras+/-]
Mutation spectrum
• Slight bias for mutations at G/C nucleotides
• Strong bias for mutations at G nucleotides with a flanking G or A
• Strong bias for mutations at A/T nucleotides
Mutation spectrum
[Figure: MNU G->A mutations, average counts per tumor (y-axis 0 to 100), by 5’ flanking base (5’ A, 5’ G)]
• Purine bias at 5’ flanking base
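One way to tabulate the 5’ flanking base of G->A mutations is sketched below. It assumes each call carries the plus-strand reference base, the alternate base, and the two neighboring plus-strand bases; a C->T call is treated as a G->A on the opposite strand, so its 5’ neighbor is the complement of the plus-strand 3’ neighbor. The function name and the toy calls are hypothetical and are not data from the study.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def five_prime_base_of_g_to_a(ref, alt, left, right):
    """Return the base 5' of the mutated G, on the strand that carries the G.

    left/right are the plus-strand bases immediately before/after the site.
    Returns None for substitutions that are not G->A (or its complement C->T).
    """
    if (ref, alt) == ("G", "A"):
        return left
    if (ref, alt) == ("C", "T"):
        # The G is on the minus strand; its 5' neighbor is the complement
        # of the plus-strand 3' neighbor.
        return COMPLEMENT[right]
    return None


# Hypothetical toy calls: (ref, alt, left_base, right_base)
calls = [
    ("G", "A", "A", "T"),
    ("G", "A", "G", "C"),
    ("C", "T", "A", "C"),
    ("C", "T", "G", "T"),
    ("G", "A", "T", "A"),
    ("A", "T", "C", "G"),  # not a G->A change, so it is skipped
]

flanks = (five_prime_base_of_g_to_a(*call) for call in calls)
counts = Counter(base for base in flanks if base is not None)
purine_fraction = (counts["A"] + counts["G"]) / sum(counts.values())
print(counts, purine_fraction)
```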
Mutation spectrum
• Are non-carcinogen mutations separable? For the most part.
[Figure: average counts per tumor for Urethane, MNU, and LA2 by mutation class (NCG->T, Other G->A, A->T, A->G, A->C, G->C, G->T); y-axis 0 to 80, with one off-scale value labeled 670]
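The mutation classes in this figure are written with a purine (G or A) as the reference base. A minimal sketch of how plus-strand calls could be collapsed into those six classes and averaged per tumor follows. The function names and toy calls are hypothetical, and the sequence-context class (NCG->T) is not modeled here.

```python
from collections import Counter, defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def purine_class(ref, alt):
    """Collapse a substitution to its purine-referenced class, e.g. C->T becomes G->A."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    if ref in ("C", "T"):
        # Report the change as seen on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}->{alt}"


def average_counts_per_tumor(calls):
    """calls: iterable of (tumor_id, ref, alt). Returns {class: mean count per tumor}.

    Only tumors with at least one call contribute to the denominator.
    """
    per_tumor = defaultdict(Counter)
    for tumor, ref, alt in calls:
        per_tumor[tumor][purine_class(ref, alt)] += 1
    totals = Counter()
    for tumor_counts in per_tumor.values():
        totals.update(tumor_counts)
    return {cls: n / len(per_tumor) for cls, n in totals.items()}


# Hypothetical toy calls, not data from the study.
toy = [("t1", "C", "T"), ("t1", "G", "A"), ("t2", "T", "A"), ("t2", "G", "A")]
print(average_counts_per_tumor(toy))
```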
ARE CARCINOGEN MUTATIONS RELEVANT?
Other driver mutations?
• Analysis complicated:
– High mutation rates: MNU 21.2/Mb, Urethane 6.4/Mb, LA2 1.9/Mb
– Correlation between gene length and mutations
• Start with variants within Vogelstein’s 2013 list of drivers:
– Selected only consequential mutations at highly conserved sites in expressed genes
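The selection described above (driver-list gene, consequential change, highly conserved site, expressed gene) can be sketched as a simple filter. Everything specific in this example is an assumption made for illustration: the `Variant` fields, the set of consequence labels, the conservation score scale, and the 0.9 cutoff are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    consequence: str     # hypothetical labels, e.g. "missense", "synonymous"
    conservation: float  # hypothetical per-site conservation score in [0, 1]
    expressed: bool      # whether the gene is expressed in the tissue

# Assumed definition of "consequential"; the slides do not spell this out.
CONSEQUENTIAL = {"missense", "nonsense", "splice_site"}


def candidate_drivers(variants, driver_genes, min_conservation=0.9):
    """Keep consequential variants at conserved sites in expressed driver-list genes."""
    return [
        v for v in variants
        if v.gene in driver_genes
        and v.consequence in CONSEQUENTIAL
        and v.conservation >= min_conservation  # hypothetical cutoff
        and v.expressed
    ]


# Hypothetical toy input; gene names are used only as labels.
toy = [
    Variant("Atm", "missense", 0.97, True),
    Variant("Atm", "synonymous", 0.99, True),
    Variant("Met", "missense", 0.40, True),
    Variant("Ttn", "missense", 0.95, True),
]
kept = candidate_drivers(toy, driver_genes={"Atm", "Met"})
print([(v.gene, v.consequence) for v in kept])
```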
Other driver mutations?
GENE     EXON_LENGTH  NONSYN_MUT
Mll2     19827        16
Sf3b1    6191         5
Crebbp   7507         4
Asxl1    6674         3
Pdgfra   6553         3
Met      6652         3
Cic      6099         3
Atm      11964        3
Arid1b   11325        3
Alk      5918         3
Gnas     3717         2
Notch2   10506        2
Arid1a   8175         2
Fgfr3    4222         2
Hnf1a    3186         2
Flt3     3656         2
Brca2    10540        2
Akt1     2640         2
Rb1      4625         2
• None of these mutations occur in LA2 tumors
• Slight enrichment for longer genes
• Modest increase in NS mutation ratio
• One S367 to F: required for autophosphorylation and activity
• Subclonal Myc T58P?
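The relationship between gene length and mutation count noted for this gene list can be examined directly from the values on the slide. The sketch below copies the gene, exon length, and nonsynonymous count values from the table and computes a plain Pearson correlation plus a per-kilobase rate. It is a rough illustration, not the analysis used in the study, and with 19 genes the result is heavily influenced by Mll2, the longest gene with the most mutations.

```python
from math import sqrt

# (gene, exon length in bp, nonsynonymous mutation count), copied from the table.
ROWS = [
    ("Mll2", 19827, 16), ("Sf3b1", 6191, 5), ("Crebbp", 7507, 4),
    ("Asxl1", 6674, 3), ("Pdgfra", 6553, 3), ("Met", 6652, 3),
    ("Cic", 6099, 3), ("Atm", 11964, 3), ("Arid1b", 11325, 3),
    ("Alk", 5918, 3), ("Gnas", 3717, 2), ("Notch2", 10506, 2),
    ("Arid1a", 8175, 2), ("Fgfr3", 4222, 2), ("Hnf1a", 3186, 2),
    ("Flt3", 3656, 2), ("Brca2", 10540, 2), ("Akt1", 2640, 2),
    ("Rb1", 4625, 2),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


lengths = [length for _, length, _ in ROWS]
counts = [count for _, _, count in ROWS]
r = pearson(lengths, counts)

# Length-normalized view: nonsynonymous mutations per kb of exon.
per_kb = {gene: count / (length / 1000) for gene, length, count in ROWS}

print(f"Pearson r (length vs. count): {r:.2f}")
for gene, rate in sorted(per_kb.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"{gene}: {rate:.2f} per kb")
```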
Conclusions
Mutation Spectrum
• Clear recapitulation of expected carcinogen mutations
• GEM shows few mutations
• Mutations highly specific and distinguishable
Driver Mutations
• Kras
• Interesting candidates in carcinogen-induced tumors
Future work
• Validate top 1000 interesting variants by Sequenom (Wellcome Trust Sanger Centre).
• Optimize list of potential driver mutations (relevant sites?).
• InDel analysis.
• Array CGH (copy number analysis). Inverse correlation of point mutational burden and copy number changes?
Acknowledgments
Kyle Halliwill
Minh To
David Quigley
Reyno Del Rosario
Erik Fredlund
ALLAN BALMAIN
DAVID ADAMS (WELLCOME TRUST SANGER CENTRE)
Funding: MMHCC; NIH Training Grant T32 GM007175; NSF
Supplemental (Kyle’s Pipeline)
• Capture using Agilent mouse whole exome kit
• Sequenced on Illumina HiSeq
– Paired end, 75 bp each, average read span of 180 bp
• Converted back to FASTQ, then followed QC pipeline (next slide)
Supplemental (Kyle’s Pipeline)
QC and Variant Calling Strategy (steps in order)
• Align to mm10 with BWA
• Mark duplicates and fix mate information with Picard
• Base recalibration and realignment with GATK
• Alignment and coverage information with Picard
• Variant calling with MuTect
• Filter for depth and previously observed variants with vcftools
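The last step of the pipeline above (filtering for depth and previously observed variants) was done with vcftools. The pure-Python sketch below only illustrates the logic and is not the command actually used. It assumes depth is available as a `DP=` entry in the INFO column, and the `min_depth=10` cutoff, the function names, and the toy records are hypothetical.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'DP=42;SOMATIC' into a dict."""
    parsed = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        parsed[key] = value
    return parsed


def filter_vcf_lines(lines, known_sites, min_depth=10):
    """Keep header lines, plus records with enough depth that are not known sites.

    known_sites: set of (chrom, pos, ref, alt) tuples for previously observed variants.
    """
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # header lines pass through unchanged
            continue
        chrom, pos, _id, ref, alt, _qual, _filter, info = line.rstrip("\n").split("\t")[:8]
        depth = int(parse_info(info).get("DP", 0))  # assumption: DP lives in INFO
        if depth >= min_depth and (chrom, int(pos), ref, alt) not in known_sites:
            kept.append(line)
    return kept


# Hypothetical toy records, not data from the study.
toy_lines = [
    "##fileformat=VCFv4.1",
    "chr6\t1000\t.\tT\tA\t.\tPASS\tDP=42",
    "chr6\t2000\t.\tG\tA\t.\tPASS\tDP=3",
    "chr1\t3000\t.\tC\tT\t.\tPASS\tDP=50",
]
toy_known = {("chr1", 3000, "C", "T")}
for line in filter_vcf_lines(toy_lines, toy_known):
    print(line)
```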
Supplemental (Kyle’s Pipeline)
Variant Calling Details
• Sample .bam + Control 1 .bam: variant calling via MuTect gives Variant List 1 (.vcf)
• Sample .bam + Control 2 .bam: variant calling via MuTect gives Variant List 2 (.vcf)
• Intersect the two lists, then filter and annotate, to give the Candidate Variant List (.vcf)
• Output: candidate variants
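The intersect step in this flow keeps only variants that were called against both controls. A minimal sketch, assuming each call can be reduced to a (chrom, pos, ref, alt) key; the function name and toy calls are hypothetical:

```python
def intersect_calls(list1, list2):
    """Return calls from list1 that also appear in list2, preserving list1 order.

    Each call is a (chrom, pos, ref, alt) tuple.
    """
    in_list2 = set(list2)
    return [call for call in list1 if call in in_list2]


# Hypothetical toy calls: the same tumor called against two different controls.
vs_control1 = [("chr6", 1000, "T", "A"), ("chr6", 2000, "G", "A"), ("chr9", 500, "C", "T")]
vs_control2 = [("chr6", 1000, "T", "A"), ("chr9", 500, "C", "T"), ("chr2", 42, "A", "G")]

candidates = intersect_calls(vs_control1, vs_control2)
print(candidates)
```

Requiring agreement across both controls is a conservative choice: a variant that appears somatic only because one control happens to lack coverage at that site is dropped.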