Tumor Genome Sequencing
Xiaole Shirley Liu
STAT115, STAT215, BIO298, BIST520
Cancer
• Roughly 1 in 2 men and 1 in 3 women in the United States
will develop cancer in their lifetime, and the number of new
cancer cases is projected to nearly double by 2050.
• Cancer is a genetic disease caused by mutations
in the DNA
• Tumors can look the same clinically, yet most
differ genetically.
Mutations in the Tumor Genome
• Help us identify important genes for
tumorigenesis and cancer progression
• Drivers (a.k.a. gatekeepers): mutations that cause
and accelerate cancer
• Passengers: accidental by-products, e.g. of
defective DNA repair, that do not drive the cancer
• Recurrent mutations on genes or pathways are
likely drivers
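As a minimal sketch of the recurrence idea, the snippet below counts, for each gene, how many distinct tumors carry at least one mutation in it and ranks genes by that fraction. The input format (a list of (sample, gene) pairs), the function name `recurrence_by_gene`, and the toy data are illustrative assumptions. Raw recurrence ignores background mutation rate and gene length, so it is only a first filter.

```python
from collections import defaultdict


def recurrence_by_gene(mutations, n_samples):
    """Return (gene, n_mutated_samples, fraction) tuples, most recurrent first.

    mutations: iterable of (sample_id, gene) pairs; several mutations in the
    same gene and sample are counted once for that sample.
    n_samples: total number of sequenced tumors, including unmutated ones.
    """
    samples_per_gene = defaultdict(set)
    for sample, gene in mutations:
        samples_per_gene[gene].add(sample)
    ranked = [
        (gene, len(samples), len(samples) / n_samples)
        for gene, samples in samples_per_gene.items()
    ]
    # Sort by number of mutated samples, ties broken alphabetically.
    ranked.sort(key=lambda row: (-row[1], row[0]))
    return ranked


# Made-up calls from 4 tumors (hypothetical gene names).
calls = [
    ("T1", "GENE_A"), ("T1", "GENE_A"), ("T2", "GENE_A"), ("T3", "GENE_A"),
    ("T1", "GENE_B"), ("T4", "GENE_B"), ("T2", "GENE_C"),
]
for gene, n, frac in recurrence_by_gene(calls, n_samples=4):
    print(f"{gene}\t{n}\t{frac:.2f}")
```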
High Throughput Driver Detection
• Differential gene expression
• Copy number aberration (CNA) or variation
(CNV) using CGH, tiling or SNP arrays
Comparative genomic hybridization (CGH)
GISTIC
• G-score: combines the frequency of occurrence and the
amplitude of the aberration at each locus
• Statistical significance evaluated by permutation
• FDR adjustment for multiple hypothesis testing
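The three GISTIC steps above can be illustrated with a toy version in plain Python. This is a simplified, hypothetical sketch, not the GISTIC implementation: it scores amplifications only, takes per-marker log2 copy-number ratios as input, uses an arbitrary 0.1 amplification threshold, builds the null with one simple permutation scheme (shuffling marker positions within each sample), and applies Benjamini-Hochberg to get FDR q-values. All function names and data are made up.

```python
import random


def g_scores(cn, threshold=0.1):
    """G-score per marker: sum of amplification amplitudes across samples.

    cn: one list per sample of log2 copy-number ratios, one value per marker.
    Only values above `threshold` count, so the sum equals
    (number of amplified samples) x (their mean amplitude).
    """
    n_markers = len(cn[0])
    return [sum(s[i] for s in cn if s[i] > threshold) for i in range(n_markers)]


def permutation_pvalues(cn, n_perm=200, threshold=0.1, seed=0):
    """Empirical p-value per marker against a permutation null.

    Shuffling markers within each sample keeps that sample's aberration
    burden but destroys any alignment of aberrations across samples.
    """
    rng = random.Random(seed)
    observed = g_scores(cn, threshold)
    null = []
    for _ in range(n_perm):
        shuffled = [rng.sample(sample, len(sample)) for sample in cn]
        null.extend(g_scores(shuffled, threshold))
    # Add-one correction so no p-value is exactly zero.
    return [(1 + sum(g >= obs for g in null)) / (1 + len(null)) for obs in observed]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR adjustment; returns q-values in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        q[idx] = running_min
    return q


# Made-up data: 6 samples x 8 markers; marker 3 is amplified in 5 samples.
cn = [
    [0.0, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.1, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.0],
    [0.3, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.7, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4],
]
scores = g_scores(cn)
pvals = permutation_pvalues(cn)
qvals = benjamini_hochberg(pvals)
for i, (g, p, q) in enumerate(zip(scores, pvals, qvals)):
    print(f"marker {i}: G={g:.2f} p={p:.4f} q={q:.4f}")
```

Scoring the sum of amplitudes rather than a simple count is what lets a locus with a few high-level amplifications compete with one carrying many low-level gains.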
Two Major Cancer Genome Projects
• TCGA: The Cancer Genome Atlas
– US funded
– ~20 cancer types × a few hundred tumor samples each
– Genome, transcriptome, DNA methylome, proteomics
– Rigorous tumor sample QC, consistent profiling
platform
• ICGC: International Cancer Genome
Consortium
– 11 countries
– 20 cancer types × 500 tumor samples each
Different Sequencing Approaches
• Capture-seq ($400-600)
– Can focus on well-known mutations
• Exome-seq ($700-2K)
– All exons of known genes; what about promoters and lncRNA genes?
• RNA-seq ($500-2K)
– Expression and mutations together; what might it miss?
• Whole genome sequencing ($3-4K)
– Most mutations are non-coding, with unknown function
– Better at detecting structural changes (translocations,
fusions)
– Cost-vs-benefit balance
MAF and VCF Formats
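• VCF (Variant Call Format, introduced by the 1000 Genomes
Project) and MAF (Mutation Annotation Format, used by TCGA)
• Both can annotate somatic mutations and germline
variants
• Tab-delimited text files; MAF additionally lists the gene
(Hugo symbol, Entrez gene ID)
• VCF columns: CHROM, POS, ID (variant identifier, e.g. dbSNP
rs ID), REF (reference sequence), ALT (altered sequence),
QUAL (quality score), FILTER (PASS, or the failed filters, e.g.
"q10;s50" = quality below 10 and fewer than 50% of samples have
data here), INFO (allele counts, total allele number, number of
samples with data, somatic or not, validated, etc.)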
GATK
• https://www.broadinstitute.org/gatk/guide/best-practices
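FASTQ -> BAM (align reads, mark duplicates, recalibrate base qualities)
BAM -> VCF (call variants)
Annotate (add gene and functional annotations)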
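The first two GATK stages can be outlined as a dry-run command plan for one tumor-normal pair. This is a sketch under stated assumptions, not the official Best Practices pipeline: it assumes paired-end FASTQ input, a reference that is already indexed (BWA index, .fai and .dict), and GATK4 tool names; it omits base quality recalibration, panels of normals, and annotation, and it only builds command strings without running them. The function name and file names are hypothetical; check flags against the documentation of the installed tool versions.

```python
import shlex


def plan_somatic_pipeline(ref, tumor_fastqs, normal_fastqs, out_prefix):
    """Return shell commands (not executed) for a tumor-normal pair."""
    cmds = []
    bams = {}
    for label, (r1, r2) in (("tumor", tumor_fastqs), ("normal", normal_fastqs)):
        # Read group with SM tag; Mutect2 identifies the normal by sample name.
        read_group = f"@RG\\tID:{label}\\tSM:{label}"
        sorted_bam = f"{out_prefix}.{label}.sorted.bam"
        dedup_bam = f"{out_prefix}.{label}.dedup.bam"
        # FASTQ -> BAM: align with BWA-MEM, coordinate-sort with samtools.
        cmds.append(
            f"bwa mem -R {shlex.quote(read_group)} {ref} {r1} {r2}"
            f" | samtools sort -o {sorted_bam} -"
        )
        # Mark PCR/optical duplicates before variant calling.
        cmds.append(
            f"gatk MarkDuplicates -I {sorted_bam} -O {dedup_bam}"
            f" -M {out_prefix}.{label}.dup_metrics.txt"
        )
        bams[label] = dedup_bam
    # BAM -> VCF: somatic calling against the matched normal, then filtering.
    raw_vcf = f"{out_prefix}.somatic.vcf.gz"
    cmds.append(
        f"gatk Mutect2 -R {ref} -I {bams['tumor']} -I {bams['normal']}"
        f" -normal normal -O {raw_vcf}"
    )
    cmds.append(
        f"gatk FilterMutectCalls -R {ref} -V {raw_vcf}"
        f" -O {out_prefix}.somatic.filtered.vcf.gz"
    )
    return cmds


plan = plan_somatic_pipeline(
    "ref.fa", ("t_R1.fq.gz", "t_R2.fq.gz"), ("n_R1.fq.gz", "n_R2.fq.gz"), "case1"
)
print("\n".join(plan))
```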
Example of a Cancer Genome
Mutations Profile
• Circos plot: shows how extensively a cancer genome is altered
Total alterations affecting protein-coding genes in selected tumors
Vogelstein et al., Science 2013
Somatic Mutation Frequency
in 3K Tumor-Normal Pairs
• Typical tumors: median 45 mutations / tumor
• More mutations in tumors exposed to external mutagens (e.g. tobacco smoke in lung cancer, UV light in melanoma)
Mutation Rate Heterogeneity
• Mutation rate correlated with replication timing,
gene expression, and gene length
• Tumor evolution and selection
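Because mutation counts scale with gene length (and rates vary with replication timing and expression), raw recurrence favors long genes. A minimal sketch of a length-adjusted test follows, assuming a single uniform background rate per base per sample and a Poisson model. Tools such as MutSigCV additionally model covariates like replication timing and expression, which this sketch ignores. Function names, gene names, and numbers are hypothetical.

```python
import math


def poisson_sf(k, mu):
    """P(X >= k) for X ~ Poisson(mu), summing the upper tail directly."""
    if k <= 0:
        return 1.0
    # Start from the pmf at k, computed in log space to avoid overflow.
    term = math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))
    total, i = 0.0, k
    while term > 0.0 and (total == 0.0 or term > total * 1e-17):
        total += term
        i += 1
        term *= mu / i
    return min(1.0, total)


def length_adjusted_test(observed, coding_length, n_samples, background_rate):
    """Expected mutation count and Poisson p-value for one gene.

    expected = background_rate (mutations per base per sample)
               x coding_length (bases) x n_samples
    """
    expected = background_rate * coding_length * n_samples
    return expected, poisson_sf(observed, expected)


# Made-up numbers: 500 tumors, uniform background of 1 mutation per Mb.
genes = [("short_gene", 1_200, 30), ("long_gene", 100_000, 55)]
for name, length, observed in genes:
    expected, p = length_adjusted_test(observed, length, 500, 1e-6)
    print(f"{name}: observed={observed} expected={expected:.1f} p={p:.3g}")
```

Ranked by raw count, long_gene would come first; after adjusting for length, only short_gene stands out above background.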
TS vs Oncogenes, GoF vs LoF
• Tumor suppressors vs oncogenes
• Gain-of-function (GoF) or loss-of-function
(LoF) mutations
– Phenotypes
• How to tell?
– From mutation patterns
– From expression patterns
– Functional studies
• Some genes can be both TS and oncogenes
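One mutation-pattern heuristic is the "20/20 rule" described by Vogelstein et al. (Science 2013): a gene is called an oncogene if >20% of its recorded mutations are missense changes at recurrent positions, and a tumor suppressor if >20% are inactivating. The sketch below is a simplified, hypothetical implementation; the mutation-type labels and the definition of "recurrent" (the same residue hit by at least two missense mutations) are assumptions, and the example mutations are made up.

```python
from collections import Counter

# Assumed mutation-type labels; real MAF files use their own vocabulary.
INACTIVATING = {"nonsense", "frameshift", "splice_site"}


def classify_20_20(mutations, cutoff=0.20):
    """Simplified 20/20 rule for one gene.

    mutations: list of (protein_position, mutation_type) tuples.
    Returns a set of labels; a gene can meet both criteria, or neither.
    """
    n = len(mutations)
    if n == 0:
        return set()
    missense_hits = Counter(pos for pos, mtype in mutations if mtype == "missense")
    # Missense mutations at positions hit at least twice count as recurrent.
    recurrent_missense = sum(
        1 for pos, mtype in mutations if mtype == "missense" and missense_hits[pos] >= 2
    )
    inactivating = sum(1 for _, mtype in mutations if mtype in INACTIVATING)
    labels = set()
    if recurrent_missense / n > cutoff:
        labels.add("oncogene")
    if inactivating / n > cutoff:
        labels.add("tumor suppressor")
    return labels


# Made-up examples: a hotspot pattern and a truncating pattern.
hotspot_gene = [(100, "missense")] * 8 + [(250, "missense"), (400, "missense")]
truncated_gene = (
    [(i, "nonsense") for i in range(5)]
    + [(i, "frameshift") for i in range(10, 13)]
    + [(200, "missense"), (300, "missense")]
)
print(classify_20_20(hotspot_gene))
print(classify_20_20(truncated_gene))
```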
Hallmarks of Cancer
Mutual Exclusivity and Co-occurrence
• Most cancers arise from >=2 sequential mutations
acquired over many years.
• Mutations in different pathways can co-occur in
the same cancer, whereas those in the same
pathway are rarely mutated in the same sample.
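Whether two genes are mutated together less (or more) often than chance can be checked with a one-sided Fisher exact test on the 2x2 table of mutated/unmutated samples. The sketch below assumes 0/1 mutation calls per sample and computes the hypergeometric tail directly with `math.comb`; the function name and cohort are hypothetical, and the test ignores differences in overall mutation burden between samples.

```python
from math import comb


def fisher_one_sided(gene_a, gene_b, alternative="exclusive"):
    """One-sided Fisher exact test on two genes' 0/1 mutation calls.

    gene_a, gene_b: equal-length lists, one entry per sample (1 = mutated).
    alternative="exclusive": P(co-mutated samples <= observed)
    alternative="co-occurring": P(co-mutated samples >= observed)
    Under independence the co-mutated count is hypergeometric.
    """
    if len(gene_a) != len(gene_b):
        raise ValueError("gene_a and gene_b must cover the same samples")
    n = len(gene_a)
    k_a, k_b = sum(gene_a), sum(gene_b)
    both = sum(1 for x, y in zip(gene_a, gene_b) if x and y)
    lo, hi = max(0, k_a + k_b - n), min(k_a, k_b)
    if alternative == "exclusive":
        support = range(lo, both + 1)
    elif alternative == "co-occurring":
        support = range(both, hi + 1)
    else:
        raise ValueError(f"unknown alternative: {alternative}")
    # P(X = i) = C(k_a, i) * C(n - k_a, k_b - i) / C(n, k_b)
    tail = sum(comb(k_a, i) * comb(n - k_a, k_b - i) for i in support)
    return tail / comb(n, k_b)


# Made-up cohort of 20 tumors: A and B each mutated in 6, never together.
gene_a = [1] * 6 + [0] * 14
gene_b = [0] * 6 + [1] * 6 + [0] * 8
print(fisher_one_sided(gene_a, gene_b, "exclusive"))
print(fisher_one_sided(gene_a, gene_b, "co-occurring"))
```

Even perfect exclusivity here gives p = 3003/38760 (about 0.077), which illustrates why exclusivity tests need large cohorts.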
How Much Should We Sequence?
• ~200 patients are needed to detect genes mutated in 20% of
tumors, ~550 for 10%, and ~1,200 for 5%.
• Most driver mutations have been found; the pressing
need in basic cancer research is to study their
function
• Biggest surprise: mutations in chromatin
regulators
– >50% of the new, strong cancer driver genes
– Oncogenes: DNMT3A, IDH1
– Tumor suppressors: MLL, ATRX, ARID1A, SNF5
– Both: EZH2
Resources
• MSKCC cBioPortal
– Web GUI for experimental biologists
• Broad FireHose
– API for accessing processed TCGA data
• UCSC CGHub
– API for accessing raw and processed cancer data
• Sanger COSMIC
– Catalog of Somatic Mutations in Cancer
• Many also provide software tools
Summary
• Different sequencing approaches
• Different mutation types and distributions
• Gain or loss of function mutations
• Tumor suppressor vs oncogenes
• Cancer pathways or hallmarks
• Mutation co-occurrence and mutual exclusivity
• How to study the functions of the mutations?
Acknowledgements
• Aleksandar Milosavljevic
• John Pack
• Cheng Li
• Xujun Wang