Tumor Genome Sequencing
Xiaole Shirley Liu
STAT115, STAT215, BIO298, BIST520

Cancer
• Cancer will affect 1 in 2 men and 1 in 3 women in the United States, and the number of new cases of cancer is set to nearly double by the year 2050.
• Cancer is a genetic disease caused by mutations in the DNA
• Tumors can look the same clinically, but most differ genetically.

Mutations in the Tumor Genome
• Help us identify important genes for tumorigenesis and cancer progression
• Drivers: a.k.a. gatekeepers, mutations that cause and accelerate cancers
• Passengers: accidental by-products of thwarted DNA-repair mechanisms
• Recurrent mutations on genes or pathways are likely drivers

High Throughput Driver Detection
• Differential gene expression
• Copy number aberration (CNA) or variation (CNV) using CGH, tiling or SNP arrays

Comparative genomic hybridization (CGH)

GISTIC
• G-score: combines the frequency of occurrence and the amplitude of the aberration
• Statistical significance evaluated by permutation
• FDR adjustment for multiple hypothesis testing

Two Major Cancer Genome Projects
• TCGA: The Cancer Genome Atlas
– US funded
– ~20 cancer types × a few hundred tumor samples each
– Genome, transcriptome, DNA methylome, proteomics
– Rigorous tumor sample QC, consistent profiling platform
• ICGC: International Cancer Genome Consortium
– 11 countries
– 20 cancer types × 500 tumor samples each

Different Sequencing Approaches
• Capture-seq ($400-600)
– Can focus on well-known mutations
• Exome-seq ($700-2K)
– All the exons in genes; what about promoters and lncRNA genes?
• RNA-seq ($500-2K)
– Expression and mutations together; does it miss anything?
• Whole genome sequencing ($3-4K)
– Majority of mutations are non-coding, with unknown function
– Better at detecting structural changes (translocations, fusions)
– Cost-vs-benefit balance

MAF and VCF Formats
• VCF (Variant Call Format, widely used in GWAS) and MAF (Mutation Annotation Format, used by TCGA)
• Both can annotate somatic mutations and germline variants
• Tab-delimited text files
• VCF columns:
– CHROM, POS
– ID (SNP id, gene symbol, or ENTREZ gene id)
– REF (reference sequence), ALT (altered sequence)
– QUAL (quality score)
– FILTER (PASS, or failed filters such as “q10;s50”: quality below 10; fewer than 50% of samples have data here)
– INFO (allele counts, total counts, number of samples with data, somatic or not, validated, etc.)

GATK
• https://www.broadinstitute.org/gatk/guide/best-practices
• Pipeline stages: FASTQ -> BAM, BAM -> VCF, annotate

Example of a Cancer Genome Mutations Profile
• Circos plot: shows how extensively altered a cancer genome can be

Total alterations affecting protein-coding genes in selected tumors
Vogelstein et al, Science 2013

Somatic Mutation Frequency in 3K Tumor-Normal Pairs
• Typical tumors: median 45 mutations / tumor
• More mutations in tumors of tissues exposed to the outside environment

Mutation Rate Heterogeneity
• Mutation rate correlated with replication timing, gene expression, and gene length
• Tumor evolution and selection

TS vs Oncogenes, GoF vs LoF
• Tumor suppressors (TS) vs oncogenes
• Gain of Function (GoF) or Loss of Function (LoF) mutations
– Phenotypes
• How to tell?
– From mutation patterns
– From expression patterns
– Functional studies
• Some genes can be both TS and oncogenes

Hallmarks of Cancer

Mutual Exclusivity and Co-occurrence
• Most cancers have >=2 sequential mutations developed over many years.
• Mutations in different pathways can co-occur in the same cancer, whereas genes in the same pathway are rarely mutated in the same sample.

How Much Should We Sequence?
• Need ~200 patients for 20% mutation rate, ~550 patients for 10%, ~1200 patients for 5% mutation rate.
• Most driver mutations have been found; the pressing need in basic cancer research is to study their function
• Biggest surprise: mutations on chromatin regulators
– > 50% of new and strong cancer driver genes
– Oncogenes: DNMT3A, IDH1
– Tumor suppressors: MLL, ATRX, ARID1A, SNF5
– Both: EZH2

Resources
• MSKCC cBioPortal
– GUI for experimental biologists
• Broad Firehose
– API for accessing processed TCGA data
• UCSC CGHub
– API for accessing raw and processed cancer data
• Sanger COSMIC
– Catalog of Somatic Mutations in Cancer
• Many also provide software tools

Summary
• Different sequencing approaches
• Different mutation types and distributions
• Gain or loss of function mutations
• Tumor suppressor vs oncogenes
• Cancer pathways or hallmarks
• Mutation co-occurrence and mutual exclusivity
• How to study the functions of the mutations?

Acknowledgement
• Aleksandar Milosavljevic
• John Pack
• Cheng Li
• Xujun Wang
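
Supplementary sketch: reading a VCF record
• Relates to the MAF and VCF Formats slide: a minimal Python sketch of how the eight fixed, tab-delimited VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) could be split into fields.
• Assumptions: the names `VcfRecord` and `parse_vcf_line` and both example records are hypothetical and made up for illustration; header lines and per-sample genotype columns are ignored. Real pipelines would normally rely on an established VCF library instead.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VcfRecord:
    """One VCF data line, limited to the eight fixed columns (illustrative only)."""
    chrom: str
    pos: int
    id: str
    ref: str
    alt: List[str]            # ALT may list several comma-separated alleles
    qual: Optional[float]     # "." means the quality score is missing
    filters: List[str]        # ["PASS"] or the names of the failed filters
    info: Dict[str, str]      # key=value pairs; flag-style keys map to ""

    @property
    def passed(self) -> bool:
        # A record passes only when FILTER is exactly PASS.
        return self.filters == ["PASS"]


def parse_vcf_line(line: str) -> VcfRecord:
    """Split one tab-delimited VCF data line into its fixed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-delimited columns")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]

    info_dict: Dict[str, str] = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        key, _, value = item.partition("=")
        info_dict[key] = value  # flags such as SOMATIC carry no value

    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        id=vid,
        ref=ref,
        alt=alt.split(","),
        qual=None if qual == "." else float(qual),
        filters=flt.split(";"),
        info=info_dict,
    )


# Made-up records for illustration; they do not come from any real dataset.
good = parse_vcf_line("1\t12345\t.\tC\tT\t50\tPASS\tNS=2;DP=30;SOMATIC")
bad = parse_vcf_line("1\t22222\t.\tG\tA\t5\tq10;s50\tNS=1;DP=4")

print(good.passed, "SOMATIC" in good.info)  # passing record flagged as somatic
print(bad.passed, bad.filters)              # record that failed two filters
```

• Design note: FILTER is kept as a list so that a record failing several filters (such as “q10;s50”) can be told apart from one that simply passes.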