Introduction to epigenetics: chromatin modifications, DNA methylation and the CpG Island landscape (part 2)
Héctor Corrada Bravo
CMSC858P Spring 2012
(many slides courtesy of Rafael Irizarry)

How do we measure DNA methylation?

Microarray Data

One question…
• Where do we measure?
• At least 7 arrays are needed to measure the entire genome
• CpGs are depleted
• Remaining CpGs cluster

CpG Islands
• But variation seen outside

McrBC
• Cuts at AmCG or GmCG
[Figure labels: Input | No Methylation | Methylation | McrBC after gel: Methylation (now unmethylated) | McrBC after gel: No Methylation]

Gene Expression Normalization does not work well here
• We use control probes
• There are also waves

Smoothing
• McrBC on tiling two channel array
• We smooth
• Proportion of neighboring CpG also methylated/not methylated
[Figure panels: True signal (simulated) | Observed data | Observed data and true signal]

What is methylated (above 50%)?
• Naïve approach: many false positives (FP)
• Smooth: no FP, but one false negative (FN)
• Smooth less? No FN, lots of FP
• We prefer this!

CHARM
• DMR for three tissues (five replicates)
Irizarry et al, Nature Genetics 2009

Some findings [Irizarry et al., 2009, Nat. Genetics]
• Tissue easily distinguished
• Cancer DMR
• Many regions like this
• Note: hypo and hyper methylation
• Both hyper and hypo methylated
• Cancer and Tissue DMRs coincide
• DMR enriched in Shores

Still affects expression
T-DMRs

Still affects expression
C-DMRs
Supplementary Figure 2. Gene expression is strongly correlated with C-DMRs at CpG island shores. For each colon tumor versus normal mucosa C-DMR we found the closest annotated gene on the Affymetrix HGU133A microarray, resulting in a total of 650 gene/C-DMR pairs. Plotted are log (base 2) ratios of colon tumor to normal expression against delta M values for colon tumor and normal DNAm.
Orange dots represent C-DMRs located within 300 bp

USING SEQUENCING (BS-SEQ)

[Figure: double-stranded sequence TTCGATTACGA / AAGCTAATGCT drawn repeatedly with CH3 groups marking methylated cytosines. Labels: Liver | Brain | chr3:44,031,616-44,031,626 | 85% Methylation]

Bisulfite Treatment
GGGGAGCAGCATGGAGGAGCCTTCGGCTGACT
GGGGAGCAGTATGGAGGAGTTTTCGGTTGATT

BS-seq
Coverage: 13
Methylation Evidence: 13
Methylation Percentage: 100%
Reads:
GTCGTAGTATTTGTCT
GTCGTAGTATTTGTNN
TGTCGTAGTATCTGTC
TATGTCGTAGTATTTG
TATATCGTAGTATTTT
TATATCGTAGTATTTG
NATATCGTAGTATNTG
TTTTATATCGCAGTAT
ATATTTTATGTCGTA
ATATTTTATCTCGTA
ATATTTTATGTCGTA
GA-TATTTTATGTCGT
Reference:
TAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATC

BS-seq
Coverage: 13
Methylation Evidence: 9
Methylation Percentage: 69%
Reads:
GTCGTAGTATTTGTCT
GTCGTAGTATTTGTNN
TGTCGTAGTATCTGTC
TATGTCGTAGTATTTG
TATATTGTAGTATTTT
TATATCGTAGTATTTG
NATATTGTAGTATNTG
TTTTATATTGCAGTAT
ATATTTTATGTCGTA
ATATTTTATCTTGTA
ATATTTTATGTCGTA
GA-TATTTTATGTCGT
Reference:
TAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATC

BS-seq
Coverage: 13
Methylation Evidence: 4
Methylation Percentage: 31%
Reads:
GTCGTAGTATTTGTCT
GTCGTAGTATTTGTNN
TGTTGTAGTATCTGTC
TATGTTGTAGTATTTG
TATATTGTAGTATTTT
TATATTGTAGTATTTG
NATATTGTAGTATNTG
TTTTATATTGCAGTAT
ATATTTTATGTCGTA
ATATTTTATCTTGTA
ATATTTTATGTTGTA
GA-TATTTTATGTCGT
Reference:
TAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATC

BS-seq
• Alignment is much trickier:
  – Naïve strategy: do nothing, hope not many CpG in a single read
  – Smarter strategy: “bisulfite convert” reference: turn all Cs to Ts
    • Also needs to be done on reverse complement reference and reads
  – Smartest strategy: be unbiased and
try all combinations of methylated/un-methylated CpGs in each read
    • Computationally expensive (see Hansen et al, 2011, for a strategy)

BS-seq
• There are similarities to SNP calling (we’ll see this in a couple of weeks)
• EXCEPT: we want to measure percentages
  – Use a binomial model to estimate p, percentage of methylation
  – Allow for sequencing errors, coverage differences, etc.

Measuring DNA Methylation
• Estimating percentages
• Use “local-likelihood” method
  – Based on loess
(Plot courtesy of Kasper Hansen)

BS-seq
Lister et al. 2009, Nature

Gene Expression Regulation: DNA methylation in promoter regions
Lister et al. 2009, Nature

DNA methylation patterns within genomic regions
Lister et al. 2009

Putting it together

What were we after?
• The epigenetic progenitor origin of human cancer [Feinberg, et al., Nature Reviews Genetics, 2006]
• Stochastic epigenetic variation as driving force of disease [Feinberg & Irizarry, PNAS, 2009]
• Phenotypic variation, perhaps epigenetically mediated, increases disease susceptibility
• Increased epigenetic and gene expression variability of specific genes/regions is a defining characteristic of cancer

What did we do?
• Custom Illumina methylation microarray
• Confirmed increased epigenetic variability in specific regions across five cancer types

[Figure: scatter plots with panels Lung cancer, Lung normal, Breast cancer, Breast normal. Axes: normal mean, cancer mean, normal sd, cancer sd. Legend: mean sig., variance sig., both sig., none]

What did we do?
• Custom Illumina methylation microarray
• Confirmed increased epigenetic variability in specific regions across five cancer types
• Confirmed same sites are involved in tissue differentiation

What did we do?
• Custom Illumina methylation microarray
• Whole genome sequencing of bisulfite treated DNA
  – Found large blocks of hypo-methylation (sometimes Mbps long) in colon cancer
  – These regions coincide with hyper-variable regions across cancer types
• Gene Expression Analysis

Gene Expression Data
• When using multiple microarray experiments, proper normalization is key [McCall, et al., Biostatistics 2010]

Normalization is key
• fRMA: a single-chip normalization procedure
• GNUSE: a single-chip quality metric
• Barcode: a single-chip common-scale measurement

What did we do?
• Custom Illumina methylation microarray
• Whole genome sequencing of bisulfite treated DNA
• Gene Expression Analysis
  – Genes with hyper-variable gene expression in colon cancer are enriched in hypo-methylation blocks [Corrada Bravo, et al., under review]

Bigger gene expression study
• 7,741 HGU133plus2 samples
• 598 normal tissue samples, 4,886 tumor samples
• 176 different tissue types
• 175 different GEO studies
[Corrada Bravo, et al., under review]

What are we doing next?
• Custom Illumina methylation microarray
• Whole genome sequencing of bisulfite treated DNA
• Gene Expression Analysis
  – Genes with hyper-variable gene expression in colon cancer are enriched in hypo-methylation blocks
  – Tissue-specific genes have hyper-variable gene expression across cancer types
[Corrada Bravo, et al., under review]
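
Worked example (BS-seq percentages)
The BS-seq slides report a coverage of 13 reads at one CpG with 13, 9, or 4 reads showing methylation, and suggest a binomial model for the methylation percentage p. The sketch below is only an illustration of that idea under the simplest assumptions: reads are independent, there are no sequencing errors, and p is estimated as methylated reads divided by coverage. The function name `methylation_estimate` is hypothetical and is not taken from the lecture or from any package.

```python
import math


def methylation_estimate(methylated_reads, coverage):
    """Plain binomial estimate of the methylation proportion at one CpG.

    Hypothetical helper for illustration: returns (p_hat, standard_error),
    assuming independent reads and no sequencing error.
    """
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    if not 0 <= methylated_reads <= coverage:
        raise ValueError("methylated_reads must lie between 0 and coverage")
    p_hat = methylated_reads / coverage
    # Binomial standard error: shrinks as coverage grows.
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / coverage)
    return p_hat, standard_error


# Counts from the three BS-seq slides: coverage 13, evidence 13, 9 and 4.
for evidence in (13, 9, 4):
    p_hat, se = methylation_estimate(evidence, 13)
    print(f"{evidence}/13 reads methylated: {100 * p_hat:.0f}% "
          f"(binomial SE {100 * se:.1f} points)")
```

The rounded percentages match the 100%, 69% and 31% on the slides. Note that the plain standard error is zero when every read agrees, which at a coverage of 13 understates the real uncertainty; this is one reason to borrow strength from neighboring CpGs, as the local-likelihood smoothing slide suggests.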