Exploring Monoallelic Methylation Using High-throughput Sequencing
Cristian Coarfa, Ronald Harris, Ting Wang, Aleksandar Milosavljevic, Joe Costello

Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications
Harris RA, Wang T, Coarfa C, Nagarajan RP, Hong C, Downey S, Johnson BE, Delaney A, Zhao Y, Olshen A, Ballinger T, Zhou X, Fosberg KJ, Gu J, Echipare L, O’Geen H, Lister R, Pelizzola M, Xi Y, Epstein CB, Bernstein BE, Hawkins RD, Ren B, Chung WY, Gu H, Bock C, Gnirke A, Zhang MQ, Haussler D, Ecker JR, Li W, Farnham PJ, Waterland RA, Meissner A, Marra MA, Hirst M, Milosavljevic A, Costello JF. In press, Nature Biotechnology.

Biological importance of intermediate methylation levels
1. Imprinting
2. Non-imprinted monoallelic methylation
3. Cell type-specific methylation
4. Sites of inter-individual variation in methylation level

Profiling unmethylated and methylated CpGs
• Unmethylated CpGs: methylation-sensitive restriction enzyme digestion (MRE-seq); combine parallel digests, ligate adapters, size-select 100-300 bp; Illumina library construction; Illumina GAII sequencing, ~20 million reads/sample
• Methylated CpGs: methylated DNA immunoprecipitation (MeDIP-seq); immunoprecipitate sonicated, adapter-ligated DNA; size-select 100-300 bp; ~100 million reads/sample
• Data visualization

Data visualization (methylated vs. unmethylated)
• 5’ CpG islands are unmethylated
• 3’ CpG island is partially methylated
• Unmethylated and methylated patches within a CpG island

Interpreting MRE-seq and MeDIP-seq signals
1. High MeDIP, no or low MRE: methylated
2. High MRE, no or low MeDIP: unmethylated
3. High MRE and MeDIP (uniform): intermediate methylation
4. High MRE and MeDIP (patch methylation): methylated and unmethylated patches

Intermediate methylation levels at imprinted genes

Initial catalogue of intermediate methylation sites
Chr    Start     Stop      MRE     MeDIP    Distance to nearest gene   Nearest gene
chr1   ...
chr11  1533281   1536667   1.0342  91.9069  -205410                    HCCA2
chr11  1946475   1948787   0.7769  58.5443  -18939                     LOC100133545
chr11  1975141   1977439   1.2845  87.5516  0                          H19
chr11  2245680   2250508   2.3451  99.4044  -29211                     C11orf21
chr11  2420747   2423224   1.6565  29.5161  0                          KCNQ1
...
chr22  ...
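The four MRE/MeDIP signal patterns can be sketched as a simple decision rule. This is a minimal sketch under stated assumptions, not the method used in the study: the cutoffs `mre_min` and `medip_min` are hypothetical, the scores are assumed to be comparable normalized values like those in the catalogue, and patterns 3 and 4 (uniform vs. patchy) cannot be told apart from two summary scores, so both are reported as intermediate.

```python
from dataclasses import dataclass


@dataclass
class Region:
    chrom: str
    start: int
    stop: int
    mre: float    # MRE-seq score: signal from unmethylated CpGs
    medip: float  # MeDIP-seq score: signal from methylated CpGs


def classify(region: Region, mre_min: float = 0.5, medip_min: float = 20.0) -> str:
    """Assign a region to an MRE/MeDIP signal pattern.

    mre_min and medip_min are hypothetical cutoffs chosen for illustration.
    """
    high_mre = region.mre >= mre_min
    high_medip = region.medip >= medip_min
    if high_medip and not high_mre:
        return "methylated"      # pattern 1
    if high_mre and not high_medip:
        return "unmethylated"    # pattern 2
    if high_mre and high_medip:
        return "intermediate"    # patterns 3 and 4 (uniform or patchy)
    return "low signal"          # neither assay gives a strong signal


# Three rows from the catalogue above.
catalogue = [
    Region("chr11", 1533281, 1536667, 1.0342, 91.9069),
    Region("chr11", 1975141, 1977439, 1.2845, 87.5516),
    Region("chr11", 2420747, 2423224, 1.6565, 29.5161),
]
for r in catalogue:
    print(r.chrom, r.start, r.stop, classify(r))
```

With these hypothetical cutoffs all three catalogue rows fall into the intermediate class, as expected for sites listed as intermediate methylation.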
Ting Wang, Washington University
Using Genetic Variation to Detect Monoallelic Epigenomic and Transcription States
H1 cell line
1. Monoallelic DNA methylation (MRE-seq and MeDIP-seq)
2. Monoallelic expression (MethylC-seq and RNA-seq)
3. Monoallelic histone H3K4me3 (MethylC-seq and ChIP-seq)

Monoallelic Epigenomic Marks and Expression
[Figure: Venn diagram of monoallelic loci detected by MethylC-seq + RNA-seq, MRE-seq + MeDIP-seq, and MethylC-seq + ChIP-seq]

Intermediate methylation levels in POTEB CpG islands
[Figure: POTEB tracks for MRE-seq 1, MeDIP-seq 1, MRE-seq 2, MeDIP-seq 2, and bisulfite sequencing]

Validation of monoallelic DNA methylation in POTEB
Location: chr15:19346666-19350003
MeDIP-seq reads: allele G, count 9
MRE-seq reads: allele A, count 30

Searching for Monoallelic Methylation Using Shotgun Bisulfite Sequencing
• We expect streaks of methylation ratios near 50% (50 ± d%)
• Use 500 bp windows tiling CpG islands
• Compute average CpG methylation
  – CpG islands
  – 1000 putative intermediate methylation loci
• Infer the distribution of methylation in the 1000 loci
• Subselect 500 bp windows tiling CpG islands
• In the selected windows, search for allele-specific methylation

Average methylation over 500 bp windows in CpG islands and 1000 loci
[Figure: Average methylation scores over 500 bp windows in CpG islands and in the 1000 putative intermediate methylation loci. Series: % of CpG island windows; % of windows in the 1000 loci. x-axis: percent methylation; y-axis: % of windows.]

Parameter Search
• Experimented with various lower and upper bounds for methylation
• Guidelines
  – Discover as many of the 1000 loci as possible
  – Reduce the overall number of 500 bp windows

Lower bound (%)  Upper bound (%)  500 bp windows  Windows overlapping 1000 loci  Fraction of windows overlapping 1000 loci  1000 loci overlapped
10               70               24793           2851                           0.114992135                                950
10               80               28060           3877                           0.138168211                                989
10               90               36677           5512                           0.15028492                                 999
20               70               14084           2345                           0.166500994                                926
20               80               17351           3371                           0.19428275                                 977
20               90               25968           5006                           0.192775724                                990
30               70               9403            1912                           0.20333936                                 884
30               80               12670           2938                           0.231886346                                958
30               90               21287           4573                           0.21482595                                 979

The 30-80% bounds rediscover 958 of the 1000 loci at the highest specificity (about 23% of selected windows overlap the loci).

Incorporating Genetic Variation
• Search for allele-specific methylation
• Look only at the 30-80% methylation loci overlapping CpG islands
• Use heterozygous SNPs
• Check for SNPs that separate reads into different methylation states
  – One allele >20%
  – Other allele <20%
• Other thresholding methods are possible

Results
• Found 6295 heterozygous sites
• 586 sites have allele-specific methylation
• Overlap with 62 of the 1000 loci
  – 37 of the loci discovered using pairs of assays
  – 25 new loci

Monoallelic Epigenomic Marks and Expression: Distribution of the 62 SBS-ASM loci
[Figure: Venn diagram placing 37 of the 62 SBS-ASM loci among MethylC-seq + RNA-seq, MRE-seq + MeDIP-seq, and MethylC-seq + ChIP-seq; additional 25 loci]

Breast Tissue: Allele-specific methylation
• Determine informative heterozygous SNPs
• Loci with monoallelic MRE-seq and MeDIP-seq

Breast Tissue
• Multiple cell types
  – Different epigenotypes
  – Same genotype
• Identify monoallelic events
  – Constitutional
  – Tissue specific
• Cell types for four individuals
  – Conserved monoallelic marks
  – Individual-specific monoallelic marks

Integrate Array-based and Seq-based Methods
• Collaboration with Leo Schalkwyk and Jonathan Mill, King’s College, UK
• Investigate the same breast tissue samples
• Insight
  – Cost
  – Results
     • Number of ASM loci
     • Distribution of ASM loci identified by each method
  – Suggestions for designing future studies

Acknowledgements
NIEHS/NIDA: Joni Rutter, Tanya Barrett, Fred Tyson, Christine Colvis
EDACC: R. Alan Harris, Cristian Coarfa, Yuanxin Xi, Wei Li, Robert A. Waterland, Aleksandar Milosavljevic
UCSF/GSC REMC: Raman Nagarajan, Chibo Hong, Sara Downey, Brett E. Johnson, Allen Delaney, Yongjun Zhao, Marco Marra, Martin Hirst, Joseph Costello
  – UCSC: Tracy Ballinger, David Haussler
  – Washington University: Xin Zhou, Maximiliaan Schillebeeckx, Ting Wang
  – UCD: Lorigail Echipare, Henriette O’Geen, Peggy J. Farnham
UCSD REMC: Ryan Lister, Mattia Pelizzola, Bing Ren, Joseph Ecker
  – Cold Spring Harbor: Wen-Yu Chung, Michael Q. Zhang
Broad REMC: Hongcang Gu, Christoph Bock, Andreas Gnirke, Chuck Epstein, Brad Bernstein, Alexander Meissner
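The shotgun bisulfite sequencing search for allele-specific methylation (average methylation over 500 bp windows, keep windows between 30% and 80%, then split reads at heterozygous SNPs and flag sites where one allele is above 20% methylated and the other below 20%) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the input formats and the function names `window_means`, `select_windows`, `is_asm`, and `specificity` are hypothetical; windows are anchored at multiples of 500 bp rather than at CpG island boundaries; and each window average is taken as the plain mean of per-CpG methylation percentages.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

WINDOW = 500  # window size in bp, as in the slides


def window_means(cpgs: List[Tuple[int, float]]) -> Dict[int, float]:
    """Average per-CpG methylation (%) within fixed 500 bp windows.

    cpgs: (genomic position, percent methylation) pairs (hypothetical format).
    Returns {window start: mean percent methylation}.
    """
    bins: Dict[int, List[float]] = defaultdict(list)
    for pos, pct in cpgs:
        bins[pos // WINDOW * WINDOW].append(pct)
    return {start: mean(vals) for start, vals in bins.items()}


def select_windows(means: Dict[int, float],
                   lower: float = 30.0, upper: float = 80.0) -> List[int]:
    """Keep windows whose average methylation lies within [lower, upper]."""
    return sorted(s for s, m in means.items() if lower <= m <= upper)


def is_asm(reads: List[Tuple[str, float]], allele_a: str, allele_b: str,
           cutoff: float = 20.0) -> bool:
    """Flag allele-specific methylation at a heterozygous SNP.

    reads: (allele observed at the SNP, percent of the read's CpGs methylated).
    One allele must average above the cutoff and the other below it.
    """
    by_allele: Dict[str, List[float]] = defaultdict(list)
    for allele, pct in reads:
        by_allele[allele].append(pct)
    if not by_allele[allele_a] or not by_allele[allele_b]:
        return False  # not informative: one allele has no reads
    a, b = mean(by_allele[allele_a]), mean(by_allele[allele_b])
    return (a > cutoff > b) or (b > cutoff > a)


def specificity(n_windows: int, n_overlapping: int) -> float:
    """Fraction of selected windows that overlap the 1000 loci."""
    return n_overlapping / n_windows


means = window_means([(100, 60.0), (350, 40.0), (620, 95.0), (900, 100.0)])
print(means)                  # {0: 50.0, 500: 97.5}
print(select_windows(means))  # [0]
print(is_asm([("G", 90.0), ("G", 80.0), ("A", 0.0), ("A", 10.0)], "A", "G"))  # True
print(round(specificity(12670, 2938), 4))  # 0.2319, the 30-80% row of the table
```

Averaging per read and then per allele keeps the rule close to the slide's ">20% vs. <20%" wording; as the slides note, other thresholding methods are possible.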