Genome-wide DNA methylation analysis

Bi-Qing Li
Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences

Outline
- Background
- Method to distinguish 5mC
- Array based genome-wide DNA methylation analysis
- NGS based genome-wide DNA methylation analysis
- Third generation sequencing based genome-wide DNA methylation analysis
- Illumina BS-seq data manipulation

Background

DNA methylation is the main covalent chemical modification of DNA. It is involved in a variety of biological processes, including embryogenesis and development, silencing of transposable elements, regulation of gene transcription, and tumorigenesis and tumor progression.

The methylation pattern of DNA is highly variable among cell types and developmental stages and is influenced by disease processes and genetic factors, which poses considerable theoretical and technological challenges for its comprehensive analysis.

Recently, various high-throughput approaches have been developed and applied to the genome-wide analysis of DNA methylation, providing quantitative DNA methylation data at single-base-pair resolution with genome-wide coverage.

Genes 2010, 1(1), 85-101; doi:10.3390/genes1010085

Method to distinguish 5mC

Biotechniques. 2010 Oct;49(4):iii-xi

Restriction endonuclease-based analysis
- Isoschizomers: one enzyme cuts unmethylated DNA, the other cuts regardless of methylation
- Neoschizomers: one enzyme cuts unmethylated DNA, the other is partially affected by CpG methylation
- Enzymes that cut methylated DNA: recognition half-sites of the form PumC (Pu: A or G; mC: 5-methylcytosine, 5-hydroxymethylcytosine, or N4-methylcytosine). These half-sites can be separated by up to 3 kb, but the optimal separation is 55-103 base pairs.

Biotechniques. 2010 Oct;49(4):iii-xi

Restriction endonuclease-based analysis
- Methylation-sensitive restriction digestion followed by PCR across the restriction site is a very sensitive technique that is still used in some applications today.
- It remains applicable to locus-specific studies that require linkage of DNA methylation information across multiple kilobases, either between CpGs or between a CpG and a genetic polymorphism.
- It provides methylation data only at the restriction enzyme recognition sites or adjacent regions.
- It is extremely prone to false-positive results caused by incomplete digestion for reasons other than DNA methylation.

Nat Rev Genet. 2010 Feb 2;11(3):191-203

Bisulfite conversion of DNA
- Bisulfite conversion, then PCR

Proc Natl Acad Sci U S A. 1992 Mar 1;89(5):1827-31.
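To make the bisulfite conversion and PCR steps named above concrete, here is a minimal Python sketch. It is an in-silico illustration, not a laboratory protocol. It assumes the standard chemistry: bisulfite deaminates unmethylated cytosine to uracil, which is read as thymine after PCR, while 5-methylcytosine stays a cytosine. The function name `bisulfite_convert` and the 0-based `methylated_positions` set are hypothetical names chosen for this example, and only one strand is modeled.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence as read after bisulfite conversion and PCR.

    Assumption for this sketch: conversion is complete, so every
    unmethylated C is read as T; methylated C (positions listed in
    `methylated_positions`, 0-based) is protected and stays C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")  # unmethylated C -> U -> T after PCR
        else:
            out.append(base)  # methylated C and all other bases unchanged
    return "".join(out)


original = "ACGTCCGA"
# Hypothetical methylation state: only the C at index 1 is methylated.
converted = bisulfite_convert(original, methylated_positions={1})
print(original)
print(converted)
```

Comparing the converted read with the original sequence position by position is what later allows methylated and unmethylated cytosines to be told apart.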
Bisulfite conversion of DNA
- Single base pair resolution, no bias
- DNA degradation caused by high temperature and low pH
- Incomplete conversion of unmethylated cytosine in:
  - regions of high GC density
  - regions protected by histones
  - stable secondary structure elements
- Reduced complexity of the genome, with greater sequence redundancy and decreased hybridization specificity
- Difficult to map (repetitive regions)

Genes 2010, 1(1), 85-101; doi:10.3390/genes1010085

Immunoprecipitation-based methods
- Methylated DNA immunoprecipitation (MeDIP-seq)
  - An antibody that recognizes 5mC pulls down the methylated fraction of the genome
  - More sensitive to highly methylated regions of intermediate CpG density
- Methyl-binding domain protein enrichment (MBD-seq)
  - Uses the affinity of the methyl-binding proteins MeCP2 or MBD2 for methylated CpGs
  - More sensitive to highly methylated regions of high CpG density

Methods. 2010 Nov;52(3):203-12

Immunoprecipitation-based methods
- Straightforward, and the data are relatively easy to analyze
- Bias associated with CpG density needs adjustment: regions of high (MBD) or intermediate (MeDIP) CpG density will be interpreted as "more methylated" than equally methylated regions of low CpG density
- Low resolution; no information on individual CpG dinucleotides

Methods. 2010 Nov;52(3):203-12

Array based genome-wide DNA methylation analysis

Array-based analysis with restriction endonucleases
- One pool of genomic DNA is digested with a methylation-sensitive restriction enzyme and another pool is mock-digested; alternatively, two different enzymes are used
- The two DNA pools are amplified and labeled with different fluorescent dyes for two-color array hybridization

Nat Rev Genet. 2010 Feb 2;11(3):191-203

Comprehensive high-throughput arrays for relative methylation (CHARM)
- McrBC, which cuts methylated DNA, is used to fractionate unmethylated DNA
- Methyl-depleted DNA is labeled with Cy5 and total DNA with Cy3
- Both are hybridized on high-density arrays

Genome Res. 2008 May;18(5):780-90

HpaII tiny fragment enrichment by ligation-mediated PCR (HELP)
- Genomic DNA is digested with HpaII (cuts unmethylated DNA) and MspI (cuts regardless of methylation)
- Ligation-mediated PCR amplifies the HpaII or MspI genomic restriction fragments
- HpaII-amplified DNA is labeled with Cy5 and MspI-amplified DNA with Cy3
- Array hybridization

Genome Res. 2006 Aug;16(8):1046-55

Array-based analysis with methylation immunoprecipitation
- Methylated fragments are enriched using a 5mC antibody or the affinity of methyl-binding proteins
- Input DNA and enriched DNA are labeled with different fluorescent dyes
- Array hybridization

Nat Rev Genet. 2010 Feb 2;11(3):191-203

Figure source: "Methylated DNA immunoprecipitation", Wikipedia, the free encyclopedia

Array-based analysis with bisulfite conversion
Illumina® Epigenetic Analysis
- 14,495 protein-coding gene promoters
- 27,578 CpG sites
- 110 microRNA gene promoters

Nat Rev Genet. 2010 Feb 2;11(3):191-203
Genome Res. 2006 Mar;16(3):383-93

- GoldenGate BeadArray: 1,536 specific CpG sites in 371 genes
- GoldenGate Methylation Cancer Panel I: 1,505 CpG sites selected from 807 genes

Illumina® Epigenetics Analysis
Nat Rev Genet. 2010 Feb 2;11(3):191-203

Array-based analysis: advantages and limitations
- The experiments are easy to perform
- The data are easy to interpret with many well-characterized software programs
- Low resolution
- A hybridization-based method cannot easily distinguish one repetitive element from another
- Not truly genome-wide

NGS based genome-wide DNA methylation analysis

Biotechniques. 2010 Oct;49(4):iii-xi

Roche/454: massively parallel bisulfite pyrosequencing
- Reads include more CpG sites, which facilitates research on complex methylation patterns
- Reads are aligned to the reference more easily and more accurately, especially in repetitive regions
- Greater chance of covering genotype information (SNPs) adjacent to a cytosine
- Relatively high sequencing cost
- Higher error rates in calling runs of identical bases

Genes 2010, 1(1), 85-101; doi:10.3390/genes1010085

Illumina/Solexa: Methyl-seq
- Figure labels: fragments of ~100-350 bp; one enzyme cuts unmethylated DNA, the other cuts regardless of methylation; sequencing on the Illumina Genome Analyzer II

Genome Res. 2009 Jun;19(6):1044-56

Illumina/Solexa: methyl-sensitive cut counting (MSCC)
- The method is similar to Methyl-seq; however, sequencing of MspI libraries was reported to have little effect on the measurement of methylation and was omitted to reduce costs.

Genome Med. 2009 Nov 16;1(11):106
Nat Biotechnol. 2009 Apr;27(4):361-8

Illumina/Solexa: methyl-DNA immunoprecipitation sequencing (MeDIP-seq)

Methods. 2009 Mar;47(3):142-50

Illumina/Solexa: reduced representation bisulfite sequencing (RRBS)
- Illumina Genome Analyzer

Nucleic Acids Research, 2005, Vol. 33, No. 18
Nat Methods. 2010 Feb;7(2):133-6
Nature. 2008 Aug 7;454(7205):766-70

Illumina/Solexa: bisulfite padlock probes (BSPPs)

Nat Biotechnol. 2009 Apr;27(4):353-60

Illumina/Solexa: bisulfite sequencing (BS-seq)

Nature. 2008 Mar 13;452(7184):215-9

Illumina/Solexa: cytosine methylome sequencing (MethylC-seq)

Cell. 2008 May 2;133(3):523-36
Nature. 2009 Nov 19;462(7271):315-22
Nature. 2011 Mar 3;471(7336):68-73

Third generation sequencing based genome-wide DNA methylation analysis

PacBio: single-molecule, real-time sequencing (SMRT)
- ZMW: zero mode waveguide

Nat Biotechnol. 2010 May;28(5):426-8
Nat Methods. 2010 Jun;7(6):461-5
Nat Methods. 2010 Jun;7(6):435-7

Oxford Nanopore
- Oxford Nanopore Technologies

Nat Biotechnol. 2010 May;28(5):426-8

Illumina BS-seq data manipulation
- FASTQ file format and PHRED score
- Adaptor trimming with FASTX
- Quality control with FastQC
- Reads filter and trimming with FASTX
- Reads mapping with Bismark
- Basic analysis
- Advanced analysis and application

FASTQ file format
- FASTQ has emerged as a common file format for sharing sequencing read data; it combines the sequence with an associated per-base quality score.

Nucleic Acids Research, 2010, Vol. 38, No. 6 1767–1771

PHRED score

Nature. 2009 Nov 19;462(7271):315-22
Nucleic Acids Research, 2010, Vol. 38, No. 6 1767–1771
http://en.wikipedia.org/wiki/FASTQ_format#cite_note-Illumina_User_Guide_1.5-2

Adaptor trimming with FASTX

Nature. 2009 Nov 19;462(7271):315-22
http://hannonlab.cshl.edu/fastx_toolkit/index.html
http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastx_clipper_usage

Quality control with FastQC

http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/

Reads filter and trimming with FASTX

FASTQ quality filter (-Q 33: quality scores use the Phred+33 offset; -q: minimum quality score to keep; -p: minimum percent of bases that must reach that quality; -v: verbose report)

e.g.1
    fastq_quality_filter -Q 33 -q 20 -p 100 -v -i input -o output

e.g.2
    fastq_quality_filter -q 10 -p 100 -i /usr/local/data/GBS/OWB-RAD1.fastq -Q 33 | fastq_quality_filter -Q 33 -q 20 -p 80 -o OWB1-filt.fastq

http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastq_quality_filter_usage

FASTQ quality trimmer (-t: quality threshold, lower-quality bases are trimmed from the end of the read; -l: minimum length, shorter reads are discarded after trimming)

e.g.1
    fastq_quality_trimmer -t 20 -l 35 -v -i input -o output
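The filtering rule behind the fastq_quality_filter examples above can be sketched in Python. This is an illustrative re-implementation under stated assumptions, not the FASTX-Toolkit source code. It assumes Phred+33 encoded qualities (the -Q 33 case), where each quality character encodes Q = ord(character) - 33 and Q = -10 log10(p) for a base-call error probability p, and it assumes both thresholds are inclusive. The function names `phred_scores`, `error_probability`, and `passes_quality_filter` are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Convert a Phred score to a base-call error probability."""
    return 10 ** (-q / 10)


def passes_quality_filter(quality_string, min_quality=20, min_percent=100, offset=33):
    """Keep a read if at least `min_percent` percent of its bases
    have a Phred score >= `min_quality` (mirrors the -q / -p idea;
    inclusive comparisons are an assumption of this sketch).
    """
    scores = phred_scores(quality_string, offset)
    if not scores:
        return False  # an empty read carries no usable bases
    good = sum(1 for q in scores if q >= min_quality)
    return good * 100 >= min_percent * len(scores)


# Made-up quality string: 'I' decodes to 40, '5' to 20, '+' to 10.
qual = "IIII5+II"
print(phred_scores(qual))
print(passes_quality_filter(qual, min_quality=20, min_percent=100))
print(passes_quality_filter(qual, min_quality=20, min_percent=80))
```

In this made-up read, seven of eight bases reach Q20, so the read fails a 100 percent requirement but passes an 80 percent one, which is the difference between the two stages chained in e.g.2.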
Reads mapping with Bismark

- Two computationally converted reference genomes (C-to-T and G-to-A)

Bioinformatics. 2011 Jun 1;27(11):1571-2.

- Methylation calls are reported by sequence context: CG, CHG, and CHH (H = A, C or T)

chromosome  position  strand  context  mC  All C
1           468       +       CG       4   4
1           469       -       CG       5   6
1           470       +       CG       5   5
1           471       -       CG       7   7
1           7384      -       CHG      6   9
1           225896    -       CHH      4   16
1           771455    +       CHH      5   22
1           702235    +       CHG      2   12

Basic analysis
- Reads coverage
- Reads depth
- Reads depth percentage
- Methylation level
- Methylation density

Basic analysis: methylation level

$$\text{methylation level} = \frac{\text{number of methylated reads}}{\text{number of methylated reads} + \text{number of unmethylated reads}}$$

chromosome  position  strand  context  mC  All C  Methylation level
1           468       +       CG       4   4      100%
1           469       -       CG       5   6      83.3%
1           470       +       CG       5   5      100%
1           471       -       CG       7   7      100%
1           7384      -       CHG      6   9      66.7%
1           225896    -       CHH      4   16     25%
1           771455    +       CHH      5   22     22.7%
1           702235    +       CHG      2   12     16.7%

H = A, C or T

Basic analysis: methylation density

$$\text{Absolute (mC)} = \frac{\text{number of calls of a given methylation type (mCG, mCHG, mCHH)}}{\text{bin size}}$$

$$\text{Relative methylation}\left(\frac{\text{mC}}{\text{C}}\right) = \frac{\text{number of calls of a given methylation type (mCG, mCHG, mCHH)}}{\text{total number of sites of the same type}}$$

H = A, C or T

Advanced analysis and application

DNA methylation and gene expression
- DNA methylation is linked to gene silencing and is considered an important mechanism in the regulation of gene expression
- Gene expression can be measured with gene expression microarrays or RNA-seq
- Proximal TSS: -150 bp to +150 bp across the TSS
- Promoter: 1.5 kb upstream of the TSS

Nature. 2009 Nov 19;462(7271):315-22
Genome Res. 2010 Mar;20(3):320-31.

Further applications
- Differentially methylated regions (DMRs) and gene expression
- DNA methylation at DNA–protein interaction sites
- DNA methylation, miRNA, and histone modification
- ……

Genome Res. 2010 Mar;20(3):320-31.
Nature. 2009 Nov 19;462(7271):315-22

Thank you!
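As a supplement to the basic analysis slides, the per-site methylation level (methylated reads divided by all reads covering that cytosine) can be computed directly from the mC and All C columns. The sketch below is a minimal Python illustration, not part of any published pipeline. The tuple layout and the function name `methylation_level` are assumptions made for this example; the rows are copied from the table in the slides.

```python
# Rows copied from the slide table:
# (chromosome, position, strand, context, mC, all_C)
sites = [
    ("1", 468, "+", "CG", 4, 4),
    ("1", 469, "-", "CG", 5, 6),
    ("1", 470, "+", "CG", 5, 5),
    ("1", 471, "-", "CG", 7, 7),
    ("1", 7384, "-", "CHG", 6, 9),
    ("1", 225896, "-", "CHH", 4, 16),
    ("1", 771455, "+", "CHH", 5, 22),
    ("1", 702235, "+", "CHG", 2, 12),
]


def methylation_level(methylated, total):
    """Fraction of reads at a cytosine that support methylation.

    `total` is methylated plus unmethylated reads (the "All C" column).
    Returns None for uncovered sites to avoid dividing by zero.
    """
    if total == 0:
        return None
    return methylated / total


for chrom, pos, strand, context, mc, all_c in sites:
    level = methylation_level(mc, all_c)
    # Format as a percentage with one decimal place.
    print(f"{chrom}\t{pos}\t{strand}\t{context}\t{mc}\t{all_c}\t{level:.1%}")
```

Returning `None` for uncovered sites is one possible design choice; a real analysis would more likely also apply a minimum read depth before reporting a level at all.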