Genome-wide DNA methylation analysis

Bi-Qing Li
Key Laboratory of Systems biology,
Shanghai Institutes for Biological Sciences,
Chinese Academy of Sciences
Outline
 Background
 Method to distinguish 5mC
 Array based genome-wide DNA methylation analysis
 NGS based genome-wide DNA methylation analysis
 Third generation sequencing based genome-wide DNA
methylation analysis
 Illumina BS-seq data manipulation
Background
 DNA methylation is the main covalent chemical modification
of DNA and is involved in a variety of biological processes, including
embryogenesis and development, silencing of transposable
elements, regulation of gene transcription, and tumorigenesis
and tumor progression.
 The methylation pattern of DNA is highly variable among cell
types and developmental stages and is influenced by disease
processes and genetic factors, which poses considerable
theoretical and technological challenges for its comprehensive
analysis.
 Recently, various high-throughput approaches have been
developed and applied to the genome-wide analysis of DNA
methylation, providing quantitative DNA methylation data at
single-base-pair resolution with genome-wide coverage.
Genes 2010, 1(1), 85-101; doi:10.3390/genes1010085
Method to distinguish 5mC
Biotechniques. 2010 Oct;49(4):iii-xi
Restriction endonuclease-based analysis
isoschizomer
Cut unmethylated DNA
Regardless of methylation
neoschizomer
Cut unmethylated DNA
Partially affected by CpG methylation
Cut methylated DNA
Pu: A or G; mC: 5-methylcytosine, 5-hydroxymethylcytosine or N4-methylcytosine. These half-sites
can be separated by up to 3 kb, but the optimal separation is 55-103 base pairs.
Biotechniques. 2010 Oct;49(4):iii-xi
Restriction endonuclease-based analysis
 Methylation-sensitive restriction digestion followed by PCR
across the restriction site is a very sensitive technique that is
still used in some applications today.
 This method is still applicable for some locus-specific studies
that require linkage of DNA methylation information across
multiple kilobases, either between CpGs or between a CpG and
a genetic polymorphism.
 It provides methylation data only at the restriction
enzyme recognition sites or adjacent regions.
 It is extremely prone to false-positive results caused by
incomplete digestion for reasons other than DNA methylation.
Nat Rev Genet. 2010 Feb 2;11(3):191-203
Bisulfite conversion of DNA
Bisulfite conversion
PCR
Proc Natl Acad Sci U S A. 1992 Mar 1;89(5):1827-31.
Bisulfite conversion of DNA
 Single base pair resolution, no bias
 DNA degradation by high temperature and low pH
 Incomplete conversion of unmethylated cytosine
 High GC density regions
 Protected by histones
 Stable secondary structure elements
 Reduced complexity of genome, greater sequence
redundancy, decreased hybridization specificity
 Difficult to map (repetitive regions)
Genes 2010, 1(1), 85-101; doi:10.3390/genes1010085
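The conversion step described above can be illustrated with a minimal in-silico sketch. It assumes complete conversion and a single strand; the function name and the `methylated_positions` argument are hypothetical and only serve the illustration.

```python
def bisulfite_convert(seq, methylated_positions=()):
    """Simulate bisulfite conversion followed by PCR on one strand.

    Unmethylated C is deaminated to U and read as T after PCR;
    methylated C (5mC) is protected and stays C.
    `methylated_positions` holds 0-based indices of methylated cytosines.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")  # unmethylated C reads as T
        else:
            converted.append(base)  # 5mC and all other bases are unchanged
    return "".join(converted)


if __name__ == "__main__":
    # Only the C at index 1 is methylated in this toy sequence.
    print(bisulfite_convert("ACGTCCGA", methylated_positions=[1]))
```

After conversion most cytosines read as T, which is why the converted genome has reduced complexity and reads are harder to map.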
Immunoprecipitation-based methods
 Methylated DNA immunoprecipitation (MeDIP-seq)
 An antibody recognizes 5mC to pull down the methylated fraction
of the genome
 More sensitive to highly methylated, intermediate-CpG density
regions
 Methyl-binding domain protein (MBD-seq)
 Uses the affinity of the methyl-binding proteins MeCP2 or MBD2
for methylated CpGs
 More sensitive to highly methylated, high-CpG density regions
Methods. 2010 Nov;52(3):203-12
Immunoprecipitation-based methods
 Straightforward, and the data are relatively easy to analyze
 Bias associated with CpG density, which needs adjustment
 High (MBD) or intermediate (MeDIP) CpG density regions
will be interpreted as “more methylated” than equally
methylated low-CpG density regions
 Low resolution; does not yield information on individual
CpG dinucleotides
Methods. 2010 Nov;52(3):203-12
Array-based genome wide DNA methylation
analysis & restriction endonuclease
 Digestion of one pool of genomic DNA with a
methylation-sensitive restriction enzyme and mock
digestion of another pool, or digestion with two different enzymes
 The two DNA pools are amplified and labelled with different
fluorescent dyes
 Two-color array hybridization
Nat Rev Genet. 2010 Feb 2;11(3):191-203
Array-based genome wide DNA methylation
analysis & restriction endonuclease
Comprehensive high-throughput arrays for relative
methylation (CHARM)
McrBC (cuts methylated DNA) is used to fractionate unmethylated DNA
Label methyl-depleted DNA with Cy5 and total DNA with
Cy3
Hybridize on high-density arrays
Genome Res. 2008 May;18(5):780-90
Array-based genome wide DNA methylation
analysis & restriction endonuclease
HpaII tiny fragment enrichment by ligation mediated
PCR (HELP)
 Digestion of genomic DNA with HpaII (cuts unmethylated DNA)
and MspI (cuts regardless of methylation)
 Ligation-mediated PCR for the amplification of HpaII or
MspI genomic restriction fragments
 Label HpaII-amplified DNA with Cy5 and MspI-amplified DNA with Cy3
 Array hybridization
Genome Res. 2006 Aug;16(8):1046-55
Array-based genome wide DNA methylation
analysis & methylation immunoprecipitation
 Enrichment of methylated fragments using 5mC antibody
or the affinity of methyl-binding proteins
 Input DNA and enriched DNA are labeled with different
fluorescent dyes
 Array hybridization
Nat Rev Genet. 2010 Feb 2;11(3):191-203
Array-based genome wide DNA methylation
analysis & methylation immunoprecipitation
Methylated DNA immunoprecipitation
From Wikipedia, the free encyclopedia
Array-based genome wide DNA methylation
analysis & bisulfite conversion
ILLUMINA® EPIGENETIC ANALYSIS
Array-based genome wide DNA methylation
analysis & bisulfite conversion
14,495 protein-coding gene promoters
27,578 CpG sites
110 microRNA gene promoters
Nat Rev Genet. 2010 Feb 2;11(3):191-203
Array-based genome wide DNA methylation
analysis & bisulfite conversion
Genome Res. 2006 Mar;16(3):383-93
Array-based genome wide DNA methylation
analysis & bisulfite conversion
GoldenGate BeadArray: 1,536 specific CpG sites in 371 genes
GoldenGate Methylation Cancer Panel I: 1,505 CpG sites selected from 807 genes
Illumina® Epigenetics Analysis
Nat Rev Genet. 2010 Feb 2;11(3):191-203
Array-based genome wide DNA
methylation analysis
 Easy to perform such experiments
 Easy to interpret data with many well-characterized
software programs
 Low resolution
 Not easy to distinguish one repetitive element from
another in a hybridization-based method
 Not truly genome-wide
NGS based genome-wide DNA
methylation analysis
Biotechniques. 2010 Oct;49(4):iii-xi
NGS based genome-wide DNA
methylation analysis-ROCHE 454
Roche/454 pyrosequencing-based massively parallel
bisulfite pyrosequencing
 Each read includes more CpG sites, facilitating research on
complex methylation patterns
 Reads are aligned to the reference more easily and accurately,
especially in repetitive regions
 Greater chance of covering genotype information (SNPs)
adjacent to cytosines
 Relatively high sequencing cost
 Higher error rates in calling runs of identical bases (homopolymers)
Genes 2010, 1(1), 85-101; doi:10.3390/genes1010085
NGS based genome-wide DNA methylation
analysis-Illumina/SOLEXA
Methyl-seq
[Figure: Methyl-seq workflow; labels: cut unmethylated DNA, cut regardless of methylation, ~100-350 bp fragments, Illumina Genome Analyzer II]
Genome Res. 2009 Jun;19(6):1044-56
NGS based genome-wide DNA methylation
analysis-Illumina/SOLEXA
Methyl-sensitive cut counting(MSCC)
The method is similar to Methyl-seq; however, sequencing of MspI
libraries was reported to have little effect on the measurement of
methylation and was omitted to reduce costs.
Genome Med. 2009 Nov 16;1(11):106
Nat Biotechnol. 2009 Apr;27(4):361-8
NGS based genome-wide DNA methylation
analysis-Illumina/SOLEXA
methyl-DNA immunoprecipitation
(MeDIP) seq
Methods. 2009 Mar;47(3):142-50
NGS based genome-wide DNA methylation
analysis-Illumina/SOLEXA
Reduced representation bisulfite sequencing(RRBS)
Illumina Genome
Analyzer
Nucleic Acids Research, 2005, Vol. 33, No. 18
Nat Methods. 2010 Feb;7(2):133-6
Nature. 2008 Aug 7;454(7205):766-70
NGS based genome-wide DNA
methylation analysis-Illumina/SOLEXA
Bisulfite padlock probes(BSPPs)
Nat Biotechnol. 2009 Apr;27(4):353-60
NGS based genome-wide DNA methylation
analysis-Illumina/SOLEXA
Bisulfite sequencing(BS-seq)
Nature. 2008 Mar 13;452(7184):215-9
NGS based genome-wide DNA methylation
analysis-Illumina/SOLEXA
Cytosine methylome sequencing
(MethylC-seq)
Cell. 2008 May 2;133(3):523-36
Nature. 2009 Nov 19;462(7271):315-22
Nature. 2011 Mar 3;471(7336):68-73
Third generation sequencing based genome-wide
DNA methylation analysis-PacBio
single-molecule, real-time sequencing (SMRT)
ZMW: zero mode waveguide
Nat Biotechnol. 2010 May;28(5):426-8
Third generation sequencing based genome-wide
DNA methylation analysis-PacBio
single-molecule, real-time sequencing (SMRT)
Nat Methods. 2010 Jun;7(6):461-5
Nat Methods. 2010 Jun;7(6):435-7
Third generation sequencing based genome-wide
DNA methylation analysis-Oxford Nanopore
Oxford Nanopore Technologies
Nat Biotechnol. 2010 May;28(5):426-8
Illumina BS-seq data manipulation
FASTQ file format and PHRED score
Adaptor trimming with FASTX
Quality control with FastQC
Reads filter and trimming with FASTX
Reads mapping with Bismark
Basic analysis
Advanced analysis and application
Illumina BS-seq data manipulation
FASTQ file format
FASTQ has emerged as a common file format for sharing sequencing read data
combining both the sequence and an associated per base quality score
Nucleic Acids Research, 2010, Vol. 38, No. 6 1767–1771
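A FASTQ record as described above can be read with a short sketch like the following. It assumes the common four-lines-per-record layout (header starting with `@`, sequence, separator starting with `+`, quality string) with unwrapped sequence lines; the function name `read_fastq` is hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) tuples from a FASTQ stream.

    Assumes four lines per record and no line wrapping inside a record.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:].strip(), seq, qual


if __name__ == "__main__":
    example = "@read1\nACGT\n+\nIIII\n"
    for read_id, seq, qual in read_fastq(io.StringIO(example)):
        print(read_id, seq, qual)
```

Each base in the sequence line has exactly one quality character at the same position in the quality line.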
Illumina BS-seq data manipulation
PHRED score
Nature. 2009 Nov 19;462(7271):315-22
Nucleic Acids Research, 2010, Vol. 38, No. 6 1767–1771
Illumina BS-seq data manipulation
PHRED score
http://en.wikipedia.org/wiki/FASTQ_format#cite_note-Illumina_User_Guide_1.5-2
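The PHRED score relates to the base-calling error probability P by Q = -10 log10(P), and FASTQ stores each score as one ASCII character. The sketch below assumes an offset of 33 (Sanger and Illumina 1.8+) by default, with 64 for Illumina 1.3 to 1.7 data; the function names are hypothetical.

```python
def decode_qualities(qual, offset=33):
    """Convert a FASTQ quality string into a list of PHRED scores."""
    return [ord(ch) - offset for ch in qual]


def phred_to_error_prob(q):
    """Error probability implied by a PHRED score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    scores = decode_qualities("I5!")
    for q in scores:
        print(q, phred_to_error_prob(q))
```

A score of 20 therefore corresponds to a 1% chance that the base call is wrong, and a score of 30 to 0.1%.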
Illumina BS-seq data manipulation
adaptor trimming with FASTX
Nature. 2009 Nov 19;462(7271):315-22
Illumina BS-seq data manipulation
adaptor trimming with FASTX
http://hannonlab.cshl.edu/fastx_toolkit/index.html
Illumina BS-seq data manipulation
adaptor trimming with FASTX
http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastx_clipper_usage
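The idea behind 3' adaptor clipping can be sketched as follows. This is a simplified illustration using exact matching only, not the algorithm used by fastx_clipper; the function name, the `min_overlap` parameter and the adaptor sequence in the example are assumptions made for the illustration.

```python
def clip_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adaptor from a read using exact matching.

    A full adaptor occurrence is removed together with everything after it.
    Otherwise, a partial adaptor (a prefix of at least `min_overlap` bases)
    at the very end of the read is removed.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Look for the longest adaptor prefix that the read ends with.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # illustrative adaptor sequence
    print(clip_adapter("ACGTAGATCGGAAGAGCTTT", adapter))
    print(clip_adapter("ACGTACGTAGATCGGAAG", adapter))
```

Real clipping tools also tolerate mismatches and sequencing errors inside the adaptor, which this sketch does not.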
Illumina BS-seq data manipulation
Quality control with FastQC
http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
Illumina BS-seq data manipulation
Reads filter and trimming with FASTX
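e.g.1 fastq_quality_filter -Q 33 -q 20 -p 100 -v -i input -o output
(keeps only reads in which 100% of the bases have a quality score of at least 20)
e.g.2 fastq_quality_filter -Q 33 -q 10 -p 100 -i /usr/local/data/GBS/OWB-RAD1.fastq |
fastq_quality_filter -Q 33 -q 20 -p 80 -o OWB1-filt.fastq
(two chained filters: every base at least Q10, then at least 80% of the bases at least Q20)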
http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastq_quality_filter_usage
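The filtering rule expressed by the `-q` and `-p` options above can be sketched in a few lines. This is an illustration of the rule, not the FASTX implementation; the function name is hypothetical and an ASCII offset of 33 is assumed, matching `-Q 33`.

```python
def passes_quality_filter(qual, min_quality=20, min_percent=100, offset=33):
    """Return True if at least `min_percent` percent of the bases in the
    quality string have a PHRED score of at least `min_quality`."""
    scores = [ord(ch) - offset for ch in qual]
    if not scores:
        return False  # an empty read never passes
    good = sum(1 for s in scores if s >= min_quality)
    # Integer comparison avoids floating-point rounding at the boundary.
    return good * 100 >= min_percent * len(scores)


if __name__ == "__main__":
    print(passes_quality_filter("IIII5"))                  # all bases >= Q20
    print(passes_quality_filter("III+"))                   # one base is Q10
    print(passes_quality_filter("III+", min_percent=75))   # 3 of 4 bases pass
```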
Illumina BS-seq data manipulation
Reads filter and trimming with FASTX
FASTQ quality trimmer
e.g.1 fastq_quality_trimmer -t 20 -l 35 -v -i input -o output
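In the example above, `-t 20` is the quality threshold and `-l 35` the minimum length after trimming. A minimal sketch of that behaviour, assuming trimming from the 3' end only and an ASCII offset of 33 (the function name is hypothetical, and the real tool may differ in details):

```python
def quality_trim(seq, qual, threshold=20, min_length=35, offset=33):
    """Trim low-quality bases from the 3' end of a read.

    Bases are removed from the end while their PHRED score is below
    `threshold`. Returns (seq, qual) after trimming, or None if the
    trimmed read is shorter than `min_length` and should be discarded.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    if end < min_length:
        return None
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # The last two bases have Q10 and are trimmed away.
    print(quality_trim("ACGTAC", "IIII++", threshold=20, min_length=3))
    # With a stricter length cut-off the trimmed read is discarded.
    print(quality_trim("ACGTAC", "IIII++", threshold=20, min_length=5))
```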
Illumina BS-seq data manipulation
Reads mapping with Bismark
Bioinformatics. 2011 Jun 1;27(11):1571-2.
Illumina BS-seq data manipulation
Reads mapping with Bismark
Two computationally converted reference genomes (C-to-T and G-to-A)
Bioinformatics. 2011 Jun 1;27(11):1571-2.
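The in-silico conversion behind this mapping strategy can be sketched as follows. The sketch only shows the conversions and a forward-strand methylation call for a read whose ungapped alignment is already known; the alignment itself is done by an external short-read aligner and is not shown. All function names are hypothetical.

```python
def c_to_t(seq):
    """Fully bisulfite-convert a sequence in silico (every C becomes T)."""
    return seq.upper().replace("C", "T")


def g_to_a(seq):
    """Equivalent conversion seen from the opposite strand (every G becomes A)."""
    return seq.upper().replace("G", "A")


def converted_references(genome):
    """Build the two converted versions of a reference sequence."""
    return {"C_to_T": c_to_t(genome), "G_to_A": g_to_a(genome)}


def call_methylation(read, genome_segment):
    """Compare an aligned forward-strand read with the unconverted genome.

    Genome C and read C: the cytosine was protected, so it is called methylated.
    Genome C and read T: the cytosine was converted, so it is called unmethylated.
    Returns a list of (0-based offset, is_methylated) tuples.
    """
    calls = []
    for i, (r, g) in enumerate(zip(read.upper(), genome_segment.upper())):
        if g == "C":
            if r == "C":
                calls.append((i, True))
            elif r == "T":
                calls.append((i, False))
    return calls


if __name__ == "__main__":
    print(converted_references("ACGTCG"))
    print(call_methylation("ACGTTG", "ACGTCG"))
```

Converting both reads and references removes the C/T difference during alignment; the methylation state is recovered afterwards by comparing the original read with the original genome.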
Illumina BS-seq data manipulation
Reads mapping with Bismark
chromosome  position  strand  context  mC  All C
1           468       +       CG       4   4
1           469       -       CG       5   6
1           470       +       CG       5   5
1           471       -       CG       7   7
1           7384      -       CHG      6   9
1           225896    -       CHH      4   16
1           771455    +       CHH      5   22
1           702235    +       CHG      2   12
H=A, C or T
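The sequence context of a cytosine (CG, CHG or CHH, with H = A, C or T) can be derived from the two bases that follow it. A minimal sketch for forward-strand cytosines; for the reverse strand the same function would be applied to the reverse complement. The function name is hypothetical.

```python
def cytosine_context(seq, pos):
    """Classify the cytosine at 0-based `pos` of `seq` as CG, CHG or CHH.

    Returns None when the context cannot be determined, for example when
    the cytosine is too close to the end of the sequence or is followed
    by an ambiguous base such as N.
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position does not hold a cytosine")
    following = seq[pos + 1:pos + 3]
    if len(following) >= 1 and following[0] == "G":
        return "CG"
    if len(following) < 2:
        return None
    first_is_h = following[0] in ("A", "C", "T")
    if first_is_h and following[1] == "G":
        return "CHG"
    if first_is_h and following[1] in ("A", "C", "T"):
        return "CHH"
    return None


if __name__ == "__main__":
    for s in ("ACGT", "ACAGT", "ACATT"):
        print(s, cytosine_context(s, 1))
```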
Illumina BS-seq data manipulation
Basic analysis-Reads coverage
Illumina BS-seq data manipulation
Basic analysis-Reads depth
Illumina BS-seq data manipulation
Basic analysis-Reads depth percentage
Illumina BS-seq data manipulation
Basic analysis- Methylation level
\[
\text{methylation level} = \frac{\text{number of methylated reads}}{\text{number of methylated reads} + \text{number of unmethylated reads}}
\]
chromosome  position  strand  context  mC  All C  Methylation level
1           468       +       CG       4   4      100%
1           469       -       CG       5   6      83.3%
1           470       +       CG       5   5      100%
1           471       -       CG       7   7      100%
1           7384      -       CHG      6   9      66.7%
1           225896    -       CHH      4   16     25%
1           771455    +       CHH      5   22     22.7%
1           702235    +       CHG      2   12     16.7%
H=A, C or T
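The methylation levels in the table can be reproduced with a short sketch. It assumes that the "All C" column is the total number of reads covering the cytosine, that is, methylated plus unmethylated reads; the function name is hypothetical.

```python
def methylation_level(methylated_reads, total_reads):
    """Fraction of reads supporting methylation at one cytosine.

    `total_reads` is methylated plus unmethylated reads ("All C").
    Returns None for uncovered sites.
    """
    if total_reads == 0:
        return None
    return methylated_reads / total_reads


if __name__ == "__main__":
    # (position, mC, All C) taken from the table above
    sites = [(468, 4, 4), (469, 5, 6), (470, 5, 5), (471, 7, 7),
             (7384, 6, 9), (225896, 4, 16), (771455, 5, 22), (702235, 2, 12)]
    for position, mc, all_c in sites:
        level = methylation_level(mc, all_c)
        print(position, f"{level * 100:.1f}%")
```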
Illumina BS-seq data manipulation
Basic analysis-Methylation density
\[
\text{Absolute}(mC) = \frac{\text{number of calls of a given methylation type (mCG, mCHG, mCHH)}}{\text{bin size}}
\]
\[
\text{Relative methylation}\left(\frac{mC}{C}\right) = \frac{\text{number of calls of a given methylation type (mCG, mCHG, mCHH)}}{\text{total number of sites of the same type}}
\]
H=A, C or T
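Both density measures can be computed per bin with a small sketch. It assumes 0-based positions of methylation calls of one type (for example mCG) on a single chromosome and fixed-size, non-overlapping bins; the function names are hypothetical.

```python
from collections import Counter


def absolute_density_by_bin(positions, bin_size):
    """Absolute density: number of methylation calls per base in each bin.

    `positions` are the coordinates of calls of one methylation type.
    Only bins that contain at least one call are returned.
    """
    counts = Counter(pos // bin_size for pos in positions)
    return {bin_index: n / bin_size for bin_index, n in sorted(counts.items())}


def relative_density(n_calls, n_sites):
    """Relative density (mC/C): methylation calls of one type divided by
    the total number of sites of the same type in the region."""
    if n_sites == 0:
        return None
    return n_calls / n_sites


if __name__ == "__main__":
    print(absolute_density_by_bin([5, 12, 18, 250], bin_size=100))
    print(relative_density(3, 12))
```

The absolute measure depends on how many sites of a given type a bin contains, while the relative measure normalizes for that.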
Illumina BS-seq data manipulation
Advanced analysis and application
DNA methylation and gene expression
 DNA methylation is linked to gene silencing and is
considered to be an important mechanism in the
regulation of gene expression
 Gene expression
 Gene expression microarray
 RNA-seq
Illumina BS-seq data manipulation
Advanced analysis and application
DNA methylation and gene expression
proximal TSS (-150 bp to +150 bp across TSS)
Promoter (1.5 kb upstream of the TSS)
Nature. 2009 Nov 19;462(7271):315-22
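The two region definitions above can be turned into coordinates with a strand-aware sketch. It assumes 1-based chromosome coordinates and that "upstream" means lower coordinates on the + strand and higher coordinates on the - strand; the function name and the returned dictionary layout are hypothetical.

```python
def tss_regions(tss, strand, proximal=150, promoter=1500):
    """Return (start, end) intervals around a transcription start site.

    proximal: -150 bp to +150 bp across the TSS
    promoter: 1.5 kb upstream of the TSS (direction depends on the strand)
    """
    proximal_region = (tss - proximal, tss + proximal)
    if strand == "+":
        promoter_region = (tss - promoter, tss)
    elif strand == "-":
        promoter_region = (tss, tss + promoter)
    else:
        raise ValueError("strand must be '+' or '-'")
    return {"proximal": proximal_region, "promoter": promoter_region}


if __name__ == "__main__":
    print(tss_regions(10000, "+"))
    print(tss_regions(10000, "-"))
```

Methylation levels of cytosines falling inside these intervals can then be averaged per gene and compared with its expression level.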
Illumina BS-seq data manipulation
Advanced analysis and application
DNA methylation and gene expression
Genome Res. 2010 Mar;20(3):320-31.
Illumina BS-seq data manipulation
Advanced analysis and application
 Differentially methylated regions (DMRs) and gene
expression
 DNA methylation at DNA–protein interaction sites
 DNA methylation, miRNA, and histone modification
 ……
Genome Res. 2010 Mar;20(3):320-31.
Nature. 2009 Nov 19;462(7271):315-22
Thank you!