Introduction to epigenetics: chromatin modifications, DNA methylation and the CpG Island landscape (part 2)
Héctor Corrada Bravo
CMSC858P Spring 2012
(many slides courtesy of Rafael Irizarry)
How do we measure DNA methylation?
Microarray Data
One question…
• Where do we measure?
• At least 7 arrays are needed to measure the entire genome
• CpGs are depleted
• The remaining CpGs cluster
CpG Islands
But variation is also seen outside CpG islands
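CpG depletion, and the clustering of the remaining CpGs into islands, is commonly quantified with the observed/expected CpG ratio. Here the expected count is #C × #G / length, the count you would get if C and G were placed independently. The sketch below computes that ratio for a sequence and over sliding windows. The window width and step are illustrative parameters, not values taken from these slides.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0  # no C or no G means no CpG is possible
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count
    n_cg = seq.count("CG")
    return n_cg * len(seq) / (n_c * n_g)


def windowed_obs_exp(seq: str, width: int, step: int) -> list[tuple[int, float]]:
    """Slide a window along seq and report (start, obs/exp ratio) per window."""
    return [
        (start, cpg_obs_exp(seq[start:start + width]))
        for start in range(0, len(seq) - width + 1, step)
    ]


if __name__ == "__main__":
    # A CpG-poor stretch followed by a CpG-dense one
    print(windowed_obs_exp("ATATATCGCGCG", width=6, step=6))
```

A ratio well below 1 across most windows reflects genome-wide depletion. Windows with a ratio near or above 1 mark the CpG-dense clusters.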
McrBC
• Cuts at AmCG or GmCG
[Figure panels: input, McrBC digestion, and McrBC after gel, each shown with and without methylation; in the methylated sample after the gel, the remaining DNA is now unmethylated]
Gene expression normalization does not work well here
We use control probes
There are also waves
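The slide only says that control probes are used instead of standard gene-expression normalization. One simple way such probes could be used is to subtract their median log-ratio from every probe. This sketch assumes the control probes sit in regions expected to be unmethylated, so their log-ratios should center at zero. The function and argument names are hypothetical, and this step does not address the waves.

```python
import statistics


def normalize_with_controls(log_ratios: list[float], is_control: list[bool]) -> list[float]:
    """Shift all log-ratios so that the control probes are centered at zero.

    Assumption (hypothetical): control probes measure regions expected to be
    unmethylated, so any nonzero center among them is technical bias.
    """
    controls = [m for m, c in zip(log_ratios, is_control) if c]
    if not controls:
        raise ValueError("need at least one control probe")
    offset = statistics.median(controls)  # robust estimate of the bias
    return [m - offset for m in log_ratios]


if __name__ == "__main__":
    print(normalize_with_controls([1.0, 2.0, 3.0, 0.5], [True, False, False, True]))
```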
Smoothing
McrBC on a two-channel tiling array
We smooth
Proportion of neighboring CpGs also methylated/not methylated
True signal (simulated)
Observed data
Observed data and true signal
What is methylated (above 50%)?
Naïve approach
Many false positives (FP)
Smooth
No FP, but one false negative
Smooth less? No FN, lots of FP
We prefer this!
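The slides above show a trade-off. Naive per-probe thresholding at 50% gives many false positives. Smoothing removes them, but too much smoothing can miss a short methylated region. The minimal sketch below illustrates this, using a simple running mean as a stand-in for whatever smoother was actually used. The proportions are made-up illustrative values, not data from the slides.

```python
def running_mean(values: list[float], half_width: int) -> list[float]:
    """Average each value with up to half_width neighbors on either side."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        window = values[lo:hi]  # window shrinks at the edges
        out.append(sum(window) / len(window))
    return out


def call_methylated(values: list[float], threshold: float = 0.5) -> list[bool]:
    """Call a position methylated if its (smoothed) proportion exceeds threshold."""
    return [v > threshold for v in values]


if __name__ == "__main__":
    observed = [0.1, 0.7, 0.2, 0.1, 0.9, 0.8, 0.9, 0.2, 0.1]
    print(call_methylated(observed))                   # naive calls
    print(call_methylated(running_mean(observed, 1)))  # calls after smoothing
```

With half_width=1, the isolated high value at index 1 is no longer called methylated, while the run at indices 4 to 6 survives. A wide enough window would eventually erode a short run like that too, which is the false-negative side of the trade-off.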
CHARM
DMRs for three tissues (five replicates)
Irizarry et al., Nature Genetics 2009
Some findings
[Irizarry et al., 2009, Nat. Genetics]
Tissues easily distinguished
Cancer DMR
Many regions like this
Note: hypo- and hypermethylation
Both hyper- and hypomethylated
Cancer and tissue DMRs coincide
DMRs enriched in shores
Still affects expression
T-DMRs
Still affects expression
Supplementary Figure 2
C-DMRs
Gene expression is strongly correlated with C-DMRs at CpG island shores. For each
colon tumor versus normal mucosa C-DMR we found the closest annotated gene on the
Affymetrix HGU133A microarray, resulting in a total of 650 gene/C-DMR pairs. Plotted
are log (base 2) ratios of colon tumor to normal expression against delta M values for
colon tumor and normal DNAm. Orange dots represent C-DMRs located within 300 bp
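The caption pairs each C-DMR with its closest annotated gene. It does not say how closeness was measured. The sketch below assumes one representative coordinate per gene on the same chromosome, sorted ascending, and finds the nearest one by binary search. The gene names and coordinates are hypothetical.

```python
from bisect import bisect_left


def closest_gene(dmr_center: int, gene_positions: list[int], gene_names: list[str]) -> tuple[str, int]:
    """Return (gene name, distance) for the gene nearest to dmr_center.

    gene_positions must be sorted ascending and aligned with gene_names.
    """
    if not gene_positions:
        raise ValueError("no genes to search")
    i = bisect_left(gene_positions, dmr_center)
    # The nearest gene is either just before or just after the insertion point
    candidates = [j for j in (i - 1, i) if 0 <= j < len(gene_positions)]
    best = min(candidates, key=lambda j: abs(gene_positions[j] - dmr_center))
    return gene_names[best], abs(gene_positions[best] - dmr_center)


if __name__ == "__main__":
    positions = [100, 500, 1000]            # hypothetical gene coordinates
    names = ["geneA", "geneB", "geneC"]     # hypothetical gene names
    print(closest_gene(450, positions, names))
```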
Using Sequencing (BS-seq)
[Figure: schematic of DNA from liver and brain with methylated cytosines (CH3) on both strands; locus chr3:44,031,616-44,031,626, 85% methylation]
Bisulfite Treatment
GGGGAGCAGCATGGAGGAGCCTTCGGCTGACT (before treatment)
GGGGAGCAGTATGGAGGAGTTTTCGGTTGATT (after treatment)
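Bisulfite treatment turns unmethylated cytosines into uracil, which is read as T after amplification. Methylated cytosines stay C. The minimal sketch below simulates that on a single strand, given which positions are methylated. The `methylated` position set is an input the caller supplies, and the complementary strand is ignored.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Simulate bisulfite treatment of one strand.

    Every C whose 0-based position is not in `methylated` reads as T;
    methylated Cs stay C. Other bases are unchanged.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


if __name__ == "__main__":
    before = "GGGGAGCAGCATGGAGGAGCCTTCGGCTGACT"
    print(bisulfite_convert(before, {6, 23}))
```

Marking the cytosines at 0-based positions 6 and 23 as methylated reproduces the treated sequence on the slide.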
BS-seq
[Figure: bisulfite reads aligned to the reference TAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATC, shown in three panels]
• Coverage: 13; methylation evidence: 13; methylation percentage: 100%
• Coverage: 13; methylation evidence: 9; methylation percentage: 69%
• Coverage: 13; methylation evidence: 4; methylation percentage: 31%
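The three panels reduce to simple counting. At a CpG, a read showing C is evidence of methylation, and a read showing T is evidence of no methylation. The percentage is evidence divided by coverage. The sketch below assumes reads are already aligned and the base at the CpG's C position (forward strand) has been extracted. It also assumes that coverage counts only reads showing C or T, so N and other bases are skipped; the slides do not say how such reads were counted.

```python
def methylation_call(bases: list[str]) -> tuple[int, int, float]:
    """Return (coverage, methylation evidence, methylation percentage) at one CpG.

    bases: the base each covering read shows at the CpG's C (forward strand).
    C = methylated evidence, T = unmethylated; anything else is ignored (assumption).
    """
    methylated = sum(1 for b in bases if b == "C")
    unmethylated = sum(1 for b in bases if b == "T")
    coverage = methylated + unmethylated
    pct = 100 * methylated / coverage if coverage else float("nan")
    return coverage, methylated, pct


if __name__ == "__main__":
    cov, ev, pct = methylation_call(["C"] * 9 + ["T"] * 4)
    print(cov, ev, round(pct))
```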
BS-seq
• Alignment is much trickier:
– Naïve strategy: do nothing, and hope there are not many CpGs in a single read
– Smarter strategy: “bisulfite convert” the reference by turning all Cs into Ts
• Also needs to be done on the reverse-complement reference and on the reads
– Smartest strategy: be unbiased and try all combinations of methylated/unmethylated CpGs in each read
• Computationally expensive (see Hansen et al., 2011, for a strategy)
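The "smarter strategy" above can be sketched as follows. Convert every C to T in the read and in the reference, then repeat on the reverse complement of the reference, so that reads from either strand can be placed. Exact string matching here is a stand-in for a real aligner, which would tolerate mismatches. The sketch also assumes a directional library, where reads come from the original top or bottom strand. All function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def c_to_t(seq: str) -> str:
    """In silico bisulfite conversion: every C becomes T."""
    return seq.upper().replace("C", "T")


def find_all(haystack: str, needle: str) -> list[int]:
    """All (possibly overlapping) start positions of needle in haystack."""
    hits, start = [], haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def align_bisulfite(read: str, reference: str) -> list[tuple[str, int]]:
    """Return (strand, 0-based top-strand start) for exact matches in converted space."""
    conv_read = c_to_t(read)  # methylated Cs in the read also become T, so they still match
    hits = [("+", p) for p in find_all(c_to_t(reference), conv_read)]
    n, m = len(reference), len(read)
    # Bottom-strand reads match the converted reverse complement; map back to top coordinates
    for p in find_all(c_to_t(revcomp(reference)), conv_read):
        hits.append(("-", n - p - m))
    return hits


if __name__ == "__main__":
    ref = "ACCGTAGGA"
    print(align_bisulfite("ATCGT", ref))  # top-strand read with a methylated CpG
    print(align_bisulfite("TTTAT", ref))  # converted read from the bottom strand
```

The cost of this convenience is that converted sequences use a three-letter alphabet, so reads become less unique. That loss of uniqueness is one reason the "smartest" strategy is attractive despite its expense.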
BS-seq
• There are similarities to SNP calling (we’ll see
this in a couple of weeks)
• EXCEPT: we want to measure percentages
– Use a binomial model to estimate p, the methylation percentage
– Allow for sequencing errors, coverage differences, etc.
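Under a binomial model with M methylated reads out of N, the estimate of p is M/N, and its standard error shrinks as coverage grows. That is how coverage differences enter. Sequencing errors are handled below with a deliberately simple symmetric error model, which is an assumption and not necessarily the model used in practice. In that model a read shows C with probability p(1 - e) + (1 - p)e. Inverting this and clipping to [0, 1] gives the maximum-likelihood estimate of p.

```python
import math


def binomial_estimate(methylated: int, coverage: int, error_rate: float = 0.0) -> tuple[float, float]:
    """Estimate methylation proportion p and its standard error at one CpG.

    Assumed error model (illustrative): P(read shows C) = p(1 - e) + (1 - p)e.
    """
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    if not 0.0 <= error_rate < 0.5:
        raise ValueError("error_rate must be in [0, 0.5)")
    raw = methylated / coverage  # plain binomial MLE of P(read shows C)
    # Invert the error model and clip to the valid range [0, 1]
    p_hat = (raw - error_rate) / (1.0 - 2.0 * error_rate)
    p_hat = min(1.0, max(0.0, p_hat))
    # Delta-method standard error: more reads give a smaller SE
    se = math.sqrt(raw * (1.0 - raw) / coverage) / (1.0 - 2.0 * error_rate)
    return p_hat, se


if __name__ == "__main__":
    print(binomial_estimate(9, 13))
    print(binomial_estimate(90, 130))  # same proportion, ten times the coverage
```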
Measuring DNA Methylation
• Estimating percentages
• Use “local-likelihood” method
– Based on loess
(Plot courtesy of Kasper Hansen)
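Local likelihood borrows strength from neighboring CpGs, in the way loess does. Each CpG's binomial log-likelihood is weighted by a kernel in genomic distance before maximizing. The sketch below is a simplified local-constant version with loess's tricube kernel. For a constant p, the weighted log-likelihood Σ w_i [M_i log p + (N_i - M_i) log(1 - p)] is maximized at Σ w_i M_i / Σ w_i N_i. This is not claimed to be the exact method from the plot, which may fit a local polynomial; the span parameter is illustrative.

```python
def tricube(u: float) -> float:
    """Tricube kernel used by loess: (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_likelihood_smooth(positions: list[int], methylated: list[int],
                            coverage: list[int], span: float) -> list[float]:
    """Local-constant binomial fit at each CpG position.

    Maximizes the tricube-weighted binomial log-likelihood; the maximizer is
    sum(w * M) / sum(w * N), so high-coverage CpGs count more.
    """
    smoothed = []
    for x0 in positions:
        num = den = 0.0
        for x, m, n in zip(positions, methylated, coverage):
            w = tricube((x - x0) / span)
            num += w * m
            den += w * n
        smoothed.append(num / den if den > 0 else float("nan"))
    return smoothed


if __name__ == "__main__":
    print(local_likelihood_smooth([0, 10, 20], [1, 9, 1], [10, 10, 10], span=15))
```

A tiny span reproduces the raw per-CpG proportions, and a larger span pulls isolated values toward their neighbors.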
BS-seq
Lister et al. 2009, Nature
Gene Expression Regulation: DNA methylation in promoter regions
Lister et al. 2009, Nature
DNA methylation patterns within genomic regions
Lister et al. 2009
Putting it together
What were we after?
• The epigenetic progenitor origin of human cancer
• [Feinberg, et al., Nature Reviews Genetics, 2006]
• Stochastic epigenetic variation as a driving force of disease
• [Feinberg & Irizarry, PNAS, 2009]
• Phenotypic variation, perhaps epigenetically
mediated, increases disease susceptibility
• Increased epigenetic and gene expression
variability of specific genes/regions is a defining
characteristic of cancer
What did we do?
• Custom Illumina methylation microarray
• Confirmed increased epigenetic variability in
specific regions across five cancer types
[Figure: scatter plots for lung and breast, cancer and normal; axes: normal mean vs. cancer mean, and normal sd vs. cancer sd; points coded as mean sig., variance sig., both sig., none]
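The plots contrast per-site means and standard deviations in cancer versus normal samples. The sketch below computes those per-site summaries, plus the classical F statistic (the ratio of sample variances) as one possible way to quantify increased variability. The study's actual test is not stated here. Turning the F statistic into a p-value would need the F distribution with (n_cancer - 1, n_normal - 1) degrees of freedom, for example via scipy.stats.f.sf; this stdlib-only sketch omits that step. The input values are made up.

```python
import statistics


def compare_site(cancer: list[float], normal: list[float]) -> dict[str, float]:
    """Per-site summaries: mean difference, group SDs, and variance ratio (F statistic)."""
    mean_diff = statistics.mean(cancer) - statistics.mean(normal)
    var_c = statistics.variance(cancer)  # sample variance (n - 1 denominator)
    var_n = statistics.variance(normal)
    f_stat = var_c / var_n if var_n > 0 else float("inf")
    return {
        "mean_diff": mean_diff,
        "cancer_sd": var_c ** 0.5,
        "normal_sd": var_n ** 0.5,
        "f_stat": f_stat,
    }


if __name__ == "__main__":
    # Same mean methylation, but much more spread in the (made-up) cancer samples
    print(compare_site([0.2, 0.8, 0.5], [0.45, 0.55, 0.5]))
```

A site like this one would show up as variance-significant but not mean-significant: the means agree, but the cancer samples are far more variable.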
What did we do?
• Custom Illumina methylation microarray
– Confirmed increased epigenetic variability in specific regions across five cancer types
– Confirmed same sites are involved in tissue differentiation
• Whole genome sequencing of bisulfite-treated DNA
– Found large blocks of hypomethylation (sometimes Mbps long) in colon cancer
– These regions coincide with hyper-variable regions across cancer types
• Gene Expression Analysis
Gene Expression Data
When using multiple microarray experiments, proper normalization is key
[McCall, et al., Biostatistics 2010]
Normalization is key
• fRMA: a single-chip normalization procedure
• GNUSE: a single-chip quality metric
• Barcode: a single-chip common-scale measurement
What did we do?
• Gene Expression Analysis
– Genes with hyper-variable gene expression in colon cancer are enriched in hypomethylation blocks
[Corrada Bravo, et al., under review]
What are we doing next?
Bigger gene expression study
• 7,741 HGU133plus2 samples
• 598 normal tissue samples, 4,886 tumor samples
• 176 different tissue types
• 175 different GEO studies
Bigger gene expression study
[Corrada Bravo, et al., under review]
What are we doing next?
• Gene Expression Analysis
– Tissue-specific genes have hyper-variable gene expression across cancer types
[Corrada Bravo, et al., under review]