NIH Epigenomics Roadmap Data Analysis and

Exploring Monoallelic Methylation
Using High-throughput Sequencing
Cristian Coarfa, Ronald Harris
Ting Wang, Aleksandar Milosavljevic, Joe Costello
Comparison of sequencing-based methods to
profile DNA methylation and identification of
monoallelic epigenetic modifications
Harris RA, Wang T, Coarfa C, Nagarajan RP, Hong C, Downey S, Johnson
BE, Delaney A, Zhao Y, Olshen A, Ballinger T, Zhou X, Fosberg KJ, Gu J,
Echipare L, O’Geen H, Lister R, Pelizzola M, Xi Y, Epstein CB, Bernstein BE,
Hawkins RD, Ren B, Chung WY, Gu H, Bock C, Gnirke A, Zhang MQ,
Haussler D, Ecker JR, Li W, Farnham PJ, Waterland RA, Meissner A, Marra
MA, Hirst M, Milosavljevic A, Costello JF.
In press, Nature Biotechnology
Biological importance of intermediate methylation levels
1. Imprinting
2. Non-imprinted monoallelic methylation
3. Cell type-specific methylation
4. Sites of inter-individual variation in methylation level
Unmethylated CpGs: methylation-sensitive restriction digestion (MRE)
Methylated CpGs: methyl DNA immunoprecipitation (MeDIP)
MRE: combine parallel digests, ligate adapters, size-select 100-300 bp
Illumina library construction, IGAII sequencing
~20 million reads/sample
MeDIP: IP sonicated, adapter-ligated DNA, size-select 100-300 bp
~100 million reads/sample
Data visualization
[Browser view. Legend: Methylated, Unmethylated. Annotations: 5’ CpG islands are unmethylated; 3’ CpG island is partially methylated]
Unmethylated and Methylated patches within a CpG island
1. high MeDIP, no or low MRE
2. high MRE, no or low MeDIP
3. high MRE and MeDIP (uniform)
4. high MRE and MeDIP (patch methylation)
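The MRE/MeDIP signal patterns listed on this slide can be sketched as a simple rule. This is a minimal illustration only: the function name, the labels and both cutoffs are hypothetical, and region-level scores alone cannot separate uniform from patch methylation, so both "high MRE and MeDIP" cases map to one label here.

```python
def classify_region(mre, medip, mre_cutoff=1.0, medip_cutoff=1.0):
    """Label a region from its MRE and MeDIP scores.

    The cutoffs are placeholders (assumptions), not values from the slides.
    """
    high_mre = mre >= mre_cutoff
    high_medip = medip >= medip_cutoff
    if high_medip and not high_mre:
        return "methylated"        # high MeDIP, no or low MRE
    if high_mre and not high_medip:
        return "unmethylated"      # high MRE, no or low MeDIP
    if high_mre and high_medip:
        return "intermediate"      # high MRE and MeDIP (uniform or patch)
    return "no signal"


# Synthetic scores, for illustration only
for mre, medip in [(0.1, 50.0), (12.0, 0.2), (3.0, 40.0), (0.0, 0.0)]:
    print(mre, medip, classify_region(mre, medip))
```

In practice the cutoffs would have to be calibrated per assay, since MRE and MeDIP scores are on different scales.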
Intermediate methylation levels at imprinted genes
Initial catalogue of intermediate methylation sites
Chr     Start     Stop      MRE      MeDIP     nearest gene   Gene
chr1    .         .         .        .         .              .
.
chr11   1533281   1536667   1.0342   91.9069   -205410        HCCA2
chr11   1946475   1948787   0.7769   58.5443   -18939         LOC100133545
chr11   1975141   1977439   1.2845   87.5516   0              H19
chr11   2245680   2250508   2.3451   99.4044   -29211         C11orf21
chr11   2420747   2423224   1.6565   29.5161   0              KCNQ1
.
chr22   .         .         .        .         .              .
Ting Wang, Washington University
Using Genetic Variation to Detect Monoallelic
Epigenomic and Transcription States
H1 cell line
1. Monoallelic DNA methylation (MRE and MeDIP)
2. Monoallelic expression (MethylC-seq and RNA-seq)
3. Monoallelic Histone H3K4me3 (MethylC-seq and ChIP-seq)
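Monoallelic Epigenomic Marks and Expression
[Venn diagram of loci detected by three assay pairs: MethylC-seq + RNA-seq, MRE-seq + MeDIP-seq, MethylC-seq + ChIP-seq. Region counts as extracted, in reading order: 21, 1, 0, 4, 39, 21, 34]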
Intermediate methylation levels in POTEB
[Browser tracks: CpG islands, MRE-seq 1, MeDIP-seq 1, MRE-seq 2, MeDIP-seq 2, Bisulfite, POTEB]
Location                   MeDIP Allele   Count   MRE Allele   Count
chr15:19346666-19350003    G              9       A            30
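The POTEB counts illustrate the idea behind the MRE/MeDIP approach: at a heterozygous SNP, reads from the unmethylated fraction (MRE) and the methylated fraction (MeDIP) carry different alleles. A minimal sketch of that check follows; the function names and the `min_reads` cutoff are assumptions for illustration, not the pipeline used in the study.

```python
def dominant_allele(counts):
    """Return the allele with the most supporting reads."""
    return max(counts, key=counts.get)


def opposite_alleles(mre_counts, medip_counts, min_reads=5):
    """True if MRE and MeDIP reads favor different alleles at a SNP.

    min_reads is a hypothetical coverage floor, not a value from the slides.
    """
    if sum(mre_counts.values()) < min_reads:
        return False
    if sum(medip_counts.values()) < min_reads:
        return False
    return dominant_allele(mre_counts) != dominant_allele(medip_counts)


# Allele counts reported on the slide for chr15:19346666-19350003
poteb_mre = {"A": 30}
poteb_medip = {"G": 9}
print(opposite_alleles(poteb_mre, poteb_medip))
```

A real analysis would also need counts for the minor allele in each assay and a statistical test; the slide reports only one allele per assay.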
Validation of monoallelic DNA methylation in POTEB
Searching for Monoallelic Methylation
Using Shotgun Bisulfite Sequencing
• We expect streaks of 50±d% methylation ratios
• Use 500bp windows tiling CpG Islands
• Compute average CpG methylation
– CpG Islands
– 1000 loci
• Infer distribution of methylation in 1000 loci
• Subselect 500bp windows tiling CpG Islands
• In the selected windows, search for allele specific
methylation
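The windowing steps above can be sketched as follows. This is an illustration under stated assumptions: CpG calls are given as `(position, percent_methylation)` pairs, the window average is the unweighted mean of per-CpG values, and the function names are hypothetical. The 30-80% default bounds are taken from the parameter search reported later in the talk.

```python
def tile_windows(start, end, size=500):
    """Tile [start, end) with non-overlapping windows of `size` bp."""
    return [(s, min(s + size, end)) for s in range(start, end, size)]


def window_methylation(cpgs, start, end, size=500):
    """Average CpG methylation (percent) per window over one CpG island.

    cpgs: list of (position, percent_methylation) pairs.
    Windows without any CpG call get an average of None.
    """
    result = []
    for w_start, w_end in tile_windows(start, end, size):
        values = [m for pos, m in cpgs if w_start <= pos < w_end]
        avg = sum(values) / len(values) if values else None
        result.append((w_start, w_end, avg))
    return result


def select_intermediate(windows, lower=30.0, upper=80.0):
    """Keep windows whose average methylation lies within [lower, upper]."""
    return [w for w in windows if w[2] is not None and lower <= w[2] <= upper]


# Synthetic CpG calls on a hypothetical 1200 bp island
calls = [(10, 40.0), (20, 60.0), (600, 100.0)]
windows = window_methylation(calls, 0, 1200)
print(select_intermediate(windows))
```

A coverage-weighted average (methylated reads over total reads) is an equally plausible reading of "average CpG methylation"; the slides do not say which was used.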
Average methylation over 500 bp windows in CpG Islands and 1000 putative intermediate methylation loci
[Histogram. x-axis: percent methylation (0 to 96); y-axis: % of windows (0.00% to 5.00%); series: % of CpG Islands windows, % of windows in 1000 loci]
Parameter Search
• Experimented with various lower and upper bounds for methylation
• Guidelines
• Discover as many of the 1000 loci as possible
• Reduce the overall number of 500bp windows
Lower   Upper   Number of       500bp windows            Fraction of 500bp windows   1000 loci
bound   bound   500bp windows   overlapping 1000 loci    overlapping 1000 loci       overlapped
10      70      24793           2851                     0.114992135                 950
10      80      28060           3877                     0.138168211                 989
10      90      36677           5512                     0.15028492                  999
20      70      14084           2345                     0.166500994                 926
20      80      17351           3371                     0.19428275                  977
20      90      25968           5006                     0.192775724                 990
30      70      9403            1912                     0.20333936                  884
30      80      12670           2938                     0.231886346                 958
30      90      21287           4573                     0.21482595                  979
The 30-80% range rediscovers 958 of the 1000 loci, at the highest specificity
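The specificity comparison can be reproduced from the table. In this sketch the counts are copied from the slide, and "specificity" is taken to mean the fraction of selected 500bp windows that overlap the 1000 loci, which matches the fraction column; the variable and function names are illustrative.

```python
# (lower bound, upper bound, 500bp windows, windows overlapping 1000 loci,
#  1000 loci overlapped), as reported in the parameter search table
rows = [
    (10, 70, 24793, 2851, 950),
    (10, 80, 28060, 3877, 989),
    (10, 90, 36677, 5512, 999),
    (20, 70, 14084, 2345, 926),
    (20, 80, 17351, 3371, 977),
    (20, 90, 25968, 5006, 990),
    (30, 70, 9403, 1912, 884),
    (30, 80, 12670, 2938, 958),
    (30, 90, 21287, 4573, 979),
]


def overlap_fraction(row):
    """Fraction of selected windows that overlap the 1000 loci."""
    _, _, n_windows, n_overlapping, _ = row
    return n_overlapping / n_windows


# Pick the bounds with the highest overlap fraction
best = max(rows, key=overlap_fraction)
print(best[:2], round(overlap_fraction(best), 4), best[4])
```

The maximum falls on the 30-80 row, in agreement with the slide. Note that the 10-90 bounds recover the most loci (999) but at a much lower overlap fraction.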
Incorporating Genetic Variation
• Search for allele-specific methylation
• Look only into the 30-80% methylation loci overlapping with CpG
Islands
• Use het SNPs
• Check for those that separate reads into different methylation states
• One allele >20%
• Other allele <20%
• Other thresholding methods possible
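The allele-specific check described above can be sketched as follows. The input layout (one tuple per bisulfite read overlapping a heterozygous SNP) and the function names are assumptions; the 20% thresholds are the ones stated on the slide and are exposed as parameters, since other thresholding methods are possible.

```python
from collections import defaultdict


def allele_methylation(reads):
    """Pooled percent methylation per allele.

    reads: list of (allele, methylated_cpgs, total_cpgs), one per read
    overlapping the heterozygous SNP (a hypothetical input layout).
    """
    meth = defaultdict(int)
    total = defaultdict(int)
    for allele, m, t in reads:
        meth[allele] += m
        total[allele] += t
    return {a: 100.0 * meth[a] / total[a] for a in total if total[a] > 0}


def is_allele_specific(reads, high=20.0, low=20.0):
    """True if one allele is above `high` and the other below `low`."""
    levels = allele_methylation(reads)
    if len(levels) != 2:
        return False
    a, b = levels.values()
    return (a > high and b < low) or (b > high and a < low)


# Synthetic reads at one het SNP: allele A mostly methylated, G mostly not
reads = [("A", 9, 10), ("A", 8, 10), ("G", 0, 10), ("G", 1, 10)]
print(allele_methylation(reads), is_allele_specific(reads))
```

Pooling CpG counts across reads weights long reads more heavily; averaging per-read ratios would be an alternative design.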
Results
• Found 6295 heterozygous sites
• 586 sites have allele specific methylation
• Overlap with 62 of the 1000 loci
– 37 of the loci discovered using pairs of assays
– 25 new loci
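Monoallelic Epigenomic Marks and Expression
Distribution of the 62 SBS-ASM loci
[Venn diagram of the three assay pairs: MethylC-seq + RNA-seq, MRE-seq + MeDIP-seq, MethylC-seq + ChIP-seq. Region counts as extracted, in reading order: 1, 0, 0, 4, 9, 16, 7. Additional 25 loci outside the diagram]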
Breast Tissue

Allele specific methylation

Determine informative heterozygous SNPs

Loci with monoallelic MRE-seq and MeDIP-seq
Breast Tissue
• Multiple cell types
– Different epigenotypes
– Same genotype
• Identify monoallelic events
– Constitutional
– Tissue specific
• Cell types for four individuals
– Conserved monoallelic marks
– Individual specific monoallelic marks
Integrate Array-based and Seq-based methods
• Collaboration with Leo Schalkwyk and Jonathan Mill,
King’s College, UK
• Investigate same breast tissue samples
• Insight
– Cost
– Results
• # of ASM loci
• Distribution of ASM loci identified by each method
– Suggestions for designing future studies
Acknowledgements
NIEHS/NIDA: Joni Rutter, Tanya Barrett, Fred Tyson, Christine Colvis
EDACC: R. Alan Harris, Cristian Coarfa, Yuanxin Xi, Wei Li, Robert A. Waterland, Aleksandar
Milosavljevic
UCSF/GSC REMC: Raman Nagarajan, Chibo Hong, Sara Downey, Brett E. Johnson, Allen
Delaney, Yongjun Zhao, Marco Marra, Martin Hirst, Joseph Costello
– UCSC: Tracy Ballinger, David Haussler
– Washington University: Xin Zhou, Maximiliaan Schillebeeckx, Ting Wang
– UCD: Lorigail Echipare, Henriette O’Geen, Peggy J. Farnham
UCSD REMC: Ryan Lister, Mattia Pelizzola, Bing Ren, Joseph Ecker
– Cold Spring Harbor: Wen-Yu Chung, Michael Q. Zhang
Broad REMC: Hongcang Gu, Christoph Bock, Andreas Gnirke, Chuck Epstein, Brad Bernstein,
Alexander Meissner