Introduction to ENCODE

An Introduction to ENCODE
Mark Reimers, VIPBG
(borrowing heavily from John
Stamatoyannopoulos and the ENCODE papers)
Outline
• What is ENCODE?
• The ENCODE phase II papers
• Identifying regulatory DNA
• Of mice and men
• Implications for understanding GWAS
What is ENCODE?
The Encyclopedia of DNA Elements
(ENCODE) project aims to delineate all
functional elements encoded in the
human genome
Transcribed ‘genes’ and regulatory
regions
The ENCODE Phase II Papers
• Five papers in Nature, one in Cell, two in
Science, 18 in Genome Research, six in
Genome Biology
• 440 authors from 32 labs
• About half are directly relevant to molecular
genetics
• http://www.nature.com/encode/#/threads
The Data
• 1,640 data sets on 147 cell lines or tissues
• Almost all high-throughput sequencing
• RNA, DNA methylation, binding of common
transcription factors, histone marks
(H3K4me1, H3K4me3, H3K27ac, H3K27me3,
...)
• Each raw data set ~ 5GB (uncompressed)
• http://genome.ucsc.edu/ENCODE/
What the ENCODE Data Look Like
• Multiple tracks for
various epigenetic mark
assays
• Track values count reads
from DNA fragments
produced by specific
assays aligned to
genome (hg19 or mm9)
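As a rough illustration of how a track value arises, the sketch below counts aligned reads per fixed-width bin on one chromosome. The read coordinates, the bin width, and the function name `bin_read_counts` are made up for illustration; real ENCODE tracks are produced from aligned sequencing files with dedicated tools, not with this code.

```python
from collections import Counter


def bin_read_counts(read_starts, bin_width=100):
    """Count reads per fixed-width bin, keyed by the bin's start coordinate."""
    counts = Counter()
    for start in read_starts:
        # Integer division assigns each read to the bin containing its start.
        counts[(start // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


# Hypothetical 0-based start positions of reads aligned to one chromosome.
reads = [105, 130, 199, 250, 1010, 1020, 1099]
track = bin_read_counts(reads, bin_width=100)
print(track)
```

Bins with many reads show up as peaks in the browser track; bins with no reads are simply absent from the result.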
Genes, regulatory DNA, and epigenetic features
Graphic from NIH RoadMap Epigenomics Site
- promoters
- enhancers
- silencers
- insulators
- etc.
Open Chromatin at Regulatory Sites
• Almost all DNA-binding proteins bind to
unwrapped DNA
• Either they must force open the chromatin and
displace histones, or wait for another TF to do so
• Open chromatin is not an epigenetic ‘mark’
but is a useful indicator of functional DNA
• Open chromatin can be assayed by DNase I
sensitivity – where does DNase I cut?
Genes, regulatory DNA, and epigenetic features
DNaseI
- promoters
- enhancers
- silencers
- insulators
- etc.
DNaseI hypersensitive sites mark regulatory DNA
[Figure labels: DNaseI hypersensitive site (DHS); promoters; enhancers]
~100,000 – 250,000 DHSs per cell type (0.5-1.5% of genome)
Courtesy John Stamatoyannopoulos (genome.ucsc.edu, www.epigenomebrowser.org)
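The "percent of genome" figure for DHSs is a coverage fraction: total bases inside DHS intervals divided by the length of the region considered. A minimal sketch of that arithmetic follows, assuming half-open (start, end) intervals; the interval list, the region length, and the function name `merged_length` are hypothetical and not taken from ENCODE data.

```python
def merged_length(intervals):
    """Total bases covered by half-open (start, end) intervals, merging overlaps."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close out the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Hypothetical DHS calls on a 100 kb region; the first two overlap.
dhs = [(1000, 1150), (1100, 1300), (5000, 5200)]
region_length = 100_000
covered = merged_length(dhs)
fraction = covered / region_length
print(covered, fraction)
```

Merging before summing matters because overlapping calls would otherwise be counted twice.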
Classifying the Genome
• Can combine histone marks, DNase I sensitivity, and CTCF binding
to classify the function of genomic regions
• Hidden Markov Model commonly used
From Bernstein et al, Nature, 2012
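To make the Hidden Markov Model idea concrete, here is a toy sketch of Viterbi decoding over genomic bins. It assumes only two states ("background" and "regulatory") and a single binary observation per bin (open or closed chromatin); all probabilities are invented for illustration. Published chromatin-state models use many more states and combine several marks, so this shows the technique only, not the model used in the ENCODE papers.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path for a sequence of observations."""
    # best[i][s]: log-probability of the best path ending in state s at bin i.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Pick the predecessor state that maximizes the path score into s.
            prev = max(states, key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical parameters: states tend to persist across neighboring bins.
states = ("background", "regulatory")
start_p = {"background": 0.9, "regulatory": 0.1}
trans_p = {
    "background": {"background": 0.9, "regulatory": 0.1},
    "regulatory": {"background": 0.2, "regulatory": 0.8},
}
emit_p = {
    "background": {"closed": 0.9, "open": 0.1},
    "regulatory": {"closed": 0.2, "open": 0.8},
}
bins = ["closed", "closed", "open", "open", "open", "closed", "closed"]
path = viterbi(bins, states, start_p, trans_p, emit_p)
print(path)
```

The transition probabilities are what make the segmentation smooth: a single noisy bin is less likely to flip the state than a run of consistent bins.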
Implications of ENCODE
• Variation in up to 10% of the human genome
may have some phenotypic consequence
• Many apparently functional sites in human
DNA are not conserved across mammals
• Regulatory sites often regulate target genes
other than the nearest genes
• Regulatory sites occur as often downstream as
upstream of the target transcription start site
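The last two points are a caution about a common default: assigning each regulatory site to the gene with the nearest transcription start site (TSS). The sketch below shows that nearest-TSS heuristic and a strand-aware offset, so the assumption being questioned is explicit. Gene names, coordinates, and the function name `nearest_gene` are hypothetical.

```python
def nearest_gene(site_pos, genes):
    """Return (gene_name, signed_offset) for the gene with the closest TSS.

    genes: list of (name, tss_position, strand) tuples, strand "+" or "-".
    A negative offset means the site is upstream of the TSS in the direction
    of transcription; a positive offset means it is downstream.
    """
    name, tss, strand = min(genes, key=lambda g: abs(site_pos - g[1]))
    offset = site_pos - tss
    # On the minus strand, transcription runs toward lower coordinates.
    if strand == "-":
        offset = -offset
    return name, offset


# Hypothetical annotation: one gene per strand.
genes = [("geneA", 10_000, "+"), ("geneB", 50_000, "-")]
print(nearest_gene(12_000, genes))
print(nearest_gene(45_000, genes))
```

Both example sites fall downstream of their nearest TSS, and per the points above, the nearest gene returned here need not be the true regulatory target.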
Where Next?
RoadMap Epigenomics
genome.ucsc.edu
www.roadmapepigenomics.org