An Introduction to ENCODE
Mark Reimers, VIPBG
(borrowing heavily from John Stamatoyannopoulos and the ENCODE papers)

Outline
• What is ENCODE?
• The ENCODE phase II papers
• Identifying regulatory DNA
• Of mice and men
• Implications for understanding GWAS

What is ENCODE?
The Encyclopedia of DNA Elements (ENCODE) project aims to delineate all functional elements encoded in the human genome
Transcribed ‘genes’ and regulatory regions

The ENCODE Phase II Papers
• Five papers in Nature, one in Cell, two in Science, 18 in Genome Research, six in Genome Biology
• 440 authors from 32 labs
• About half are directly relevant to molecular genetics
• http://www.nature.com/encode/#/threads

The Data
• 1,640 data sets on 147 cell lines or tissues
• Almost all high-throughput sequencing
• RNA, DNA methylation, binding of common transcription factors, histone marks (H3K4me1, H3K4me3, H3K27ac, H3K27me3, ...)
• Each raw data set is ~5 GB (uncompressed)
• http://genome.ucsc.edu/ENCODE/

What the ENCODE Data Look Like
• Multiple tracks for various epigenetic mark assays
• Track values count reads from DNA fragments produced by specific assays, aligned to a reference genome (hg19 or mm9)

Genes, regulatory DNA, and epigenetic features
Graphic from NIH RoadMap Epigenomics Site

Genes, regulatory DNA, and epigenetic features
- promoters
- enhancers
- silencers
- insulators
- etc.

Open Chromatin at Regulatory Sites
• Almost all DNA-binding proteins bind to unwrapped DNA
• They must either force open the DNA and displace histones, or wait for another TF to do so
• Open chromatin is not an epigenetic ‘mark’ but is a useful indicator of functional DNA
• Open chromatin can be assayed by DNase I sensitivity – where does DNase I cut?

Genes, regulatory DNA, and epigenetic features
DNase I
- promoters
- enhancers
- silencers
- insulators
- etc.
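
Sketch: from aligned reads to a track
A minimal sketch of how a track value can be built by counting DNase I cut sites (read 5' ends) in fixed-width bins. The function name `cut_site_track`, the bin size, and the cut positions are all hypothetical and chosen for illustration; real ENCODE tracks come from full alignment and normalization pipelines that are not reproduced here.

```python
def cut_site_track(cut_positions, region_start, region_end, bin_size=100):
    """Count cut sites per fixed-width bin over [region_start, region_end)."""
    # Round up so a partial final bin is still represented
    n_bins = (region_end - region_start + bin_size - 1) // bin_size
    track = [0] * n_bins
    for pos in cut_positions:
        # Ignore cut sites that fall outside the region of interest
        if region_start <= pos < region_end:
            track[(pos - region_start) // bin_size] += 1
    return track


# Hypothetical 0-based cut positions on a single chromosome
cuts = [1005, 1010, 1020, 1033, 1090, 1150, 1420, 1425, 1430, 1999, 2500]
track = cut_site_track(cuts, 1000, 2000, bin_size=250)
print(track)
```

Bins with many cuts relative to their neighbours are candidates for open chromatin; deciding what counts as "many" needs a background model, which this sketch leaves out.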

DNase I hypersensitive sites mark regulatory DNA
[Figure: browser view with DNase I hypersensitive sites (DHSs) at promoters and enhancers]
• ~100,000–250,000 DHSs per cell type (0.5–1.5% of genome)
genome.ucsc.edu
Courtesy John Stamatoyannopoulos
www.epigenomebrowser.org

Classifying the Genome
• Histone marks, DNase I sensitivity, and CTCF binding can be combined to classify the function of genomic regions
• A hidden Markov model (HMM) is commonly used
From Bernstein et al., Nature, 2012

Implications of ENCODE
• Variation in up to 10% of the human genome may have some phenotypic consequence
• Many apparently functional sites in human DNA are not conserved across mammals
• Regulatory sites often regulate target genes other than the nearest genes
• Regulatory sites occur as often downstream as upstream of the target transcription start site

Where Next?
RoadMap Epigenomics
genome.ucsc.edu
www.roadmapepigenomics.org
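
Sketch: HMM segmentation (for the "Classifying the Genome" slide)
A toy illustration of the hidden Markov model idea: each genomic bin is observed as marked (1) or unmarked (0), and Viterbi decoding assigns each bin to a hidden state. The two states, the single binary observation, and every probability below are invented for illustration. This is not the model or the parameters used in the ENCODE papers, which combine many marks and learn many more states from data.

```python
import math

STATES = ("background", "active")

# Made-up probabilities, not fitted to any ENCODE data
START = {"background": 0.9, "active": 0.1}
TRANS = {
    "background": {"background": 0.9, "active": 0.1},
    "active": {"background": 0.2, "active": 0.8},
}
# Probability of observing an unmarked (0) or marked (1) bin in each state
EMIT = {
    "background": {0: 0.9, 1: 0.1},
    "active": {0: 0.3, 1: 0.7},
}


def viterbi(obs):
    """Return the most probable hidden-state path for a sequence of 0/1 bins."""
    # scores[s] = best log-probability of any path ending in state s
    scores = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    backpointers = []
    for o in obs[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best previous state from which to enter state s
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][o])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


bins = [0, 0, 1, 1, 1, 0, 0]  # hypothetical marked/unmarked bins
path = viterbi(bins)
print(path)
```

With these toy parameters the three marked bins in the middle are labelled "active" and the flanking bins "background". The transition probabilities favour staying in the same state, which is what makes the segmentation contiguous rather than a bin-by-bin threshold.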