ENCODE 2012

• The Human Genome Project sequenced “the human genome”
• “the human genome” that we have labeled as such doesn’t actually exist
• What we call the human genome sequence is really just a reference
• Furthermore, the current reference genome sequence is haploid

Whose genome did Celera sequence?

Supposedly:
– African-American
– Asian-Chinese
– Hispanic-Mexican
– Caucasian
– Caucasian

Actually: Celera’s genome is Craig Venter’s

Science v. 291, pp 1304-1351

• Every time an individual cell divides, new mutations arise; no two cells even within any individual have the identical sequence.

ENCODE

• The Encyclopedia of DNA Elements (ENCODE) is a public research consortium initiated by the US National Human Genome Research Institute (NHGRI) in September 2003.
• The goal is to find all functional elements in the human genome.
• All data generated in the course of the project will be released “rapidly” into public databases.
• Pilot phase
– 2003-2007
– method evaluation
– 1% of genome
• Production phase 2007-2012
– September 2012
– 30 papers published
– 442 scientists
– 31 labs
– 147 different types of cells with 24 types of experiments
– 1,642 experiments
– Data released

• Identification and quantification of RNA species in cells and subcellular compartments
• Mapping of noncoding and protein-coding genes
• Delineation of chromatin and DNA accessibility
• Mapping of histone modifications and transcription factor-binding sites
• Measurement of DNA methylation

Credits: Darryl Leja (NHGRI), Ian Dunham (EBI)

What did they find?

• Controversy!
• Assigned biochemical functions to over 80% of the genome.
• Junk DNA or no?
• What is a biochemical function?
• “a reproducible biochemical signature”
• “millions of switches”

• The vast majority (80.4%) of the human genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type.
• Primate-specific elements as well as elements without detectable mammalian constraint show, in aggregate, evidence of negative selection; thus, some of them are expected to be functional.
• Classifying the genome into seven chromatin states indicates an initial set of 399,124 regions with enhancer-like features and 70,292 regions with promoter-like features, as well as hundreds of thousands of quiescent regions.
• It is possible to correlate quantitatively RNA sequence production and processing with both chromatin marks and transcription factor binding at promoters, indicating that promoter functionality can explain most of the variation in RNA expression.
• Many non-coding variants in individual genome sequences lie in ENCODE-annotated functional regions; this number is at least as large as those that lie in protein-coding genes.
• Single nucleotide polymorphisms (SNPs) associated with disease by GWAS are enriched within non-coding functional elements, with a majority residing in or near ENCODE-defined regions that are outside of protein-coding genes. In many cases, the disease phenotypes can be associated with a specific cell type or transcription factor.

Changing how we view a gene?

• Genes should be defined by transcripts.
• Transcripts are the basic unit that’s affected by mutation and selection.
• A “gene” then becomes a collection of transcripts, united by some common factor.
• Another related challenge is understanding the genome’s three-dimensional shape. Far from being arranged in a line, chromosomes are folded in fantastically complicated fractal patterns, and these topographies appear to shape network interaction.
• “Every gene is surrounded by an ocean of regulatory elements. They’re everywhere. There are only 25,000 genes, and probably more than 1 million regulatory elements,” said Job Dekker, a molecular biophysicist at the University of Massachusetts Medical School who worked on ENCODE’s structural descriptions of the genome.
• He continued, “It’s not just one gene touching one regulator. It can touch and interact with a whole collection of them. It must involve a very complicated three-dimensional structure. At this scale, chromosome topography turns out to be incredibly dynamic, complex and cell type-specific.”

• http://selab.janelia.org/people/eddys/blog/?p=683
• http://arstechnica.com/staff/2012/09/most-of-what-you-read-was-wrong-how-press-releases-rewrote-scientific-history/
• http://blogs.discovermagazine.com/notrocketscience/2012/09/05/encode-the-rough-guide-to-the-human-genome/#ENCODEgene
• http://www.nature.com/news/encode-the-human-encyclopaedia-1.11312
• http://www.nature.com/nature/journal/v489/n7414/full/nature11247.html